
OwlEval
We collected example images and their corresponding questions from recent open-source work and organized them into OwlEval, an evaluation set.
This document introduces OwlEval and its data format.
Data Format
questions
questions.jsonl contains the names of the case images and their corresponding questions. Each row contains the following fields:
- image: the file name of the image
- question_id: the question ID (there are 82 questions in total)
- question: the text of the question
- type: whether the question is single-turn or multi-turn, given as a list (e.g. ["single"])
For example:
{"image": "1.jpg", "question_id": 1, "question": "What is funny about this image? Describe it panel by panel.", "type": ["single"]}
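A minimal sketch for reading questions.jsonl and checking that each row carries the four fields listed above. The helper names (`parse_question`, `load_questions`) are hypothetical, and the file location you pass in (for example `questions/questions.jsonl`) is an assumption based on the folder layout described here.

```python
import json

# Fields documented for every row of questions.jsonl.
QUESTION_FIELDS = {"image", "question_id", "question", "type"}


def parse_question(line):
    """Parse one questions.jsonl row and verify the documented fields are present."""
    record = json.loads(line)
    missing = QUESTION_FIELDS - set(record)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record


def load_questions(path):
    """Read every non-empty row of a questions.jsonl file (path is assumed)."""
    with open(path, encoding="utf-8") as f:
        return [parse_question(line) for line in f if line.strip()]


# Usage with the example row shown above.
example = '{"image": "1.jpg", "question_id": 1, "question": "What is funny about this image? Describe it panel by panel.", "type": ["single"]}'
q = parse_question(example)
print(q["question_id"], q["image"], q["type"])  # 1 1.jpg ['single']
```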
answer
This folder contains each model's responses to every question, collected in six jsonl files:
- llava_13b_answer.jsonl
- minigpt4_13b_answer.jsonl
- MMreact_answer.jsonl
- mPLUG_Owl_7b_answer.jsonl
- BLIP2_13b_answer.jsonl
- openflamingo_answer.jsonl
Each answer/xxx.jsonl file contains the following fields:
- image: the file name of the image
- question_id: the question ID (there are 82 questions in total)
- question: the text of the question
- answer: the reply generated by the model
- model_id: the ID of the model that generated the answer
For example:
{"image": "10.jpg", "question_id": 15, "question": "How many bedrooms are there in this floor plan?", "answer": "There are three bedrooms in this floor plan.", "model_id": "llava-13b"}
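Because every answer row carries both question_id and model_id, the six files can be merged into a per-question view for side-by-side comparison. The sketch below assumes the files sit in a directory named `answer/`; `group_answers` and `load_all_answers` are hypothetical helper names, and the `model_id` values other than "llava-13b" are whatever the files contain.

```python
import json
import os
from collections import defaultdict

# The six answer files listed above; the "answer" directory name is an assumption.
ANSWER_FILES = [
    "llava_13b_answer.jsonl",
    "minigpt4_13b_answer.jsonl",
    "MMreact_answer.jsonl",
    "mPLUG_Owl_7b_answer.jsonl",
    "BLIP2_13b_answer.jsonl",
    "openflamingo_answer.jsonl",
]


def group_answers(lines):
    """Map question_id -> {model_id: answer} from jsonl rows of any answer file."""
    by_question = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        by_question[row["question_id"]][row["model_id"]] = row["answer"]
    return dict(by_question)


def load_all_answers(answer_dir="answer"):
    """Read all six answer files and group their rows by question."""
    lines = []
    for name in ANSWER_FILES:
        with open(os.path.join(answer_dir, name), encoding="utf-8") as f:
            lines.extend(f)
    return group_answers(lines)


# Usage with the example row shown above.
example = '{"image": "10.jpg", "question_id": 15, "question": "How many bedrooms are there in this floor plan?", "answer": "There are three bedrooms in this floor plan.", "model_id": "llava-13b"}'
grouped = group_answers([example])
print(grouped[15]["llava-13b"])  # There are three bedrooms in this floor plan.
```

Keying the inner dict by model_id keeps one answer per model per question, which is the shape needed to compare models on the same question.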
cases
This folder contains the 50 evaluation images: 21 from MiniGPT-4, 13 from MM-REACT, 9 from BLIP-2, 3 from GPT-4, and 4 collected by us.
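Since 82 questions refer to 50 images, a quick consistency check is to confirm that every `image` value in questions.jsonl names a file in this folder. A minimal sketch, assuming the folder is called `cases/` relative to where you run it; `missing_images`, `check_cases`, and `SOURCE_COUNTS` are hypothetical names used only for illustration.

```python
import os

# Per-source image counts stated above; they should add up to 50.
SOURCE_COUNTS = {"MiniGPT-4": 21, "MM-REACT": 13, "BLIP-2": 9, "GPT-4": 3, "collected by us": 4}


def missing_images(image_names, available):
    """Return the referenced image names not present in `available`, sorted and deduplicated."""
    return sorted(set(image_names) - set(available))


def check_cases(questions, case_dir="cases"):
    """Compare each question's image field against the files in case_dir (assumed path)."""
    return missing_images((q["image"] for q in questions), os.listdir(case_dir))


print(sum(SOURCE_COUNTS.values()))  # 50
print(missing_images(["1.jpg", "10.jpg", "1.jpg"], {"1.jpg"}))  # ['10.jpg']
```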