# OwlEval
We have compiled examples and their corresponding questions from recent open-source work and organized them into OwlEval.

Below we introduce OwlEval and its data format.
## Data Format
### questions
`questions.jsonl` contains the case images and their corresponding question information.

Each row contains the following fields:

- `image`: the name of the image file
- `question_id`: the question ID (there are 82 questions in total)
- `question`: the question text
- `type`: whether the question is single-turn or multi-turn

For example:
```json
{"image": "1.jpg", "question_id": 1, "question": "What is funny about this image? Describe it panel by panel.", "type": ["single"]}
```
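To make the format concrete, here is a minimal Python sketch for loading `questions.jsonl`; the file path is an assumption about where you keep the data, and only the `["single"]` type tag is confirmed by the example above.

```python
import json

# A minimal sketch, assuming questions.jsonl sits in the current directory;
# adjust the path to your checkout of OwlEval.
questions = []
with open("questions.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        if line.strip():
            questions.append(json.loads(line))

# Each record carries image, question_id, question, and type.
# Only the ["single"] tag appears in the example above; the multi-turn
# tag value is an assumption.
single_turn = [q for q in questions if "single" in q["type"]]
print(f"{len(questions)} questions total, {len(single_turn)} single-turn")
```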
### answer
This folder contains each model's responses to every question, organized into six JSONL files:

- `llava_13b_answer.jsonl`
- `minigpt4_13b_answer.jsonl`
- `MMreact_answer.jsonl`
- `mPLUG_Owl_7b_answer.jsonl`
- `BLIP2_13b_answer.jsonl`
- `openflamingo_answer.jsonl`

Each `answer/xxx.jsonl` file contains the following fields:
- `image`: the name of the image file
- `question_id`: the question ID (there are 82 questions in total)
- `question`: the question text
- `answer`: the reply given by the model
- `model_id`: the ID of the model that generated the answer

For example:
```json
{"image": "10.jpg", "question_id": 15, "question": "How many bedrooms are there in this floor plan?", "answer": "There are three bedrooms in this floor plan.", "model_id": "llava-13b"}
```
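As a rough illustration of how the files fit together, the sketch below pairs one model's answers with the corresponding questions; the paths are assumptions based on the layout described above.

```python
import json

def load_jsonl(path):
    """Read one JSON record per non-empty line."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Paths follow the layout described above; adjust to your local copy.
questions = {q["question_id"]: q for q in load_jsonl("questions.jsonl")}
answers = load_jsonl("answer/mPLUG_Owl_7b_answer.jsonl")

# Pair each answer with its question for side-by-side inspection.
for a in answers:
    q = questions.get(a["question_id"])
    if q is not None:
        print(a["model_id"], "|", q["question"][:50], "->", a["answer"][:80])
```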
### cases
This folder contains the 50 evaluation images: 21 from MiniGPT-4, 13 from MM-REACT, 9 from BLIP-2, 3 from GPT-4, and 4 collected by us.
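If you want to sanity-check a local copy, the hedged sketch below verifies that every image referenced by `questions.jsonl` is present in a `cases/` folder; both paths are assumptions about the repository layout.

```python
import json
from pathlib import Path

# A hedged consistency check, assuming the evaluation images live in a local
# `cases/` folder next to questions.jsonl (adjust the paths if they differ).
available = {p.name for p in Path("cases").iterdir() if p.is_file()}

with open("questions.jsonl", "r", encoding="utf-8") as f:
    referenced = {json.loads(line)["image"] for line in f if line.strip()}

missing = referenced - available
print(f"{len(available)} images in cases/, {len(referenced)} referenced, "
      f"{len(missing)} missing")
```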