[Paper](https://arxiv.org/pdf/2305.07895.pdf).
# Results
Results are available in the `answer_save` folder.

Visualization results

# Data Download
| Data file | Size |
| --- | ---: |
| [text recognition](https://pan.baidu.com/s/1Ba950d94u8RQmtqvkLBk-A) (code: iwyn) | 1.37GB |

TextVQA, KIE and HME will be updated soon.

We assume that your symlinked `data` directory has the following structure:
```
data
|_ IC13_857
|_ IC15_1811
|_ ...
|_ ESTVQA
|_ textVQA
|_ ...
|_ FUNSD
|_ POIE
```
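Before running an evaluation, it can help to confirm that the expected folders are in place. The snippet below is a minimal sketch, not part of the repository: `missing_datasets` and `EXPECTED_DATASETS` are hypothetical names, and the list only contains the folders spelled out in the tree above (the entries hidden behind `...` are left out).

```python
from pathlib import Path

# Folder names copied from the tree above; entries elided by "..." are not listed.
EXPECTED_DATASETS = ["IC13_857", "IC15_1811", "ESTVQA", "textVQA", "FUNSD", "POIE"]


def missing_datasets(data_root, expected=EXPECTED_DATASETS):
    """Return the expected dataset folders that are absent under data_root."""
    root = Path(data_root)
    # is_dir() follows symlinks, so a symlinked `data` directory works too.
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets("data")
    if missing:
        print("Missing dataset folders:", ", ".join(missing))
    else:
        print("All listed dataset folders are present.")
```

Run it from the repository root so that the relative path `data` resolves to the symlinked directory.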
# Usage
Evaluate on all datasets:
```Shell
python eval.py --model_name LLaVA --eval_all
```
Evaluate on a single dataset:
```Shell
python eval.py --model_name LLaVA --eval_textVQA
```
Evaluate on selected OCR datasets:
```Shell
python eval.py --model_name LLaVA --eval_ocr --ocr_dataset_name "ct80 IIIT5K"
```
The results will be saved in the `answer` folder.
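The `--ocr_dataset_name` flag in the last command passes several dataset names as one quoted, space-separated string. The sketch below shows one way such a value could be parsed with `argparse`. It is an assumption about how `eval.py` might handle the flags, not its actual implementation; `parse_args` and `ocr_datasets` are hypothetical helpers, and only the flags used in the commands above are mirrored.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical parser: it only mirrors the flags used in the commands above.
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_name", type=str, default="LLaVA")
    parser.add_argument("--eval_all", action="store_true")
    parser.add_argument("--eval_textVQA", action="store_true")
    parser.add_argument("--eval_ocr", action="store_true")
    parser.add_argument("--ocr_dataset_name", type=str, default="")
    return parser.parse_args(argv)


def ocr_datasets(args):
    """Split the quoted, space-separated value into individual dataset names."""
    return args.ocr_dataset_name.split()


args = parse_args(
    ["--model_name", "LLaVA", "--eval_ocr", "--ocr_dataset_name", "ct80 IIIT5K"]
)
print(ocr_datasets(args))  # prints ['ct80', 'IIIT5K']
```

The shell quotes matter here: without them, `IIIT5K` would reach the parser as a separate, unrecognized argument.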
To add a new model, write its inference function under the `models` folder and update the `get_model` function in `eval.py`. Example inference code:
```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

from ..process import pad_image, resize_image


class lavis:
    def __init__(self, model_name, model_type, device) -> None:
        # Load the LAVIS model together with its image and text preprocessors.
        model, vis_processors, txt_processors = load_model_and_preprocess(
            name=model_name, model_type=model_type, is_eval=True, device=device
        )
        self.model_name = model_name
        self.model = model
        self.vis_processors = vis_processors
        self.txt_processors = txt_processors
        self.device = device

    def generate(self, image, question, name='resize'):
        # Choose the prompt template according to the language model backbone.
        if 'opt' in self.model_name:
            prompt = f'Question: {question} Answer:'
        elif 't5' in self.model_name:
            prompt = f'Question: {question} Short answer:'
        else:
            prompt = f'Question: {question} Answer:'
        # `image` is a file path; bring it to 224x224 by padding or resizing.
        image = Image.open(image).convert("RGB")
        if name == "pad":
            image = pad_image(image, (224, 224))
        elif name == "resize":
            image = resize_image(image, (224, 224))
        # Preprocess, add a batch dimension, and move to the target device.
        image = self.vis_processors["eval"](image).unsqueeze(0).to(self.device)
        prompt = self.txt_processors["eval"](prompt)
        answer = self.model.predict_answers(
            samples={"image": image, "text_input": prompt},
            inference_method="generate",
            max_len=48,
            min_len=1,
        )[0]
        return answer
```
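Once the wrapper exists, `get_model` has to return it for the matching `--model_name`. The real `get_model` in `eval.py` is not shown here, so the following is only a sketch of one possible dispatch scheme. The `MODEL_BUILDERS` registry, the `get_model` signature, and the `EchoModel` stand-in are all hypothetical and exist only to keep the example self-contained.

```python
class EchoModel:
    """Stand-in model used only for this sketch; it is not part of the repository."""

    def __init__(self, device):
        self.device = device

    def generate(self, image, question, name="resize"):
        # A real wrapper would run inference here; this stub echoes the question.
        return f"(no answer for: {question})"


# Hypothetical registry: maps a --model_name value to a constructor.
MODEL_BUILDERS = {
    "Echo": lambda device: EchoModel(device),
}


def get_model(model_name, device="cpu"):
    """Look up model_name and build the corresponding wrapper."""
    try:
        builder = MODEL_BUILDERS[model_name]
    except KeyError:
        known = ", ".join(sorted(MODEL_BUILDERS))
        raise ValueError(
            f"Unknown model {model_name!r}; known models: {known}"
        ) from None
    return builder(device)


model = get_model("Echo")
print(model.generate("example.jpg", "What is written on the sign?"))
```

Whatever form the dispatch takes, the new class only needs to expose the same `generate(image, question, name)` interface as the example above so that the evaluation code can call it unchanged.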
# Related Projects
- [LLaVA](https://github.com/haotian-liu/LLaVA.git)
- [MiniGPT4](https://github.com/Vision-CAIR/MiniGPT-4.git)
- [mPLUG-Owl](https://github.com/X-PLUG/mPLUG-Owl.git)
- [OpenFlamingo](https://github.com/mlfoundations/open_flamingo.git)
- [Lavis](https://github.com/salesforce/LAVIS.git)