
[Paper](https://arxiv.org/pdf/2305.07895.pdf).
# Results
Results are available in the `answer_save` folder.

Visualization results

# Data Download
| Data file | Size |
| --- | ---: |
| [text recognition](https://pan.baidu.com/s/1Ba950d94u8RQmtqvkLBk-A) (extraction code: iwyn) | 1.37GB |

TextVQA, KIE and HME will be updated soon.

We assume that your symlinked `data` directory has the following structure:
```
data
|_ IC13_857
|_ IC15_1811
|_ ...
|_ ESTVQA
|_ textVQA
|_ ...
|_ FUNSD
|_ POIE
```
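
Before running the evaluation, a quick sanity check can confirm the layout. This is a minimal sketch: the folder names are just the ones listed above, and it assumes `data` is the symlink described earlier.

```python
from pathlib import Path

# Dataset folders the evaluation expects under the symlinked `data` directory
# (names taken from the layout above; extend the list as needed).
EXPECTED = ["IC13_857", "IC15_1811", "ESTVQA", "textVQA", "FUNSD", "POIE"]

data_root = Path("data")
missing = [name for name in EXPECTED if not (data_root / name).is_dir()]
print("Missing dataset folders:", missing if missing else "none")
```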
# Usage
Evaluate on all datasets:

```Shell
python eval.py --model_name LLaVA --eval_all
```

Evaluate on a single dataset:

```Shell
python eval.py --model_name LLaVA --eval_textVQA
```

Evaluate on selected OCR datasets:

```Shell
python eval.py --model_name LLaVA --eval_ocr --ocr_dataset_name "ct80 IIIT5K"
```
The results will be saved in the `answer` folder.

If you want to add a new model, write its inference class under the `models` folder and register it in the `get_model` function in `eval.py`. An example inference implementation is shown below; a sketch of the `get_model` registration follows the code block.
```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess
from ..process import pad_image, resize_image


class lavis:
    def __init__(self, model_name, model_type, device) -> None:
        # Load a LAVIS model together with its image and text preprocessors.
        model, vis_processors, txt_processors = load_model_and_preprocess(
            name=model_name, model_type=model_type, is_eval=True, device=device)
        self.model_name = model_name
        self.model = model
        self.vis_processors = vis_processors
        self.txt_processors = txt_processors
        self.device = device

    def generate(self, image, question, name='resize'):
        # The prompt format depends on the underlying language model.
        if 'opt' in self.model_name:
            prompt = f'Question: {question} Answer:'
        elif 't5' in self.model_name:
            prompt = f'Question: {question} Short answer:'
        else:
            prompt = f'Question: {question} Answer:'
        image = Image.open(image).convert("RGB")
        # Bring the image to the 224x224 input size, either by padding
        # (preserves aspect ratio) or by direct resizing.
        if name == "pad":
            image = pad_image(image, (224, 224))
        elif name == "resize":
            image = resize_image(image, (224, 224))
        image = self.vis_processors["eval"](image).unsqueeze(0).to(self.device)
        prompt = self.txt_processors["eval"](prompt)
        answer = self.model.predict_answers(
            samples={"image": image, "text_input": prompt},
            inference_method="generate", max_len=48, min_len=1)[0]
        return answer
```
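
To make a new wrapper selectable via `--model_name`, it also needs a branch in `get_model` in `eval.py`. The sketch below is illustrative only: the actual `get_model` signature and argument fields in this repository may differ, and `blip2_opt`/`pretrain_opt2.7b` are just example LAVIS model identifiers.

```python
# Hypothetical sketch of registering the wrapper above in eval.py.
# The branch layout, args fields, and model identifiers are assumptions;
# adapt them to the actual get_model implementation.
def get_model(args):
    if args.model_name == 'LLaVA':
        ...  # existing models are constructed here
    elif args.model_name == 'BLIP2':
        # Path assumes the class above lives in models/lavis.py.
        from models.lavis import lavis
        return lavis(model_name='blip2_opt',
                     model_type='pretrain_opt2.7b',
                     device=args.device)
    else:
        raise ValueError(f'Unknown model: {args.model_name}')
```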
# Related Projects
- [LLaVA](https://github.com/haotian-liu/LLaVA.git)
- [MiniGPT4](https://github.com/Vision-CAIR/MiniGPT-4.git)
- [mPLUG-Owl](https://github.com/X-PLUG/mPLUG-Owl.git)
- [OpenFlamingo](https://github.com/mlfoundations/open_flamingo.git)
- [Lavis](https://github.com/salesforce/LAVIS.git)