## Results

Results are available in the `answer_save` folder.

## Visualization results
## Data Download

| Data file | Size |
| --- | --- |
| text recognition (code: iwyn) | 1.37GB |

TextVQA, KIE and HME will be updated soon.
We assume that your symlinked `data` directory has the following structure:

```
data
|_ IC13_857
|_ IC15_1811
|_ ...
|_ ESTVQA
|_ textVQA
|_ ...
|_ FUNSD
|_ POIE
```
## Usage

Evaluate on all datasets:

```bash
python eval.py --model_name LLaVA --eval_all
```

Evaluate on a single dataset:

```bash
python eval.py --model_name LLaVA --eval_textVQA
python eval.py --model_name LLaVA --eval_ocr --ocr_dataset_name "ct80 IIIT5K"
```

The results will be saved in the `answer` folder.
If you want to add a new model, write its inference class in the `models` folder and register it in the `get_model` function in `eval.py`. An example inference implementation is as follows:
```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

from ..process import pad_image, resize_image


class lavis:
    def __init__(self, model_name, model_type, device) -> None:
        # Load the LAVIS model together with its image and text preprocessors.
        model, vis_processors, txt_processors = load_model_and_preprocess(
            name=model_name, model_type=model_type, is_eval=True, device=device
        )
        self.model_name = model_name
        self.model = model
        self.vis_processors = vis_processors
        self.txt_processors = txt_processors
        self.device = device

    def generate(self, image, question, name='resize'):
        # Build the prompt expected by the underlying language model.
        if 'opt' in self.model_name:
            prompt = f'Question: {question} Answer:'
        elif 't5' in self.model_name:
            prompt = f'Question: {question} Short answer:'
        else:
            prompt = f'Question: {question} Answer:'

        # Pad or resize the input image to the model's 224x224 resolution.
        image = Image.open(image).convert("RGB")
        if name == "pad":
            image = pad_image(image, (224, 224))
        elif name == "resize":
            image = resize_image(image, (224, 224))

        # Preprocess image and prompt, then generate a single answer string.
        image = self.vis_processors["eval"](image).unsqueeze(0).to(self.device)
        prompt = self.txt_processors["eval"](prompt)
        answer = self.model.predict_answers(
            samples={"image": image, "text_input": prompt},
            inference_method="generate",
            max_len=48,
            min_len=1,
        )[0]
        return answer
```
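To hook the new class into the evaluation entry point, `get_model` in `eval.py` has to return an instance of it for the chosen `--model_name`. The exact dispatch logic in this repo may differ, so the following is only a minimal sketch; the file name `models/lavis_models.py` and the model name `lavis_blip2` are illustrative assumptions.

```python
# Hypothetical sketch of registering the wrapper above in eval.py's get_model.
# The real get_model in this repo may take different arguments or use a
# different dispatch style; adapt the branch to match the existing code.
def get_model(args, device='cuda'):
    if args.model_name == 'lavis_blip2':  # illustrative name passed via --model_name
        # Import lazily so that other models' dependencies are not required.
        from models.lavis_models import lavis  # assumed location of the class above
        return lavis(model_name='blip2_t5', model_type='pretrain_flant5xxl', device=device)
    raise ValueError(f'Unknown model: {args.model_name}')
```

Once registered, the model can be evaluated with the same commands as above, e.g. `python eval.py --model_name lavis_blip2 --eval_all` (model name illustrative).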
## Related Projects