Paper

Results

Results are available in the answer_save folder.

Visualization results

Data Download

Data file                          Size
text recognition (code: iwyn)      1.37GB

TextVQA, KIE and HME will be updated soon.

We assume that your symlinked data directory has the following structure:

data
|_ IC13_857
|_ IC15_1811
|_ ...
|_ ESTVQA
|_ textVQA
|_ ...
|_ FUNSD
|_ POIE
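
To create that symlink, something like the following works (a minimal sketch; the source path is a placeholder for wherever your datasets actually live):

import os

# Point ./data at the directory that actually contains IC13_857, ESTVQA, FUNSD, ...
# The source path is a placeholder -- adjust it to your setup.
source = "/path/to/your/datasets"
if not os.path.islink("data") and not os.path.exists("data"):
    os.symlink(source, "data")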

Usage

Evaluate on all datasets:

python eval.py --model_name LLaVA --eval_all

Evaluate on a single dataset:

python eval.py --model_name LLaVA --eval_textVQA
python eval.py --model_name LLaVA --eval_ocr --ocr_dataset_name "ct80 IIIT5K"
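
For orientation, the command-line flags used above could be declared with argparse roughly as follows. This is only a sketch of the interface implied by the commands; the actual eval.py may define the flags differently.

# Sketch of the CLI implied by the commands above (not the actual eval.py).
import argparse

parser = argparse.ArgumentParser(description="OCR evaluation of multimodal models")
parser.add_argument("--model_name", type=str, required=True, help="e.g. LLaVA")
parser.add_argument("--eval_all", action="store_true", help="evaluate on all datasets")
parser.add_argument("--eval_textVQA", action="store_true", help="evaluate on textVQA only")
parser.add_argument("--eval_ocr", action="store_true", help="evaluate on text recognition datasets")
parser.add_argument("--ocr_dataset_name", type=str, help='space-separated names, e.g. "ct80 IIIT5K"')
args = parser.parse_args()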

The results will be saved in the answer folder.

If you want to add a new model, write its inference wrapper under the "models" folder and update the get_model function in eval.py. An example inference implementation is shown below:

import torch
from PIL import Image
from lavis.models import load_model_and_preprocess
from ..process import pad_image, resize_image


class lavis:
    def __init__(self, model_name, model_type, device) -> None:
        # Load the LAVIS model together with its image and text preprocessors.
        model, vis_processors, txt_processors = load_model_and_preprocess(
            name=model_name, model_type=model_type, is_eval=True, device=device
        )
        self.model_name = model_name
        self.model = model
        self.vis_processors = vis_processors
        self.txt_processors = txt_processors
        self.device = device

    def generate(self, image, question, name='resize'):
        # Build the prompt: T5-based checkpoints use "Short answer:",
        # OPT-based and other checkpoints use the plain "Answer:" prompt.
        if 'opt' in self.model_name:
            prompt = f'Question: {question} Answer:'
        elif 't5' in self.model_name:
            prompt = f'Question: {question} Short answer:'
        else:
            prompt = f'Question: {question} Answer:'

        # Bring the image to 224x224, either by padding or by resizing.
        image = Image.open(image).convert("RGB")
        if name == "pad":
            image = pad_image(image, (224, 224))
        elif name == "resize":
            image = resize_image(image, (224, 224))

        # Preprocess, run generation, and return the top answer string.
        image = self.vis_processors["eval"](image).unsqueeze(0).to(self.device)
        prompt = self.txt_processors["eval"](prompt)
        answer = self.model.predict_answers(
            samples={"image": image, "text_input": prompt},
            inference_method="generate",
            max_len=48,
            min_len=1,
        )[0]
        return answer
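
After the wrapper is in place, get_model in eval.py has to return it for the matching --model_name. The README does not show get_model's real signature, so the snippet below is only a sketch of what the registration might look like; the module path models.lavis_wrapper, the "BLIP2" name, and the LAVIS checkpoint identifiers are illustrative assumptions.

# Hypothetical registration sketch -- the real get_model in eval.py may differ.
from models.lavis_wrapper import lavis  # assumed location of the wrapper above


def get_model(model_name, device="cuda"):
    # Dispatch on the --model_name string passed on the command line.
    if model_name == "BLIP2":  # illustrative name and checkpoint ids
        return lavis("blip2_t5", "pretrain_flant5xl", device)
    raise ValueError(f"unknown model: {model_name}")


# Quick sanity check of the wrapper (image path and question are placeholders):
# model = get_model("BLIP2")
# print(model.generate("example.jpg", "What is written in the image?"))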

Related Projects
