Merge pull request #15 from echo840/main

add ReCTS&IAM
lz
2023-06-09 10:39:35 +08:00
committed by GitHub
186 changed files with 294249 additions and 26 deletions
+5 -4
@@ -6,11 +6,12 @@ We conducted a comprehensive study of existing publicly available multimodal mod
Results are available in the `answer_save` folder. Note that for BLIP2OPT, the inference code on Hugging Face gives high text recognition accuracy, but the model outputs nothing for the VQA tasks. Conversely, inference with the LAVIS library gives low text recognition accuracy, while the VQA accuracy is normal. We believe that the inference process of BLIP2OPT still needs to be optimized. In our experiments, we take the maximum of the two methods' results as the final result.
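The "maximum of the two methods" rule can be sketched as follows. This is a minimal illustration, assuming the maximum is taken per dataset and that each inference run produces a dictionary of accuracies. The function name `best_of_two`, the dataset keys, and the numbers are hypothetical placeholders and do not come from the repository.

```python
def best_of_two(scores_a: dict[str, float], scores_b: dict[str, float]) -> dict[str, float]:
    """Keep, for each dataset, the higher accuracy of two inference runs."""
    merged = {}
    for name in sorted(scores_a.keys() | scores_b.keys()):
        # Assumption: a dataset missing from one run counts as 0.0 for that run.
        merged[name] = max(scores_a.get(name, 0.0), scores_b.get(name, 0.0))
    return merged


# Placeholder numbers for illustration only, not measured results.
huggingface_run = {"text_recognition": 0.80, "vqa": 0.00}
lavis_run = {"text_recognition": 0.30, "vqa": 0.45}
final = best_of_two(huggingface_run, lavis_run)
```

Taking the maximum per dataset, rather than picking one run overall, matches the situation described above, where each inference path is strong on a different task family.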
![image](https://github.com/echo840/MultimodalOCR/assets/87795401/523e0421-7eca-4d15-89f1-3f7348321055)
![table](https://github.com/echo840/MultimodalOCR/assets/87795401/b7cb6ab7-2e6c-462c-84ae-41b9d209ce48)
Visualization results
![revised](https://github.com/echo840/MultimodalOCR/assets/87795401/b74ff847-534c-49ca-a31e-8f8854380a34)
![rvk](https://github.com/echo840/MultimodalOCR/assets/87795401/21982aba-d063-4a52-a045-8d16e0e98f71)
![Multilingualism](https://github.com/echo840/MultimodalOCR/assets/87795401/8bf5c8ab-bec7-4b77-b2bb-7a319975a762)
# Data Download
@@ -22,12 +23,12 @@ Visualization results
|[textVQA](https://textvqa.org/dataset/) val set|6.6GB|
|[docVQA](https://rrc.cvc.uab.es/?ch=17&com=downloads) Task 1 Validation set|0.8GB|
|[ESTVQA](https://cloudstor.aarnet.edu.au/plus/s/LSishuuSE5DBKJp)|5.2GB|
-|[SROIE](https://rrc.cvc.uab.es/?ch=13&com=downloads) Task 3 test set|0.19GB|
+|[SROIE](https://rrc.cvc.uab.es/?ch=13&com=downloads)|0.19GB|
|[FUNSD](https://guillaumejaume.github.io/FUNSD/download/)|16MB|
|[POIE](https://drive.google.com/file/d/1eEMNiVeLlD-b08XW_GfAGfPmmII-GDYs/view)|0.43GB|
|[HME100K](https://ai.100tal.com/openData/formulaRecognition)|0.69GB|
|[Google cloud](https://drive.google.com/drive/folders/1plgZf4XIuiOGjx4b17E1rvTA2UKpZRe1?usp=drive_link)|9.38GB|
TextVQA, KIE and HME will be updated soon.
We assume that your symlinked `data` directory has the following structure:
Regular → Executable
+3 -1
@@ -20,5 +20,7 @@
"WordArt": 0.6260754467240238,
"FUNSD": 0.01020408163265306,
"HME": 0.0004,
-"POIE": 0.0208827717133365
+"POIE": 0.0208827717133365,
+"IAM": 0.504,
+"ReCTS": 0.0
}
Regular → Executable
+20 -1
@@ -20,5 +20,24 @@
"docVQA": 0.029725182277061134,
"FUNSD": 0.011904761904761904,
"HME": 0.0,
-"POIE": 0.013130833728840373
+"POIE": 0.013130833728840373,
+"IAM": 0.23933333333333334,
+"ReCTS": 0.0
}
+{
+"IIIT5K": 0.48,
+"svt": 0.5038639876352395,
+"IC13_857": 0.48891481913652274,
+"IC15_1811": 0.4218663721700718,
+"svtp": 0.5038759689922481,
+"ct80": 0.5729166666666666,
+"cocotext": 0.2625303152789006,
+"ctw": 0.41857506361323155,
+"totaltext": 0.4057246706042708,
+"HOST": 0.34519867549668876,
+"WOST": 0.4105960264900662,
+"WordArt": 0.514228987425546,
+"IAM": 0.289,
+"help":"we replace all special chars with \" \" instead of \"\", which is useful for MiniGPT4."
+}
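As this hunk reads, the file ends up holding two top-level JSON objects back to back. If that is the case, a plain `json.load` would reject it with an "Extra data" error. The sketch below shows one possible way to read such a file with the standard library's `json.JSONDecoder.raw_decode`. The helper name `iter_json_objects` is hypothetical, and the sample text reuses only a few values from the hunk above.

```python
import json


def iter_json_objects(text: str):
    """Yield each top-level JSON value found in a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Two objects back to back, using values taken from the hunk above.
sample = '{"IAM": 0.23933333333333334, "ReCTS": 0.0}\n{"IAM": 0.289}\n'
objects = list(iter_json_objects(sample))
```

Here `objects[0]` would be the result set scored with the default rule and `objects[1]` the one scored with the rule described in its `help` entry, assuming the file keeps that order.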
+1
@@ -11,5 +11,6 @@
"HOST": 0.34519867549668876,
"WOST": 0.4105960264900662,
"WordArt":0.514228987425546,
+"IAM": 0.289,
"help":"we replace all special chars with \" \" instead of \"\", which is useful for MiniGPT4."
}
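The `help` entry describes replacing special characters with a space rather than deleting them. The sketch below illustrates why that choice can matter. It is not the repository's evaluation code: the definition of "special chars" (anything other than ASCII letters, digits, or whitespace) and the word-level check `word_match` are assumptions made for illustration.

```python
import re

# Assumption: "special chars" means anything other than ASCII letters, digits, or whitespace.
SPECIAL_CHARS = re.compile(r"[^0-9A-Za-z\s]")


def normalize(text: str, replacement: str) -> list[str]:
    """Replace special characters, lowercase the result, and split it into words."""
    return SPECIAL_CHARS.sub(replacement, text).lower().split()


def word_match(prediction: str, ground_truth: str, replacement: str) -> bool:
    """Hypothetical word-level check: is the ground-truth word among the predicted words?"""
    return ground_truth.lower() in normalize(prediction, replacement)


prediction = 'sign:"EXIT"'
deleted = word_match(prediction, "exit", "")   # punctuation deleted: words fuse into "signexit"
spaced = word_match(prediction, "exit", " ")   # punctuation becomes a space: "sign", "exit"
```

Under these assumptions, deleting punctuation glues neighbouring words together, while substituting a space keeps word boundaries intact. That may be relevant for models that wrap the recognized text in quotes or longer sentences.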
+3 -1
@@ -19,5 +19,7 @@
"SROIE": 0.0011985617259288853,
"FUNSD": 0.008503401360544218,
"POIE": 0.021199177345356746,
-"HME": 0.0
+"HME": 0.0,
+"IAM": 0.4553333333333333,
+"ReCTS": 0.0
}
Regular → Executable
+3 -1
@@ -20,5 +20,7 @@
"WordArt": 0.7379219060225016,
"HME": 0.0004,
"FUNSD": 0.011904761904761904,
-"POIE": 0.025154247745609874
+"POIE": 0.025154247745609874,
+"IAM": 0.405,
+"ReCTS": 0.0
}

Some files were not shown because too many files have changed in this diff