@@ -41,17 +41,17 @@ python ./scripts/monkey.py --image_folder ./OCRBench_Images --OCRBench_file ./OC

If you wish to refer to the baseline results published here, please use the following BibTeX entry:

```BibTeX
@article{Liu_2024,
  title={OCRBench: on the hidden mystery of OCR in large multimodal models},
  volume={67},
  ISSN={1869-1919},
  url={http://dx.doi.org/10.1007/s11432-024-4235-6},
  DOI={10.1007/s11432-024-4235-6},
  number={12},
  journal={Science China Information Sciences},
  publisher={Springer Science and Business Media LLC},
  author={Liu, Yuliang and Li, Zhang and Huang, Mingxin and Yang, Biao and Yu, Wenwen and Li, Chunyuan and Yin, Xu-Cheng and Liu, Cheng-Lin and Jin, Lianwen and Bai, Xiang},
  year={2024},
  month=dec
}
```

@@ -2,7 +2,7 @@

> Scoring the Optical Character Recognition (OCR) capabilities of Large Multimodal Models (LMMs) has witnessed growing interest recently. Existing benchmarks have highlighted the impressive performance of LMMs in text recognition; however, their abilities in certain challenging tasks, such as text localization, handwritten content extraction, and logical reasoning, remain underexplored. To bridge this gap, we introduce OCRBench v2, a large-scale bilingual text-centric benchmark with what is currently the most comprehensive set of tasks (4X more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios including street scene, receipt, formula, diagram, and so on), and thorough evaluation metrics, with a total of 10,000 human-verified question-answering pairs and a high proportion of difficult samples. After carefully benchmarking state-of-the-art LMMs on OCRBench v2, we find that 36 out of 38 LMMs score below 50 (out of 100) and suffer from five types of limitations, including less frequently encountered text recognition, fine-grained perception, layout perception, complex element parsing, and logical reasoning.

**[Project Page](https://github.com/Yuliang-Liu/MultimodalOCR)** | **[Paper](https://arxiv.org/abs/2501.00321)** | **[OCRBench v2 Leaderboard](https://huggingface.co/spaces/ling99/OCRBench-v2-leaderboard)**

<p align="center">
  <img src="https://v1.ax1x.com/2024/12/30/7VhCnP.jpg" width="88%" height="80%">
</p>

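The hunk context lines reference the repository's inference and scoring scripts (`./scripts/monkey.py` above, `./eval_scripts/get_score.py` in the next hunk). Purely as an illustration of the workflow those commands imply, the sketch below walks a benchmark file of question-answering pairs, queries a model, and averages per-task exact-match accuracy into a 0–100 score. The file name, the JSON field names (`id`, `task`, `image_path`, `question`, `answers`), and the `run_model` callable are assumptions for this sketch, not the repository's actual schema or API.

```python
# Minimal sketch only: the JSON layout, field names, and run_model callable
# are assumptions, not the schema or API used by the repository's scripts.
import json
from collections import defaultdict
from typing import Callable, Dict, List


def evaluate(qa_path: str, run_model: Callable[[str, str], str]) -> float:
    """Walk question-answering pairs, query a model, and return a 0-100 score.

    Assumed record layout: {"id": ..., "task": ..., "image_path": ...,
    "question": ..., "answers": [...]}.
    """
    with open(qa_path, "r", encoding="utf-8") as f:
        samples: List[Dict] = json.load(f)

    per_task = defaultdict(list)
    for sample in samples:
        prediction = run_model(sample["image_path"], sample["question"])
        # Exact match against any reference answer; the real benchmark applies
        # task-specific metrics (e.g., localization uses box-based checks).
        hit = any(
            prediction.strip().lower() == ans.strip().lower()
            for ans in sample["answers"]
        )
        per_task[sample["task"]].append(1.0 if hit else 0.0)

    # Average accuracy within each task, then across tasks, scaled to 100.
    task_means = [sum(scores) / len(scores) for scores in per_task.values()]
    return 100.0 * sum(task_means) / len(task_means)


# Example usage with a placeholder model that always answers "unknown":
# score = evaluate("OCRBench_v2.json", lambda image, question: "unknown")
```
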
@@ -81,4 +81,14 @@ python ./eval_scripts/get_score.py --json_file ./res_folder/internvl2_5_26b.json

The data are collected from public datasets and community user contributions. This dataset is for research purposes only and not for commercial use. If you have any copyright concerns, please contact ling_fu@hust.edu.cn.

# Citation

```BibTeX
@misc{fu2024ocrbenchv2improvedbenchmark,
  title={OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning},
  author={Ling Fu and Biao Yang and Zhebin Kuang and Jiajun Song and Yuzhe Li and Linghao Zhu and Qidi Luo and Xinyu Wang and Hao Lu and Mingxin Huang and Zhang Li and Guozhi Tang and Bin Shan and Chunhui Lin and Qi Liu and Binghong Wu and Hao Feng and Hao Liu and Can Huang and Jingqun Tang and Wei Chen and Lianwen Jin and Yuliang Liu and Xiang Bai},
  year={2024},
  eprint={2501.00321},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2501.00321},
}
```