**This repository hosts [OCRBench](./OCRBench/README.md) and [OCRBench v2](./OCRBench_v2/README.md).**

<div align="center" xmlns="http://www.w3.org/1999/html">
<h1 align="center">
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning
</h1>
  
[![Leaderboard](https://img.shields.io/badge/Leaderboard-OCRBenchV2-blue.svg?logo=google-analytics)](https://99franklin.github.io/ocrbench_v2/)
[![arXiv](https://img.shields.io/badge/Arxiv-OCRBenchV2-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321)
[![HuggingFace](https://img.shields.io/badge/dataset-black.svg?logo=HuggingFace)](https://huggingface.co/datasets/ling99/OCRBench_v2)
[![GitHub issues](https://img.shields.io/github/issues/Yuliang-Liu/MultimodalOCR?color=critical&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aopen+is%3Aissue)
[![GitHub closed issues](https://img.shields.io/github/issues-closed/Yuliang-Liu/MultimodalOCR?color=success&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aissue+is%3Aclosed)
</div>


> **OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning**<br>
> Ling Fu, Zhebin Kuang, Jiajun Song, Mingxin Huang, Biao Yang, Yuzhe Li, Linghao Zhu, Qidi Luo, Xinyu Wang, Hao Lu, Zhang Li, Guozhi Tang, Bin Shan, Chunhui Lin, Qi Liu, Binghong Wu, Hao Feng, Hao Liu, Can Huang, Jingqun Tang, Wei Chen, Lianwen Jin, Yuliang Liu, Xiang Bai <br>
[![arXiv](https://img.shields.io/badge/Arxiv-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321) 
[![dataset](https://img.shields.io/badge/HuggingFace-Download-green?logo=huggingface)](https://huggingface.co/datasets/ling99/OCRBench_v2)
[![Google Drive](https://img.shields.io/badge/Google%20Drive-Download-green?logo=google-drive)](https://drive.google.com/file/d/1Hk1TMu--7nr5vJ7iaNwMQZ_Iw9W_KI3C/view?usp=sharing)


**OCRBench v2** is a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks (4× more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios, including street scenes, receipts, formulas, and diagrams), and thorough evaluation metrics. It contains a total of 10,000 human-verified question-answering pairs with a high proportion of difficult samples. More details can be found in the [OCRBench v2 README](./OCRBench_v2/README.md).

<p align="center">
    <img src="https://v1.ax1x.com/2024/12/30/7VhCnP.jpg" width="88%" height="80%">
</p>
  
> **OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models**<br>
> Yuliang Liu, Zhang Li, Mingxin Huang, Biao Yang, Wenwen Yu, Chunyuan Li, Xucheng Yin, Cheng-lin Liu, Lianwen Jin, Xiang Bai <br>
[![arXiv](https://img.shields.io/badge/Arxiv-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2305.07895) 
[![Dataset](https://img.shields.io/badge/Dataset-Available-green)](https://github.com/qywh2023/OCRbench/blob/main/OCRBench/README.md)


**OCRBench** is a comprehensive evaluation benchmark designed to assess the OCR capabilities of Large Multimodal Models. It comprises five components: Text Recognition, Scene Text-Centric VQA, Document-Oriented VQA, Key Information Extraction, and Handwritten Mathematical Expression Recognition. The benchmark includes 1,000 question-answer pairs, and all answers have been manually verified and corrected to ensure a more precise evaluation. More details can be found in the [OCRBench README](./OCRBench/README.md).

<p align="center">
  <img src="./OCRBench/images/all_data.png" width="88%" height="80%">
</p>

# News 
* ```2025.6.21``` 🚀 We released the private dataset of OCRBench v2 and will update the [Leaderboard](https://99franklin.github.io/ocrbench_v2/) every quarter.
* ```2024.12.31``` 🚀 [OCRBench v2](./OCRBench_v2/README.md) is released.
* ```2024.12.11``` 🚀 OCRBench has been accepted by [Science China Information Sciences](https://link.springer.com/article/10.1007/s11432-024-4235-6).
* ```2024.5.19 ``` 🚀 We released [DTVQA](https://github.com/ShuoZhang2003/DT-VQA) to explore the capabilities of Large Multimodal Models on dense text.
* ```2024.5.01 ``` 🚀 Thanks to [SWHL](https://github.com/Yuliang-Liu/MultimodalOCR/issues/29) for releasing [ChineseOCRBench](https://huggingface.co/datasets/SWHL/ChineseOCRBench).
* ```2024.3.26 ``` 🚀 OCRBench is now supported in [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval).
* ```2024.3.12 ``` 🚀 We plan to construct OCRBench v2 to include more OCR tasks and data. Any contribution will be appreciated.
* ```2024.2.25 ``` 🚀 OCRBench is now supported in [VLMEvalKit](https://github.com/open-compass/VLMEvalKit).


# Other Related Multilingual Datasets
| Data | Link | Description |
| --- | --- | --- |
| EST-VQA Dataset (CVPR 2020, English and Chinese) | [Link](https://github.com/xinke-wang/EST-VQA) | On the General Value of Evidence, and Bilingual Scene-Text Visual Question Answering. |
| Swahili Dataset (ICDAR 2024) | [Link](https://arxiv.org/abs/2405.11437) | The First Swahili Language Scene Text Detection and Recognition Dataset. |
| Urdu Dataset (ICDAR 2024) | [Link](https://arxiv.org/abs/2405.12533) | Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering. |
| MTVQA (9 languages) | [Link](https://arxiv.org/abs/2405.11985) | MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering. |
| EVOBC (Oracle Bone Script Evolution Dataset) | [Link](https://arxiv.org/abs/2401.12467) | Ancient characters systematically collected from authoritative texts and websites, spanning six historical stages. |
| HUST-OBC (Oracle Bone Script Character Dataset) | [Link](https://arxiv.org/abs/2401.15365) | For deciphering oracle bone script characters. |

# Citation
If you wish to refer to the baseline results published here, please use the following BibTeX entries:
```BibTeX
@article{Liu_2024,
    title={OCRBench: on the hidden mystery of OCR in large multimodal models},
    volume={67},
    ISSN={1869-1919},
    url={http://dx.doi.org/10.1007/s11432-024-4235-6},
    DOI={10.1007/s11432-024-4235-6},
    number={12},
    journal={Science China Information Sciences},
    publisher={Springer Science and Business Media LLC},
    author={Liu, Yuliang and Li, Zhang and Huang, Mingxin and Yang, Biao and Yu, Wenwen and Li, Chunyuan and Yin, Xu-Cheng and Liu, Cheng-Lin and Jin, Lianwen and Bai, Xiang},
    year={2024},
    month=dec }
  
@misc{fu2024ocrbenchv2improvedbenchmark,
    title={OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning}, 
    author={Ling Fu and Biao Yang and Zhebin Kuang and Jiajun Song and Yuzhe Li and Linghao Zhu and Qidi Luo and Xinyu Wang and Hao Lu and Mingxin Huang and Zhang Li and Guozhi Tang and Bin Shan and Chunhui Lin and Qi Liu and Binghong Wu and Hao Feng and Hao Liu and Can Huang and Jingqun Tang and Wei Chen and Lianwen Jin and Yuliang Liu and Xiang Bai},
    year={2024},
    eprint={2501.00321},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2501.00321}, 
}
```