# OCRBench & OCRBench v2
**This is the repository of the [OCRBench](./OCRBench/README.md) & [OCRBench v2](./OCRBench_v2/README.md).**

<div align="center" xmlns="http://www.w3.org/1999/html">
<h1 align="center">
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning
</h1>

[Project Page](https://99franklin.github.io/ocrbench_v2/)
[Paper](https://arxiv.org/abs/2501.00321)
[Dataset](https://huggingface.co/datasets/ling99/OCRBench_v2)
[Open Issues](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aopen+is%3Aissue)
[Closed Issues](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aissue+is%3Aclosed)
</div>

> **OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning**<br>
> Ling Fu, Zhebin Kuang, Jiajun Song, Mingxin Huang, Biao Yang, Yuzhe Li, Linghao Zhu, Qidi Luo, Xinyu Wang, Hao Lu, Zhang Li, Guozhi Tang, Bin Shan, Chunhui Lin, Qi Liu, Binghong Wu, Hao Feng, Hao Liu, Can Huang, Jingqun Tang, Wei Chen, Lianwen Jin, Yuliang Liu, Xiang Bai <br>

[Paper](https://arxiv.org/abs/2501.00321)
[Hugging Face Dataset](https://huggingface.co/datasets/ling99/OCRBench_v2)
[Google Drive](https://drive.google.com/file/d/1Hk1TMu--7nr5vJ7iaNwMQZ_Iw9W_KI3C/view?usp=sharing)
**OCRBench v2** is a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks (4× the tasks of the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios, including street scenes, receipts, formulas, and diagrams), thorough evaluation metrics, a total of 10,000 human-verified question-answering pairs, and a high proportion of difficult samples. More details can be found in the [OCRBench v2 README](./OCRBench_v2/README.md).

<p align="center">
<img src="https://v1.ax1x.com/2024/12/30/7VhCnP.jpg" width="88%" height="80%">
</p>
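The question-answering pairs and images are hosted on Hugging Face (see the dataset link above). As a minimal sketch, assuming the `huggingface_hub` Python package is installed and using an arbitrary local directory name, the files can be fetched as shown below; see the [OCRBench v2 README](./OCRBench_v2/README.md) for the full evaluation pipeline.

```python
# Minimal sketch: download a local snapshot of the OCRBench v2 dataset repo
# from the Hugging Face Hub (pip install huggingface_hub).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="ling99/OCRBench_v2",    # dataset repo linked above
    repo_type="dataset",
    local_dir="./OCRBench_v2_data",  # arbitrary target directory (an assumption)
)
print(f"OCRBench v2 files downloaded to: {local_dir}")
```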
> **OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models**<br>
> Yuliang Liu, Zhang Li, Mingxin Huang, Biao Yang, Wenwen Yu, Chunyuan Li, Xucheng Yin, Cheng-lin Liu, Lianwen Jin, Xiang Bai <br>

[Paper](https://arxiv.org/abs/2305.07895)
[OCRBench README](https://github.com/qywh2023/OCRbench/blob/main/OCRBench/README.md)
**OCRBench** is a comprehensive evaluation benchmark designed to assess the OCR capabilities of Large Multimodal Models. It comprises five components: Text Recognition, Scene Text-Centric VQA, Document-Oriented VQA, Key Information Extraction, and Handwritten Mathematical Expression Recognition. The benchmark includes 1,000 question-answer pairs, and all answers undergo manual verification and correction to ensure a precise evaluation. More details can be found in the [OCRBench README](./OCRBench/README.md).

<p align="center">
<img src="./OCRBench/images/all_data.png" width="88%" height="80%">
</p>
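To make the question-answer format concrete, below is a minimal, hypothetical scoring sketch over such pairs. The file name and the `answers`/`prediction` field names are illustrative assumptions, not the official evaluation script; refer to the [OCRBench README](./OCRBench/README.md) for the actual protocol.

```python
import json

# Hypothetical record layout: a JSON list where each entry carries the
# ground-truth answer(s) and the model's prediction for one question.
with open("ocrbench_predictions.json", "r", encoding="utf-8") as f:
    records = json.load(f)

correct = 0
for rec in records:
    prediction = rec["prediction"].strip().lower()
    answers = rec["answers"] if isinstance(rec["answers"], list) else [rec["answers"]]
    # Count a sample as correct if any reference answer appears in the prediction.
    if any(ans.strip().lower() in prediction for ans in answers):
        correct += 1

print(f"Accuracy: {correct}/{len(records)} = {correct / len(records):.3f}")
```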
# News
* ```2025.6.21``` 🚀 We release the private dataset of OCRBench v2 and will update the [Leaderboard](https://99franklin.github.io/ocrbench_v2/) every quarter.
* ```2024.12.31``` 🚀 [OCRBench v2](./OCRBench_v2/README.md) is released.
* ```2024.12.11``` 🚀 OCRBench has been accepted by [Science China Information Sciences](https://link.springer.com/article/10.1007/s11432-024-4235-6).
* ```2024.5.19``` 🚀 We release [DTVQA](https://github.com/ShuoZhang2003/DT-VQA) to explore the capabilities of Large Multimodal Models on dense text.