From b1056d48083427c1610f5ce8060e19a82811ff52 Mon Sep 17 00:00:00 2001 From: qywh2023 <134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 20:05:13 +0800 Subject: [PATCH 01/16] Update README.md --- README.md | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/README.md b/README.md index aefbedf..b925b1c 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,32 @@ +# OCRBench & OCRBench v2 + +**This is the repository of the [OCRBench](./OCRBench/README.md) & [OCRBench v2](./OCRBench_v2/README.md).** + +
+

+OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning +

+ + +[![Leaderboard](https://99franklin.github.io/ocrbench_v2/) +[![arXiv](https://img.shields.io/badge/Arxiv-MonkeyOCR-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321) +[![HuggingFace](https://img.shields.io/badge/HuggingFace%20Weights-black.svg?logo=HuggingFace)](https://huggingface.co/datasets/ling99/OCRBench_v2) +[![GitHub issues](https://img.shields.io/github/issues/Yuliang-Liu/MonkeyOCR?color=critical&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues) +[![GitHub closed issues](https://img.shields.io/github/issues-closed/Yuliang-Liu/MonkeyOCR?color=success&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aissue%20state%3Aclosed) +[![GitHub views](https://komarev.com/ghpvc/?username=Yuliang-Liu&repo=MonkeyOCR&color=brightgreen&label=Views)](https://github.com/Yuliang-Liu/MultimodalOCR) +
+ + +> **OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning**
+> Ling Fu, Zhebin Kuang, Jiajun Song, Mingxin Huang, Biao Yang, Yuzhe Li, Linghao Zhu, Qidi Luo, Xinyu Wang, Hao Lu, Zhang Li, Guozhi Tang, Bin Shan, Chunhui Lin, Qi Liu, Binghong Wu, Hao Feng, Hao Liu, Can Huang, Jingqun Tang, Wei Chen, Lianwen Jin, Yuliang Liu, Xiang Bai
+[![arXiv](https://img.shields.io/badge/Arxiv-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321) +[![Source_code](https://img.shields.io/badge/Code-Available-white)](README.md) +[![dataset](https://img.shields.io/badge/HuggingFace-gray)](https://arxiv.org/abs/2501.00321) +[![ataset](https://img.shields.io/badge/Google Drive-green)](https://drive.google.com/file/d/1Hk1TMu--7nr5vJ7iaNwMQZ_Iw9W_KI3C/view?usp=sharing) + + + + # OCRBench & OCRBench v2 **This is the repository of the [OCRBench](./OCRBench/README.md) & [OCRBench v2](./OCRBench_v2/README.md).** From 164e3c5ff2f8a2d24ffd3f34f9d42d47600b80c0 Mon Sep 17 00:00:00 2001 From: qywh2023 <134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 20:18:12 +0800 Subject: [PATCH 02/16] Update README.md --- README.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index b925b1c..79b2171 100644 --- a/README.md +++ b/README.md @@ -8,12 +8,12 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Vis -[![Leaderboard](https://99franklin.github.io/ocrbench_v2/) -[![arXiv](https://img.shields.io/badge/Arxiv-MonkeyOCR-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321) -[![HuggingFace](https://img.shields.io/badge/HuggingFace%20Weights-black.svg?logo=HuggingFace)](https://huggingface.co/datasets/ling99/OCRBench_v2) -[![GitHub issues](https://img.shields.io/github/issues/Yuliang-Liu/MonkeyOCR?color=critical&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues) -[![GitHub closed issues](https://img.shields.io/github/issues-closed/Yuliang-Liu/MonkeyOCR?color=success&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aissue%20state%3Aclosed) -[![GitHub views](https://komarev.com/ghpvc/?username=Yuliang-Liu&repo=MonkeyOCR&color=brightgreen&label=Views)](https://github.com/Yuliang-Liu/MultimodalOCR) 
+[![Leaderboard(https://img.shields.io/badge/Leaderboard-OCRBenchV2-blue.svg)]](https://99franklin.github.io/ocrbench_v2/) +[![arXiv](https://img.shields.io/badge/Arxiv-OCRBenchV2-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321) +[![HuggingFace](https://img.shields.io/badge/dataset-black.svg?logo=HuggingFace)](https://huggingface.co/datasets/ling99/OCRBench_v2) +[![GitHub issues](https://img.shields.io/github/issues/Yuliang-Liu/MultimodalOCR?color=critical&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aopen+is%3Aissue) +[![GitHub closed issues](https://img.shields.io/github/issues-closed/Yuliang-Liu/MultimodalOCR?color=success&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aissue+is%3Aclosed) +[![GitHub views](https://komarev.com/ghpvc/?username=Yuliang-Liu&repo=OCRBenchV2&color=brightgreen&label=Views)](https://github.com/Yuliang-Liu/MultimodalOCR) @@ -22,7 +22,7 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Vis [![arXiv](https://img.shields.io/badge/Arxiv-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321) [![Source_code](https://img.shields.io/badge/Code-Available-white)](README.md) [![dataset](https://img.shields.io/badge/HuggingFace-gray)](https://arxiv.org/abs/2501.00321) -[![ataset](https://img.shields.io/badge/Google Drive-green)](https://drive.google.com/file/d/1Hk1TMu--7nr5vJ7iaNwMQZ_Iw9W_KI3C/view?usp=sharing) +[![dataset](https://img.shields.io/badge/Google Drive-green)](https://drive.google.com/file/d/1Hk1TMu--7nr5vJ7iaNwMQZ_Iw9W_KI3C/view?usp=sharing) From e2c99b52eab7103e3568d9e2d31c22aad20bdf42 Mon Sep 17 00:00:00 2001 From: qywh2023 <134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 20:19:57 +0800 Subject: [PATCH 03/16] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 79b2171..6dbe0e6 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@ OCRBench v2: An 
Improved Benchmark for Evaluating Large Multimodal Models on Vis -[![Leaderboard(https://img.shields.io/badge/Leaderboard-OCRBenchV2-blue.svg)]](https://99franklin.github.io/ocrbench_v2/) +[![Leaderboard(https://img.shields.io/badge/Leaderboard-OCRBenchV2-blue.svg)](https://99franklin.github.io/ocrbench_v2/) [![arXiv](https://img.shields.io/badge/Arxiv-OCRBenchV2-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321) [![HuggingFace](https://img.shields.io/badge/dataset-black.svg?logo=HuggingFace)](https://huggingface.co/datasets/ling99/OCRBench_v2) [![GitHub issues](https://img.shields.io/github/issues/Yuliang-Liu/MultimodalOCR?color=critical&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aopen+is%3Aissue) From 405e8f52b2e143b270e45e7b105b6bdd806eab0b Mon Sep 17 00:00:00 2001 From: qywh2023 <134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 20:20:19 +0800 Subject: [PATCH 04/16] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 6dbe0e6..0a8ae14 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Vis -[![Leaderboard(https://img.shields.io/badge/Leaderboard-OCRBenchV2-blue.svg)](https://99franklin.github.io/ocrbench_v2/) +[![Leaderboard](https://img.shields.io/badge/Leaderboard-OCRBenchV2-blue.svg)](https://99franklin.github.io/ocrbench_v2/) [![arXiv](https://img.shields.io/badge/Arxiv-OCRBenchV2-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321) [![HuggingFace](https://img.shields.io/badge/dataset-black.svg?logo=HuggingFace)](https://huggingface.co/datasets/ling99/OCRBench_v2) [![GitHub issues](https://img.shields.io/github/issues/Yuliang-Liu/MultimodalOCR?color=critical&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aopen+is%3Aissue) From 8049c795db1667d29ce35fb70fa52a3e7cb3a425 Mon Sep 17 00:00:00 2001 From: qywh2023 
<134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 20:21:34 +0800 Subject: [PATCH 05/16] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 0a8ae14..e573396 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,7 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Vis [![HuggingFace](https://img.shields.io/badge/dataset-black.svg?logo=HuggingFace)](https://huggingface.co/datasets/ling99/OCRBench_v2) [![GitHub issues](https://img.shields.io/github/issues/Yuliang-Liu/MultimodalOCR?color=critical&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aopen+is%3Aissue) [![GitHub closed issues](https://img.shields.io/github/issues-closed/Yuliang-Liu/MultimodalOCR?color=success&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aissue+is%3Aclosed) -[![GitHub views](https://komarev.com/ghpvc/?username=Yuliang-Liu&repo=OCRBenchV2&color=brightgreen&label=Views)](https://github.com/Yuliang-Liu/MultimodalOCR) +[![GitHub views](https://komarev.com/ghpvc/?username=Yuliang-Liu&repo=MultimodalOCR&color=brightgreen&label=Views)](https://github.com/Yuliang-Liu/MultimodalOCR) From abef6ff79a3706abb0a72b533f5e983dd92c7a68 Mon Sep 17 00:00:00 2001 From: qywh2023 <134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 20:23:48 +0800 Subject: [PATCH 06/16] Update README.md --- README.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index e573396..4aade89 100644 --- a/README.md +++ b/README.md @@ -6,7 +6,6 @@

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning

- [![Leaderboard](https://img.shields.io/badge/Leaderboard-OCRBenchV2-blue.svg)](https://99franklin.github.io/ocrbench_v2/) [![arXiv](https://img.shields.io/badge/Arxiv-OCRBenchV2-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321) @@ -21,8 +20,8 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Vis > Ling Fu, Zhebin Kuang, Jiajun Song, Mingxin Huang, Biao Yang, Yuzhe Li, Linghao Zhu, Qidi Luo, Xinyu Wang, Hao Lu, Zhang Li, Guozhi Tang, Bin Shan, Chunhui Lin, Qi Liu, Binghong Wu, Hao Feng, Hao Liu, Can Huang, Jingqun Tang, Wei Chen, Lianwen Jin, Yuliang Liu, Xiang Bai
[![arXiv](https://img.shields.io/badge/Arxiv-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321) [![Source_code](https://img.shields.io/badge/Code-Available-white)](README.md) -[![dataset](https://img.shields.io/badge/HuggingFace-gray)](https://arxiv.org/abs/2501.00321) -[![dataset](https://img.shields.io/badge/Google Drive-green)](https://drive.google.com/file/d/1Hk1TMu--7nr5vJ7iaNwMQZ_Iw9W_KI3C/view?usp=sharing) +[![dataset](https://img.shields.io/badge/HuggingFace-gray)](https://huggingface.co/datasets/ling99/OCRBench_v2) +[![Google Drive](https://img.shields.io/badge/Google%20Drive-Download-green?logo=google-drive)](https://drive.google.com/file/d/1Hk1TMu--7nr5vJ7iaNwMQZ_Iw9W_KI3C/view?usp=sharing) From 593d5b750dcb3059d3e6dfd2fee26fd3e6363561 Mon Sep 17 00:00:00 2001 From: qywh2023 <134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 20:24:39 +0800 Subject: [PATCH 07/16] Update README.md --- README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/README.md b/README.md index 4aade89..83aa43a 100644 --- a/README.md +++ b/README.md @@ -19,8 +19,7 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Vis > **OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning**
> Ling Fu, Zhebin Kuang, Jiajun Song, Mingxin Huang, Biao Yang, Yuzhe Li, Linghao Zhu, Qidi Luo, Xinyu Wang, Hao Lu, Zhang Li, Guozhi Tang, Bin Shan, Chunhui Lin, Qi Liu, Binghong Wu, Hao Feng, Hao Liu, Can Huang, Jingqun Tang, Wei Chen, Lianwen Jin, Yuliang Liu, Xiang Bai
[![arXiv](https://img.shields.io/badge/Arxiv-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321) -[![Source_code](https://img.shields.io/badge/Code-Available-white)](README.md) -[![dataset](https://img.shields.io/badge/HuggingFace-gray)](https://huggingface.co/datasets/ling99/OCRBench_v2) +[![dataset](https://img.shields.io/badge/HuggingFace-Dataset-gray?logo=huggingface)](https://huggingface.co/datasets/ling99/OCRBench_v2) [![Google Drive](https://img.shields.io/badge/Google%20Drive-Download-green?logo=google-drive)](https://drive.google.com/file/d/1Hk1TMu--7nr5vJ7iaNwMQZ_Iw9W_KI3C/view?usp=sharing) From 8f305511e274eeee9370f00fb8bdd368cdf626a5 Mon Sep 17 00:00:00 2001 From: qywh2023 <134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 20:25:16 +0800 Subject: [PATCH 08/16] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 83aa43a..3ae4707 100644 --- a/README.md +++ b/README.md @@ -19,7 +19,7 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Vis > **OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning**
> Ling Fu, Zhebin Kuang, Jiajun Song, Mingxin Huang, Biao Yang, Yuzhe Li, Linghao Zhu, Qidi Luo, Xinyu Wang, Hao Lu, Zhang Li, Guozhi Tang, Bin Shan, Chunhui Lin, Qi Liu, Binghong Wu, Hao Feng, Hao Liu, Can Huang, Jingqun Tang, Wei Chen, Lianwen Jin, Yuliang Liu, Xiang Bai
[![arXiv](https://img.shields.io/badge/Arxiv-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321) -[![dataset](https://img.shields.io/badge/HuggingFace-Dataset-gray?logo=huggingface)](https://huggingface.co/datasets/ling99/OCRBench_v2) +[![dataset](https://img.shields.io/badge/HuggingFace-Download-green?logo=huggingface)](https://huggingface.co/datasets/ling99/OCRBench_v2) [![Google Drive](https://img.shields.io/badge/Google%20Drive-Download-green?logo=google-drive)](https://drive.google.com/file/d/1Hk1TMu--7nr5vJ7iaNwMQZ_Iw9W_KI3C/view?usp=sharing) From ad8dd82e4ca411d7a6d9c3c7174b7a72a8ee2505 Mon Sep 17 00:00:00 2001 From: qywh2023 <134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 20:25:46 +0800 Subject: [PATCH 09/16] Update README.md --- README.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/README.md b/README.md index 3ae4707..0c9b705 100644 --- a/README.md +++ b/README.md @@ -1,5 +1,3 @@ -# OCRBench & OCRBench v2 - **This is the repository of the [OCRBench](./OCRBench/README.md) & [OCRBench v2](./OCRBench_v2/README.md).**
From 1eef3f65c70815e473c625c51bc98be66f8c31fc Mon Sep 17 00:00:00 2001 From: qywh2023 <134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 20:26:16 +0800 Subject: [PATCH 10/16] Update README.md --- README.md | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index 0c9b705..f6bcbe2 100644 --- a/README.md +++ b/README.md @@ -21,11 +21,11 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Vis [![Google Drive](https://img.shields.io/badge/Google%20Drive-Download-green?logo=google-drive)](https://drive.google.com/file/d/1Hk1TMu--7nr5vJ7iaNwMQZ_Iw9W_KI3C/view?usp=sharing) +**OCRBench v2** is a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks (4× more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios including street scene, receipt, formula, diagram, and so on), and thorough evaluation metrics, with a total of 10,000 human-verified question-answering pairs and a high proportion of difficult samples. More details can be found in [OCRBench v2 README](./OCRBench_v2/README.md). - -# OCRBench & OCRBench v2 - -**This is the repository of the [OCRBench](./OCRBench/README.md) & [OCRBench v2](./OCRBench_v2/README.md).** +

+ +

**OCRBench** is a comprehensive evaluation benchmark designed to assess the OCR capabilities of Large Multimodal Models. It comprises five components: Text Recognition, SceneText-Centric VQA, Document-Oriented VQA, Key Information Extraction, and Handwritten Mathematical Expression Recognition. The benchmark includes 1000 question-answer pairs, and all the answers undergo manual verification and correction to ensure a more precise evaluation. More details can be found in [OCRBench README](./OCRBench/README.md). @@ -33,12 +33,6 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Vis

-**OCRBench v2** is a large-scale bilingual text-centric benchmark with currently the most comprehensive set of tasks (4× more tasks than the previous multi-scene benchmark OCRBench), the widest coverage of scenarios (31 diverse scenarios including street scene, receipt, formula, diagram, and so on), and thorough evaluation metrics, with a total of 10, 000 human-verified question-answering pairs and a high proportion of difficult samples. More details can be found in [OCRBench v2 README](./OCRBench_v2/README.md). - -

- -

- # News * ```2024.12.31``` 🚀 [OCRBench v2](./OCRBench_v2/README.md) is released. * ```2024.12.11``` 🚀 OCRBench has been accepted by [Science China Information Sciences](https://link.springer.com/article/10.1007/s11432-024-4235-6). From 00df3ca6ccee94a164cf81dfce7acf19f6f3890c Mon Sep 17 00:00:00 2001 From: qywh2023 <134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 20:45:43 +0800 Subject: [PATCH 11/16] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index f6bcbe2..e3c502f 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning -[![Leaderboard](https://img.shields.io/badge/Leaderboard-OCRBenchV2-blue.svg)](https://99franklin.github.io/ocrbench_v2/) +[![Leaderboard](https://img.shields.io/badge/Leaderboard-OCRBenchV2-blue.svg?logo=trophy)](https://99franklin.github.io/ocrbench_v2/) [![arXiv](https://img.shields.io/badge/Arxiv-OCRBenchV2-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321) [![HuggingFace](https://img.shields.io/badge/dataset-black.svg?logo=HuggingFace)](https://huggingface.co/datasets/ling99/OCRBench_v2) [![GitHub issues](https://img.shields.io/github/issues/Yuliang-Liu/MultimodalOCR?color=critical&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aopen+is%3Aissue) From fa5d812fea4f84c6e712e173e00709d82928f880 Mon Sep 17 00:00:00 2001 From: qywh2023 <134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 20:47:45 +0800 Subject: [PATCH 12/16] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e3c502f..74e4dc3 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning 
-[![Leaderboard](https://img.shields.io/badge/Leaderboard-OCRBenchV2-blue.svg?logo=trophy)](https://99franklin.github.io/ocrbench_v2/) +[![🏆 Leaderboard](https://img.shields.io/badge/Leaderboard-OCRBenchV2-blue.svg?logo=trophy)](https://99franklin.github.io/ocrbench_v2/) [![arXiv](https://img.shields.io/badge/Arxiv-OCRBenchV2-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321) [![HuggingFace](https://img.shields.io/badge/dataset-black.svg?logo=HuggingFace)](https://huggingface.co/datasets/ling99/OCRBench_v2) [![GitHub issues](https://img.shields.io/github/issues/Yuliang-Liu/MultimodalOCR?color=critical&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aopen+is%3Aissue) From 1b7fe0b2bf908f43ebc583d4571f2877668aef04 Mon Sep 17 00:00:00 2001 From: qywh2023 <134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 20:49:41 +0800 Subject: [PATCH 13/16] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 74e4dc3..91c27b8 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning -[![🏆 Leaderboard](https://img.shields.io/badge/Leaderboard-OCRBenchV2-blue.svg?logo=trophy)](https://99franklin.github.io/ocrbench_v2/) +[![Leaderboard](https://img.shields.io/badge/Leaderboard-OCRBenchV2-blue.svg?logo=google-analytics)](https://99franklin.github.io/ocrbench_v2/) [![arXiv](https://img.shields.io/badge/Arxiv-OCRBenchV2-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2501.00321) [![HuggingFace](https://img.shields.io/badge/dataset-black.svg?logo=HuggingFace)](https://huggingface.co/datasets/ling99/OCRBench_v2) [![GitHub issues](https://img.shields.io/github/issues/Yuliang-Liu/MultimodalOCR?color=critical&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aopen+is%3Aissue) From b115d52e67bf5e51852c02187af281394fe115b8 Mon Sep 17 00:00:00 
2001 From: qywh2023 <134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 21:02:29 +0800 Subject: [PATCH 14/16] Update README.md --- README.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/README.md b/README.md index 91c27b8..2a87f67 100644 --- a/README.md +++ b/README.md @@ -26,6 +26,12 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Vis

+ +> **OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models**
+> Yuliang Liu, Zhang Li, Mingxin Huang, Biao Yang, Wenwen Yu, Chunyuan Li, Xucheng Yin, Cheng-lin Liu, Lianwen Jin, Xiang Bai
+[![arXiv](https://img.shields.io/badge/Arxiv-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2305.07895) +[![Dataset](https://img.shields.io/badge/Dataset-Available-lightblue)](https://github.com/qywh2023/OCRbench/blob/main/OCRBench/README.md) + **OCRBench** is a comprehensive evaluation benchmark designed to assess the OCR capabilities of Large Multimodal Models. It comprises five components: Text Recognition, SceneText-Centric VQA, Document-Oriented VQA, Key Information Extraction, and Handwritten Mathematical Expression Recognition. The benchmark includes 1000 question-answer pairs, and all the answers undergo manual verification and correction to ensure a more precise evaluation. More details can be found in [OCRBench README](./OCRBench/README.md). From 594a5727c737511f4a72fcb856b7ef4a15aaef20 Mon Sep 17 00:00:00 2001 From: qywh2023 <134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 21:03:01 +0800 Subject: [PATCH 15/16] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 2a87f67..5d8467b 100644 --- a/README.md +++ b/README.md @@ -30,7 +30,7 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Vis > **OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models**
> Yuliang Liu, Zhang Li, Mingxin Huang, Biao Yang, Wenwen Yu, Chunyuan Li, Xucheng Yin, Cheng-lin Liu, Lianwen Jin, Xiang Bai
[![arXiv](https://img.shields.io/badge/Arxiv-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2305.07895) -[![Dataset](https://img.shields.io/badge/Dataset-Available-lightblue)](https://github.com/qywh2023/OCRbench/blob/main/OCRBench/README.md) +[![Dataset](https://img.shields.io/badge/Dataset-Available-green)](https://github.com/qywh2023/OCRbench/blob/main/OCRBench/README.md) **OCRBench** is a comprehensive evaluation benchmark designed to assess the OCR capabilities of Large Multimodal Models. It comprises five components: Text Recognition, SceneText-Centric VQA, Document-Oriented VQA, Key Information Extraction, and Handwritten Mathematical Expression Recognition. The benchmark includes 1000 question-answer pairs, and all the answers undergo manual verification and correction to ensure a more precise evaluation. More details can be found in [OCRBench README](./OCRBench/README.md). From c07fedbb47dfcaef2e03b642718d4c2205625fbf Mon Sep 17 00:00:00 2001 From: qywh2023 <134821122+qywh2023@users.noreply.github.com> Date: Fri, 20 Jun 2025 23:42:58 +0800 Subject: [PATCH 16/16] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 5d8467b..5e894e5 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,6 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Vis [![HuggingFace](https://img.shields.io/badge/dataset-black.svg?logo=HuggingFace)](https://huggingface.co/datasets/ling99/OCRBench_v2) [![GitHub issues](https://img.shields.io/github/issues/Yuliang-Liu/MultimodalOCR?color=critical&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aopen+is%3Aissue) [![GitHub closed issues](https://img.shields.io/github/issues-closed/Yuliang-Liu/MultimodalOCR?color=success&label=Issues)](https://github.com/Yuliang-Liu/MultimodalOCR/issues?q=is%3Aissue+is%3Aclosed) -[![GitHub 
views](https://komarev.com/ghpvc/?username=Yuliang-Liu&repo=MultimodalOCR&color=brightgreen&label=Views)](https://github.com/Yuliang-Liu/MultimodalOCR)

@@ -40,6 +39,7 @@ OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Vis

# News
+* ```2025.6.21``` 🚀 We release the private dataset of OCRBench v2 and will update the [Leaderboard](https://99franklin.github.io/ocrbench_v2/) every quarter.
* ```2024.12.31``` 🚀 [OCRBench v2](./OCRBench_v2/README.md) is released.
* ```2024.12.11``` 🚀 OCRBench has been accepted by [Science China Information Sciences](https://link.springer.com/article/10.1007/s11432-024-4235-6).
* ```2024.5.19 ``` 🚀 We release [DTVQA](https://github.com/ShuoZhang2003/DT-VQA) to explore the capabilities of Large Multimodal Models on Dense Text.
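
The README text added in this series describes evaluation over human-verified question–answer pairs. As a rough sketch of that style of per-question scoring — not the repository's actual evaluation code, which uses task-specific metrics; the function names and the normalized exact-match rule below are assumptions for illustration only:

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, trim, and strip punctuation so superficial
    formatting differences do not count as errors."""
    text = text.lower().strip()
    return re.sub(f"[{re.escape(string.punctuation)}]", "", text)


def exact_match(prediction: str, answers: list[str]) -> bool:
    """A prediction counts as correct if it matches any one of the
    reference answers after normalization."""
    pred = normalize(prediction)
    return any(pred == normalize(a) for a in answers)


def score(samples: list[dict]) -> float:
    """Mean exact-match accuracy over QA samples of the form
    {"prediction": str, "answers": [str, ...]}."""
    if not samples:
        return 0.0
    hits = sum(exact_match(s["prediction"], s["answers"]) for s in samples)
    return hits / len(samples)
```

Under this sketch, `score` would report 2/3 for a batch where two of three predictions match a reference answer after normalization.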