Evaluation Results Summary
Quick Overview
- Dataset: 56 document samples
- Best Approach: Crop (No Shadow Removal)
- Performance Gain: +18.8 percentage points F1-score over the baseline (73.5% → 92.3%)
Performance Comparison (ranked by F1-score, lowest to highest)
| Approach | Precision | Recall | F1-Score | Field Accuracy | F1 Gain vs. Baseline |
|---|---|---|---|---|---|
| No Preprocessing | 79.0% | 68.7% | 73.5% | 68.7% | Baseline |
| Crop + PaddleOCR + Shadow Removal + Cache | 92.5% | 88.3% | 90.3% | 88.3% | +16.8 pp |
| Crop + Shadow Removal + Cache | 93.6% | 88.5% | 91.0% | 88.5% | +17.5 pp |
| Crop + PaddleOCR + Shadow Removal | 93.6% | 89.4% | 91.5% | 89.4% | +18.0 pp |
| Crop | 94.8% | 89.9% | 92.3% | 89.9% | +18.8 pp |
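The F1 and gain columns can be recomputed from precision and recall. The sketch below is minimal and assumes F1 is the standard harmonic mean 2PR/(P+R) and that the gain column is an absolute difference in percentage points. The names `RESULTS`, `BASELINE`, `f1`, and `summarize` are hypothetical. Because the inputs are the rounded table values, a recomputed figure can differ from the table in the last digit (for example, 90.4% instead of 90.3% for the second row).

```python
# Precision and recall (%) per approach, copied from the comparison table.
RESULTS = {
    "No Preprocessing": (79.0, 68.7),
    "Crop + PaddleOCR + Shadow Removal + Cache": (92.5, 88.3),
    "Crop + Shadow Removal + Cache": (93.6, 88.5),
    "Crop + PaddleOCR + Shadow Removal": (93.6, 89.4),
    "Crop": (94.8, 89.9),
}
BASELINE = "No Preprocessing"


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (inputs and output in %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def summarize(results):
    """Return (approach, F1, gain over baseline in pp) rows, sorted by F1 ascending."""
    base_f1 = f1(*results[BASELINE])
    rows = []
    for name, (p, r) in results.items():
        score = f1(p, r)
        # Gain is an absolute difference in percentage points, not a relative %.
        rows.append((name, round(score, 1), round(score - base_f1, 1)))
    return sorted(rows, key=lambda row: row[1])


rows = summarize(RESULTS)
print(rows[-1])  # → ('Crop', 92.3, 18.8)
```

Sorting by the recomputed F1 reproduces the table's ordering, and the top row's gain matches the +18.8 pp reported for Crop.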
Top Performing Fields (F1, Crop + PaddleOCR + Shadow Removal)
- Document Type: 85.4%
- Gender: 85.1%
- Surname: 82.9%
- Birth Date: 80.5%
Key Insights
- Cropping provides the biggest performance boost: F1 rises from 73.5% with no preprocessing to 92.3% with cropping alone (+18.8 pp)
- Adding PaddleOCR + Shadow Removal on top of cropping does not raise overall F1 in this evaluation (91.5% vs. 92.3% for Crop alone)
- Shadow removal shows mixed results depending on field type
- Caching has a small effect on accuracy: adding it to Crop + PaddleOCR + Shadow Removal lowers F1 by 1.2 pp (91.5% → 90.3%)
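The insights above compare configurations that differ by one or more components. The minimal sketch below models each table row as a set of enabled components and lists the F1 change for every pair where one configuration strictly extends another. The component labels and the `ablation_deltas` helper are hypothetical. The table has no Crop + PaddleOCR row without shadow removal and no Crop + Shadow Removal row without caching, so the effect of each component cannot be fully isolated.

```python
# Overall F1 (%) per configuration, copied from the comparison table.
# Keys are the (hypothetical) sets of preprocessing components each row enables.
F1_BY_CONFIG = {
    frozenset(): 73.5,
    frozenset({"crop", "paddleocr", "shadow_removal", "cache"}): 90.3,
    frozenset({"crop", "shadow_removal", "cache"}): 91.0,
    frozenset({"crop", "paddleocr", "shadow_removal"}): 91.5,
    frozenset({"crop"}): 92.3,
}


def ablation_deltas(f1_by_config):
    """Return (base, added_components, delta_f1_pp) for every pair of
    configurations where the second is a strict superset of the first."""
    rows = []
    for base, base_f1 in f1_by_config.items():
        for other, other_f1 in f1_by_config.items():
            if base < other:  # strict subset: `other` adds components to `base`
                added = tuple(sorted(other - base))
                rows.append((tuple(sorted(base)), added, round(other_f1 - base_f1, 1)))
    return rows


for base, added, delta in ablation_deltas(F1_BY_CONFIG):
    print(f"{base or ('none',)} + {added}: {delta:+.1f} pp")
```

For example, the pair ('crop', 'paddleocr', 'shadow_removal') + ('cache',) gives -1.2 pp, the figure behind the caching insight, and () + ('crop',) gives +18.8 pp.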
Recommendations
- Use Crop + PaddleOCR + Shadow Removal for production
- Focus on optimizing high-value fields
- Investigate MRZ line extraction further
- Target 65%+ overall F1-score
See README.md for detailed analysis