CNI Task - IQA Metrics Evaluation Results
Overview
This document presents the comprehensive evaluation results of 15 Image Quality Assessment (IQA) metrics against human quality annotations for 81 identity document images.
Evaluation Date: 2025-08-26
Total Images: 81
Total Metrics Evaluated: 15
Key Findings
- 9/15 metrics have statistically significant Pearson correlations (p < 0.05)
- 10/15 metrics have statistically significant Spearman correlations (p < 0.05)
- Best performing metric: DEQA with Pearson correlation 0.6185 (Spearman 0.6059)
- Average absolute Pearson correlation across all 15 metrics: 0.2713
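As a quick arithmetic check, the 0.2713 average above can be reproduced from the Pearson column of the Complete Rankings table. The sketch below only re-uses values already reported in this document; the variable names are illustrative.

```python
# Pearson correlations copied from the Complete Rankings table (15 metrics).
pearson = {
    "deqa": 0.6185, "uranker": 0.3349, "dbcnn": -0.3721, "nrqm": -0.3493,
    "brisque": -0.3159, "hyperiqa": -0.3271, "nima": 0.2989,
    "topiq_nr": -0.2244, "maniqa": -0.2106, "musiq": -0.2013,
    "clipiqa+_vitL14_512": -0.2259, "unique": -0.1875, "piqe": 0.1958,
    "paq2piq": -0.1445, "niqe": -0.0627,
}

# Mean of the absolute values, so positive and negative correlations do not cancel.
mean_abs_pearson = sum(abs(r) for r in pearson.values()) / len(pearson)

# Metric with the largest absolute Pearson correlation.
best_metric = max(pearson, key=lambda name: abs(pearson[name]))

print(f"{mean_abs_pearson:.4f}")  # 0.2713
print(best_metric)                # deqa
```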
Top Performing Metrics
🏆 Best Overall Performer: DEQA
Metric | Pearson Correlation | Spearman Correlation | Overall Score | Significance |
---|---|---|---|---|
deqa | 0.6185 | 0.6059 | 0.6122 | ✅ Both |
uranker | 0.3349 | 0.3909 | 0.3629 | ✅ Both |
dbcnn | -0.3721 | -0.3489 | 0.3605 | ✅ Both |
nrqm | -0.3493 | -0.3699 | 0.3596 | ✅ Both |
brisque | -0.3159 | -0.3859 | 0.3509 | ✅ Both |
Complete Rankings
Rank | Metric | Pearson Corr | Spearman Corr | Overall Score | Significant |
---|---|---|---|---|---|
1 | deqa | 0.6185 | 0.6059 | 0.6122 | ✅ Both |
2 | uranker | 0.3349 | 0.3909 | 0.3629 | ✅ Both |
3 | dbcnn | -0.3721 | -0.3489 | 0.3605 | ✅ Both |
4 | nrqm | -0.3493 | -0.3699 | 0.3596 | ✅ Both |
5 | brisque | -0.3159 | -0.3859 | 0.3509 | ✅ Both |
6 | hyperiqa | -0.3271 | -0.3106 | 0.3189 | ✅ Both |
7 | nima | 0.2989 | 0.3321 | 0.3155 | ✅ Both |
8 | topiq_nr | -0.2244 | -0.2445 | 0.2345 | ✅ Both |
9 | maniqa | -0.2106 | -0.2420 | 0.2263 | ⚠️ Spearman only |
10 | musiq | -0.2013 | -0.2386 | 0.2200 | ⚠️ Spearman only |
11 | clipiqa+_vitL14_512 | -0.2259 | -0.1960 | 0.2109 | ⚠️ Pearson only |
12 | unique | -0.1875 | -0.1971 | 0.1923 | ❌ None |
13 | piqe | 0.1958 | 0.1763 | 0.1860 | ❌ None |
14 | paq2piq | -0.1445 | -0.1548 | 0.1497 | ❌ None |
15 | niqe | -0.0627 | 0.0745 | 0.0686 | ❌ None |
Correlation Analysis
Positive Correlations (Higher IQA Score = Higher Human Quality)
Metric | Pearson | Spearman | Significance |
---|---|---|---|
deqa | 0.6185 | 0.6059 | ✅ Both |
uranker | 0.3349 | 0.3909 | ✅ Both |
nima | 0.2989 | 0.3321 | ✅ Both |
piqe | 0.1958 | 0.1763 | ❌ None |
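Negative Correlations (Lower IQA Score = Higher Human Quality)
Metrics are grouped by the sign of their Pearson correlation. niqe has a negative Pearson but a positive Spearman correlation, both close to zero.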
Metric | Pearson | Spearman | Significance |
---|---|---|---|
dbcnn | -0.3721 | -0.3489 | ✅ Both |
nrqm | -0.3493 | -0.3699 | ✅ Both |
hyperiqa | -0.3271 | -0.3106 | ✅ Both |
brisque | -0.3159 | -0.3859 | ✅ Both |
clipiqa+_vitL14_512 | -0.2259 | -0.1960 | ⚠️ Pearson only |
topiq_nr | -0.2244 | -0.2445 | ✅ Both |
maniqa | -0.2106 | -0.2420 | ⚠️ Spearman only |
musiq | -0.2013 | -0.2386 | ⚠️ Spearman only |
unique | -0.1875 | -0.1971 | ❌ None |
paq2piq | -0.1445 | -0.1548 | ❌ None |
niqe | -0.0627 | 0.0745 | ❌ None |
Statistical Significance Summary
✅ Significant on Both (Pearson and Spearman, p < 0.05)
- deqa, uranker, dbcnn, nrqm, brisque, hyperiqa, nima, topiq_nr
⚠️ Partially Significant (One correlation type, p < 0.05)
- maniqa, musiq, clipiqa+_vitL14_512
❌ Not Significant (Both correlations, p ≥ 0.05)
- unique, piqe, paq2piq, niqe
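The three groups above can be tallied to confirm the counts in Key Findings. This is a small consistency check using only the significance labels from the Complete Rankings table; the label strings are shorthand chosen for this sketch.

```python
from collections import Counter

# Significance labels copied from the Complete Rankings table.
significance = {
    "deqa": "both", "uranker": "both", "dbcnn": "both", "nrqm": "both",
    "brisque": "both", "hyperiqa": "both", "nima": "both", "topiq_nr": "both",
    "maniqa": "spearman_only", "musiq": "spearman_only",
    "clipiqa+_vitL14_512": "pearson_only",
    "unique": "none", "piqe": "none", "paq2piq": "none", "niqe": "none",
}

counts = Counter(significance.values())

# A metric counts as Pearson-significant if it is in "both" or "pearson_only",
# and likewise for Spearman.
pearson_significant = counts["both"] + counts["pearson_only"]
spearman_significant = counts["both"] + counts["spearman_only"]

print(pearson_significant, spearman_significant)  # 9 10
```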
Recommendations
🎯 Primary Recommendation
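Use DEQA as the primary IQA metric for identity document quality assessment: it has the strongest positive correlation with human quality judgments (Pearson 0.6185, Spearman 0.6059), well ahead of the second-ranked metric (uranker, overall score 0.3629 versus 0.6122).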
🔄 Robust Evaluation Strategy
Combine multiple metrics for comprehensive assessment:
- deqa (primary) - Strongest positive correlation (Pearson 0.6185)
- uranker (secondary) - Weaker positive correlation (Pearson 0.3349)
- nima (validation) - Weak positive correlation (Pearson 0.2989)
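This report does not specify how the three metrics should be combined. One possible approach, shown here purely as an assumption, is to standardize each metric to z-scores and take a weighted average so that metrics on different scales contribute comparably. The function names, weights, and toy scores below are hypothetical and not taken from the evaluation.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize a list of scores to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # constant metric carries no ranking information
    return [(v - mu) / sigma for v in values]

def combine(metric_scores, weights):
    """Weighted average of per-metric z-scores, one combined value per image."""
    standardized = {name: zscores(metric_scores[name]) for name in weights}
    total_weight = sum(weights.values())
    n_images = len(next(iter(standardized.values())))
    return [
        sum(weights[name] * standardized[name][i] for name in weights) / total_weight
        for i in range(n_images)
    ]

# Hypothetical scores for three images (NOT from the evaluation data).
toy_scores = {
    "deqa": [1.0, 2.0, 3.0],
    "uranker": [0.1, 0.2, 0.3],
    "nima": [4.0, 5.0, 6.0],
}
# Hypothetical weights; deqa is weighted highest because it correlates best.
toy_weights = {"deqa": 2.0, "uranker": 1.0, "nima": 1.0}

combined = combine(toy_scores, toy_weights)
print(combined)
```

All three metrics here correlate positively with human scores, so no sign flip is needed; a negatively correlated metric would have to be negated before averaging.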
⚠️ Important Notes
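- Some metrics correlate negatively with human scores. For metrics defined so that lower scores mean better quality (such as BRISQUE and NIQE), a negative correlation is the expected direction; for metrics where higher scores mean better quality, it indicates the metric ranks these document images opposite to the human annotators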
- Consider dataset-specific calibration for better performance
- Results may vary with different image types or quality ranges
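The report does not describe a calibration procedure. As one simple possibility, and strictly as an assumption, a metric can be mapped onto the 1-5 human scale with an ordinary least-squares line fitted on annotated images. A negative slope also takes care of metrics that correlate negatively. The function names and toy data are hypothetical.

```python
def fit_linear(metric_scores, human_scores):
    """Ordinary least-squares fit: human ~ slope * metric + intercept."""
    n = len(metric_scores)
    mean_x = sum(metric_scores) / n
    mean_y = sum(human_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in metric_scores)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(metric_scores, human_scores))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def calibrate(score, slope, intercept, low=1.0, high=5.0):
    """Map a raw metric score to the human scale, clamped to [low, high]."""
    return min(high, max(low, slope * score + intercept))

# Hypothetical data (NOT from the evaluation): a metric that decreases
# as human-rated quality increases.
toy_metric = [0.0, 1.0, 2.0, 3.0]
toy_human = [5, 4, 3, 2]

slope, intercept = fit_linear(toy_metric, toy_human)
print(slope, intercept)                    # -1.0 5.0
print(calibrate(1.5, slope, intercept))    # 3.5
print(calibrate(10.0, slope, intercept))   # 1.0 (clamped to the scale minimum)
```

With only 81 annotated images, any such fit should be validated on held-out images before it is relied on.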
Methodology
Data Sources
- IQA Metrics: 15 different IQA metrics computed for 81 identity document images
- Human Labels: Quality annotations (coherence scores 1-5) for the same 81 images
- Correlation Analysis: Both Pearson (linear) and Spearman (rank) correlations
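For readers who want to see what the two correlation types measure, here is a dependency-free sketch. It is not the script used for this evaluation, and the toy data are hypothetical. Spearman is computed as Pearson on ranks, with tied values sharing their average rank, which matters here because 1-5 human labels contain many ties.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical metric outputs and human labels for five images.
toy_metric = [0.2, 0.4, 0.5, 0.7, 0.9]
toy_human = [1, 2, 2, 4, 5]

print(round(pearson(toy_metric, toy_human), 4))
print(round(spearman(toy_metric, toy_human), 4))
```

This sketch does not compute p-values. In practice, `scipy.stats.pearsonr` and `scipy.stats.spearmanr` return both the coefficient and a p-value; whether the original analysis used them is not stated here.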
Statistical Analysis
- Correlation Types: Pearson (linear) and Spearman (rank-order)
- Significance Threshold: p < 0.05
- Overall Score: Average of absolute Pearson and Spearman correlations
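The scoring and labeling rules above can be sketched as follows. The correlation values are taken from the Complete Rankings table; the function names are illustrative, and the p-values passed to `significance_label` are placeholders because this report does not list individual p-values.

```python
def overall_score(pearson_r, spearman_r):
    """Average of the absolute Pearson and Spearman correlations."""
    return (abs(pearson_r) + abs(spearman_r)) / 2

def significance_label(p_pearson, p_spearman, alpha=0.05):
    """Label a metric by which correlation types pass the threshold."""
    pearson_ok = p_pearson < alpha
    spearman_ok = p_spearman < alpha
    if pearson_ok and spearman_ok:
        return "Both"
    if pearson_ok:
        return "Pearson only"
    if spearman_ok:
        return "Spearman only"
    return "None"

# (Pearson, Spearman) pairs copied from the Complete Rankings table.
rows = {
    "dbcnn": (-0.3721, -0.3489),
    "niqe": (-0.0627, 0.0745),
    "deqa": (0.6185, 0.6059),
    "uranker": (0.3349, 0.3909),
}

# Rank by overall score, highest first.
ranked = sorted(rows, key=lambda name: overall_score(*rows[name]), reverse=True)

for name in ranked:
    print(name, round(overall_score(*rows[name]), 4))
# deqa 0.6122
# uranker 0.3629
# dbcnn 0.3605
# niqe 0.0686
```

Taking absolute values is what lets dbcnn (negative correlations) rank just below uranker (positive correlations): the overall score measures strength of association, not direction.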
Files Generated
- detailed_iqa_correlation_results.csv - Complete analysis data
- iqa_ranking_table.csv - Performance rankings
- detailed_data_*.csv - Individual metric data (15 files)
- iqa_correlation_comparison.png - Visualization plots
Conclusion
DEQA emerges as the top-performing IQA metric for identity document quality assessment, showing the strongest correlation with human quality judgments (Pearson 0.6185, Spearman 0.6059). Eight of the 15 metrics correlate significantly with human assessments under both Pearson and Spearman (p < 0.05), which gives a reasonable starting point for automated quality evaluation, although every metric other than DEQA has an absolute correlation below 0.4.
Last updated: 2025-08-26