# CNI Task - IQA Metrics Evaluation Results

## Overview

This document presents the evaluation results of 15 Image Quality Assessment (IQA) metrics against human quality annotations for 81 identity document images.

**Evaluation Date:** 2025-08-26
**Total Images:** 81
**Total Metrics Evaluated:** 15

## Key Findings

- **9/15 metrics** have statistically significant Pearson correlations (p < 0.05)
- **10/15 metrics** have statistically significant Spearman correlations (p < 0.05)
- **Best performing metric:** DEQA, with Pearson correlation 0.6185
- **Average absolute Pearson correlation:** 0.2713

## Top Performing Metrics

### 🏆 Best Overall Performer: DEQA

| Metric | Pearson Correlation | Spearman Correlation | Overall Score | Significance |
|--------|---------------------|----------------------|---------------|--------------|
| **deqa** | **0.6185** | **0.6059** | **0.6122** | ✅ Both |
| uranker | 0.3349 | 0.3909 | 0.3629 | ✅ Both |
| dbcnn | -0.3721 | -0.3489 | 0.3605 | ✅ Both |
| nrqm | -0.3493 | -0.3699 | 0.3596 | ✅ Both |
| brisque | -0.3159 | -0.3859 | 0.3509 | ✅ Both |

## Complete Rankings

Metrics are ranked by Overall Score (the average of the absolute Pearson and Spearman correlations).

| Rank | Metric | Pearson Corr | Spearman Corr | Overall Score | Significant |
|------|--------|--------------|---------------|---------------|-------------|
| 1 | **deqa** | **0.6185** | **0.6059** | **0.6122** | ✅ Both |
| 2 | uranker | 0.3349 | 0.3909 | 0.3629 | ✅ Both |
| 3 | dbcnn | -0.3721 | -0.3489 | 0.3605 | ✅ Both |
| 4 | nrqm | -0.3493 | -0.3699 | 0.3596 | ✅ Both |
| 5 | brisque | -0.3159 | -0.3859 | 0.3509 | ✅ Both |
| 6 | hyperiqa | -0.3271 | -0.3106 | 0.3189 | ✅ Both |
| 7 | nima | 0.2989 | 0.3321 | 0.3155 | ✅ Both |
| 8 | topiq_nr | -0.2244 | -0.2445 | 0.2345 | ✅ Both |
| 9 | maniqa | -0.2106 | -0.2420 | 0.2263 | ⚠️ Spearman only |
| 10 | musiq | -0.2013 | -0.2386 | 0.2200 | ⚠️ Spearman only |
| 11 | clipiqa+_vitL14_512 | -0.2259 | -0.1960 | 0.2109 | ⚠️ Pearson only |
| 12 | unique | -0.1875 | -0.1971 | 0.1923 | ❌ None |
| 13 | piqe | 0.1958 | 0.1763 | 0.1860 | ❌ None |
| 14 | paq2piq | -0.1445 | -0.1548 | 0.1497 | ❌ None |
| 15 | niqe | -0.0627 | 0.0745 | 0.0686 | ❌ None |

## Correlation Analysis

Metrics are grouped below by the sign of their Pearson correlation.

### Positive Correlations (Higher IQA Score = Higher Human Quality)

| Metric | Pearson | Spearman | Significance |
|--------|---------|----------|--------------|
| **deqa** | **0.6185** | **0.6059** | ✅ Both |
| uranker | 0.3349 | 0.3909 | ✅ Both |
| nima | 0.2989 | 0.3321 | ✅ Both |
| piqe | 0.1958 | 0.1763 | ❌ None |

### Negative Correlations (Lower IQA Score = Higher Human Quality)

| Metric | Pearson | Spearman | Significance |
|--------|---------|----------|--------------|
| dbcnn | -0.3721 | -0.3489 | ✅ Both |
| nrqm | -0.3493 | -0.3699 | ✅ Both |
| hyperiqa | -0.3271 | -0.3106 | ✅ Both |
| brisque | -0.3159 | -0.3859 | ✅ Both |
| clipiqa+_vitL14_512 | -0.2259 | -0.1960 | ⚠️ Pearson only |
| topiq_nr | -0.2244 | -0.2445 | ✅ Both |
| maniqa | -0.2106 | -0.2420 | ⚠️ Spearman only |
| musiq | -0.2013 | -0.2386 | ⚠️ Spearman only |
| unique | -0.1875 | -0.1971 | ❌ None |
| paq2piq | -0.1445 | -0.1548 | ❌ None |
| niqe | -0.0627 | 0.0745 | ❌ None |

Note that niqe has a negative Pearson coefficient but a positive Spearman coefficient; both are close to zero and neither is significant.

## Statistical Significance Summary

### ✅ Significant on Both Measures (Pearson and Spearman, p < 0.05)

- deqa, uranker, dbcnn, nrqm, brisque, hyperiqa, nima, topiq_nr

### ⚠️ Partially Significant (One correlation type, p < 0.05)

- maniqa, musiq (Spearman only)
- clipiqa+_vitL14_512 (Pearson only)

### ❌ Not Significant (Both correlations, p ≥ 0.05)

- unique, piqe, paq2piq, niqe

## Recommendations

### 🎯 Primary Recommendation

**Use DEQA as the primary IQA metric** for identity document quality assessment. It has the strongest positive correlation with human quality judgments (Pearson 0.6185, Spearman 0.6059), well ahead of every other metric evaluated.

### 🔄 Robust Evaluation Strategy

Combine multiple positively correlated metrics for a more comprehensive assessment:

1. **deqa** (primary) - Strongest positive correlation
2. **uranker** (secondary) - Moderate positive correlation
3. **nima** (validation) - Weaker but significant positive correlation

### ⚠️ Important Notes

- Several metrics correlate negatively with the human labels. This can reflect a lower-is-better score convention (for example, BRISQUE) or genuine disagreement with the human judgments, so check each metric's score direction before interpreting the sign
- Consider dataset-specific calibration for better performance
- Results are based on 81 images and may vary with different image types or quality ranges

## Methodology

### Data Sources

- **IQA Metrics**: 15 different IQA metrics computed for 81 identity document images
- **Human Labels**: Quality annotations (coherence scores 1-5) for the same 81 images

### Statistical Analysis

- **Correlation Types**: Pearson (linear) and Spearman (rank-order)
- **Significance Threshold**: p < 0.05
- **Overall Score**: Average of absolute Pearson and Spearman correlations

### Files Generated

- `detailed_iqa_correlation_results.csv` - Complete analysis data
- `iqa_ranking_table.csv` - Performance rankings
- `detailed_data_*.csv` - Individual metric data (15 files)
- `iqa_correlation_comparison.png` - Visualization plots

## Conclusion

DEQA emerges as the top-performing IQA metric for identity document quality assessment, showing the strongest correlation with human quality judgments. Eight of the 15 metrics have statistically significant Pearson and Spearman correlations with the human assessments, although apart from DEQA all absolute correlations are below 0.4. These results provide a starting point for automated quality evaluation systems, with DEQA as the clear first choice.

---

*Last updated: 2025-08-26*
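
### Appendix: Overall Score Sketch

The Overall Score defined under Statistical Analysis (the average of the absolute Pearson and Spearman correlations) can be illustrated with a minimal, dependency-free sketch. This is not the script that produced the results above; the function names (`pearson`, `average_ranks`, `spearman`, `overall_score`, `combine`) and the toy values are assumptions made for illustration. It also omits p-values, which a statistics library such as SciPy (`scipy.stats.pearsonr`, `scipy.stats.spearmanr`) would normally supply.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def combine(pearson_r, spearman_r):
    """Overall Score from two already-computed coefficients."""
    return (abs(pearson_r) + abs(spearman_r)) / 2


def overall_score(metric_scores, human_scores):
    """Overall Score computed directly from raw per-image scores."""
    return combine(pearson(metric_scores, human_scores),
                   spearman(metric_scores, human_scores))


# Toy, made-up values (not taken from the evaluation data).
toy_metric = [0.2, 0.4, 0.5, 0.7, 0.9]
toy_human = [1, 2, 2, 4, 5]
toy_score = overall_score(toy_metric, toy_human)

# Reproducing one table entry from its reported coefficients (dbcnn).
dbcnn_score = combine(-0.3721, -0.3489)
```

Because the absolute value is taken before averaging, a metric with a consistently negative correlation (such as dbcnn) can rank alongside positively correlated metrics; the sign still has to be checked separately when the metric is used.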