
CNI Task - IQA Metrics Evaluation Results

Overview

This document reports how well 15 no-reference Image Quality Assessment (IQA) metrics agree with human quality annotations on 81 identity document images.

Evaluation Date: 2025-08-26
Total Images: 81
Total Metrics Evaluated: 15

Key Findings

  • 9/15 metrics have statistically significant Pearson correlations (p < 0.05)
  • 10/15 metrics have statistically significant Spearman correlations (p < 0.05)
  • Best-performing metric: DEQA (Pearson 0.6185, Spearman 0.6059)
  • Average absolute Pearson correlation across all 15 metrics: 0.2713

Top Performing Metrics

🏆 Best Overall Performer: DEQA

| Metric | Pearson Correlation | Spearman Correlation | Overall Score | Significance |
| --- | --- | --- | --- | --- |
| deqa | 0.6185 | 0.6059 | 0.6122 | Both |
| uranker | 0.3349 | 0.3909 | 0.3629 | Both |
| dbcnn | -0.3721 | -0.3489 | 0.3605 | Both |
| nrqm | -0.3493 | -0.3699 | 0.3596 | Both |
| brisque | -0.3159 | -0.3859 | 0.3509 | Both |

Complete Rankings

| Rank | Metric | Pearson Corr | Spearman Corr | Overall Score | Significant |
| --- | --- | --- | --- | --- | --- |
| 1 | deqa | 0.6185 | 0.6059 | 0.6122 | Both |
| 2 | uranker | 0.3349 | 0.3909 | 0.3629 | Both |
| 3 | dbcnn | -0.3721 | -0.3489 | 0.3605 | Both |
| 4 | nrqm | -0.3493 | -0.3699 | 0.3596 | Both |
| 5 | brisque | -0.3159 | -0.3859 | 0.3509 | Both |
| 6 | hyperiqa | -0.3271 | -0.3106 | 0.3189 | Both |
| 7 | nima | 0.2989 | 0.3321 | 0.3155 | Both |
| 8 | topiq_nr | -0.2244 | -0.2445 | 0.2345 | Both |
| 9 | maniqa | -0.2106 | -0.2420 | 0.2263 | ⚠️ Spearman only |
| 10 | musiq | -0.2013 | -0.2386 | 0.2200 | ⚠️ Spearman only |
| 11 | clipiqa+_vitL14_512 | -0.2259 | -0.1960 | 0.2109 | ⚠️ Pearson only |
| 12 | unique | -0.1875 | -0.1971 | 0.1923 | None |
| 13 | piqe | 0.1958 | 0.1763 | 0.1860 | None |
| 14 | paq2piq | -0.1445 | -0.1548 | 0.1497 | None |
| 15 | niqe | -0.0627 | 0.0745 | 0.0686 | None |
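
The Overall Score, the ranking, and the average absolute correlation in Key Findings can be re-derived from the reported coefficients. The sketch below hard-codes the rounded values from the Complete Rankings table, so recomputed scores may differ from the table in the fourth decimal; `METRICS`, `overall_score`, and `rank_metrics` are illustrative names, not part of the benchmark code.

```python
# Rounded (Pearson r, Spearman rho) pairs copied from the Complete Rankings table.
METRICS = {
    "deqa": (0.6185, 0.6059),
    "uranker": (0.3349, 0.3909),
    "dbcnn": (-0.3721, -0.3489),
    "nrqm": (-0.3493, -0.3699),
    "brisque": (-0.3159, -0.3859),
    "hyperiqa": (-0.3271, -0.3106),
    "nima": (0.2989, 0.3321),
    "topiq_nr": (-0.2244, -0.2445),
    "maniqa": (-0.2106, -0.2420),
    "musiq": (-0.2013, -0.2386),
    "clipiqa+_vitL14_512": (-0.2259, -0.1960),
    "unique": (-0.1875, -0.1971),
    "piqe": (0.1958, 0.1763),
    "paq2piq": (-0.1445, -0.1548),
    "niqe": (-0.0627, 0.0745),
}


def overall_score(pearson: float, spearman: float) -> float:
    """Mean of the absolute Pearson and Spearman correlations (sign is ignored)."""
    return (abs(pearson) + abs(spearman)) / 2


def rank_metrics(metrics: dict[str, tuple[float, float]]) -> list[tuple[str, float]]:
    """Return (metric, overall score) pairs sorted from best to worst."""
    scored = [(name, overall_score(p, s)) for name, (p, s) in metrics.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = rank_metrics(METRICS)
# Average of |Pearson r| over all metrics.
mean_abs_pearson = sum(abs(p) for p, _ in METRICS.values()) / len(METRICS)

for position, (name, score) in enumerate(ranking, start=1):
    print(f"{position:2d} {name:<20s} {score:.4f}")
print(f"Average absolute Pearson correlation: {mean_abs_pearson:.4f}")
```

Sorting by this score reproduces the table order exactly, and the mean of the absolute Pearson values rounds to 0.2713, the figure given in Key Findings.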

Correlation Analysis

Positive Correlations (Higher IQA Score = Higher Human Quality)

| Metric | Pearson | Spearman | Significance |
| --- | --- | --- | --- |
| deqa | 0.6185 | 0.6059 | Both |
| uranker | 0.3349 | 0.3909 | Both |
| nima | 0.2989 | 0.3321 | Both |
| piqe | 0.1958 | 0.1763 | None |

Negative Correlations (Lower IQA Score = Higher Human Quality)

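| Metric | Pearson | Spearman | Significance |
| --- | --- | --- | --- |
| dbcnn | -0.3721 | -0.3489 | Both |
| nrqm | -0.3493 | -0.3699 | Both |
| hyperiqa | -0.3271 | -0.3106 | Both |
| brisque | -0.3159 | -0.3859 | Both |
| clipiqa+_vitL14_512 | -0.2259 | -0.1960 | ⚠️ Pearson only |
| topiq_nr | -0.2244 | -0.2445 | Both |
| maniqa | -0.2106 | -0.2420 | ⚠️ Spearman only |
| musiq | -0.2013 | -0.2386 | ⚠️ Spearman only |
| unique | -0.1875 | -0.1971 | None |
| paq2piq | -0.1445 | -0.1548 | None |
| niqe | -0.0627 | 0.0745 | None |

niqe's Pearson (-0.0627) and Spearman (0.0745) coefficients have opposite signs; both are close to zero and not significant, so niqe shows no clear relationship with human quality in either direction.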

Statistical Significance Summary

Significant under Both Tests (Pearson and Spearman, p < 0.05)

  • deqa, uranker, dbcnn, nrqm, brisque, hyperiqa, nima, topiq_nr

⚠️ Partially Significant (One correlation type, p < 0.05)

  • maniqa, musiq (Spearman only); clipiqa+_vitL14_512 (Pearson only)

Not Significant (p ≥ 0.05 for both correlations)

  • unique, piqe, paq2piq, niqe
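
The significance groups can be cross-checked from the coefficients alone. For n = 81 images, the p-value of a correlation coefficient r follows from the statistic t = r·sqrt((n - 2)/(1 - r²)) with n - 2 degrees of freedom, which is equivalent to the test SciPy's `pearsonr` performs and to the approximation `spearmanr` uses by default. The sketch assumes two-sided tests and works from the rounded table values, so coefficients sitting right at the threshold could in principle flip; `correlation_p_value` and `significance_label` are hypothetical helper names.

```python
import math

from scipy.stats import t as student_t

N_IMAGES = 81
ALPHA = 0.05


def correlation_p_value(r: float, n: int = N_IMAGES) -> float:
    """Two-sided p-value for H0: correlation = 0, via a t statistic with n - 2 dof."""
    dof = n - 2
    t_stat = r * math.sqrt(dof / (1.0 - r * r))
    return 2.0 * student_t.sf(abs(t_stat), dof)


def significance_label(pearson: float, spearman: float, alpha: float = ALPHA) -> str:
    """Classify a metric the same way as the Significant column of the rankings table."""
    p_sig = correlation_p_value(pearson) < alpha
    s_sig = correlation_p_value(spearman) < alpha
    if p_sig and s_sig:
        return "Both"
    if s_sig:
        return "Spearman only"
    if p_sig:
        return "Pearson only"
    return "None"


# A few rows from the Complete Rankings table (rounded coefficients).
examples = {
    "deqa": (0.6185, 0.6059),
    "topiq_nr": (-0.2244, -0.2445),
    "maniqa": (-0.2106, -0.2420),
    "clipiqa+_vitL14_512": (-0.2259, -0.1960),
    "unique": (-0.1875, -0.1971),
}

labels = {name: significance_label(p, s) for name, (p, s) in examples.items()}
for name, label in labels.items():
    print(f"{name:<20s} {label}")

# Smallest |r| reaching p < ALPHA with 81 images: invert t = r*sqrt(dof/(1 - r^2)).
t_crit = student_t.ppf(1 - ALPHA / 2, N_IMAGES - 2)
r_crit = t_crit / math.sqrt(t_crit**2 + N_IMAGES - 2)
print(f"critical |r| = {r_crit:.4f}")
```

The critical |r| comes out at roughly 0.22, which is why metrics such as maniqa and musiq (|Pearson| just above 0.20) fall on opposite sides of the threshold for their two coefficients.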

Recommendations

🎯 Primary Recommendation

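Use DEQA as the primary IQA metric for identity document quality assessment: it has the strongest agreement with human quality judgments of all 15 metrics (Pearson 0.6185, Spearman 0.6059), well ahead of the runner-up, uranker (overall score 0.6122 vs. 0.3629).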

🔄 Robust Evaluation Strategy

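Combine multiple positively correlated metrics for a more robust assessment:

  1. deqa (primary) - Strong positive correlation (Pearson 0.6185, Spearman 0.6059)
  2. uranker (secondary) - Moderate positive correlation (Pearson 0.3349, Spearman 0.3909)
  3. nima (validation) - Moderate positive correlation (Pearson 0.2989, Spearman 0.3321)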
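
How the three metrics should be combined is not specified here. One common option, shown purely as an assumption, is to standardize each metric to z-scores across the image set so that differently scaled outputs become comparable, then take a weighted mean. The weights and the names `zscore` and `combined_quality` are hypothetical; all three metrics correlate positively with the human labels, so no sign flipping is needed (a negatively correlated metric would have to be negated first).

```python
import numpy as np


def zscore(values: np.ndarray) -> np.ndarray:
    """Standardize scores to zero mean and unit variance across images."""
    std = values.std()
    if std == 0:
        return np.zeros_like(values, dtype=float)
    return (values - values.mean()) / std


def combined_quality(scores: dict[str, np.ndarray], weights: dict[str, float]) -> np.ndarray:
    """Weighted mean of per-metric z-scores; assumes every metric is higher-is-better."""
    total = sum(weights.values())
    combined = sum(
        weight * zscore(np.asarray(scores[name], dtype=float))
        for name, weight in weights.items()
    )
    return combined / total


# Hypothetical raw scores for four images (not benchmark data).
scores = {
    "deqa": np.array([3.1, 4.2, 2.5, 3.8]),
    "uranker": np.array([1.2, 1.9, 0.7, 1.5]),
    "nima": np.array([4.8, 5.6, 4.1, 5.2]),
}
weights = {"deqa": 0.5, "uranker": 0.25, "nima": 0.25}  # hypothetical: favour deqa

combined = combined_quality(scores, weights)
best_to_worst = np.argsort(-combined)  # image indices, highest combined score first
print(combined, best_to_worst)
```

Weighting deqa more heavily reflects its much higher correlation; in practice the weights would need to be tuned on annotated data.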

⚠️ Important Notes

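  • Correlation signs must be interpreted per metric: some metrics (e.g., brisque, niqe, piqe) are designed so that lower scores mean better quality, while most of the others assign higher scores to better images. The Overall Score uses absolute values and therefore ignores direction
  • Consider dataset-specific calibration (mapping metric scores onto the 1-5 human scale); because a monotonic mapping cannot change a Spearman correlation, calibration mainly improves the interpretability of scores and their linear (Pearson) agreement
  • Results were obtained on 81 identity document images and may not transfer to other image types or quality ranges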
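
Dataset-specific calibration usually means fitting a mapping from raw metric scores onto the human 1-5 scale using annotated images. The sketch below is an assumption, not the benchmark's procedure: it fits a least-squares line with NumPy's `polyfit` and clips predictions to [1, 5]. Clipping aside, an affine map leaves the magnitude of both Pearson and Spearman correlations unchanged, so its benefit is scores expressed on the annotation scale; nonlinear fits such as the logistic mappings common in IQA studies can additionally improve Pearson correlation. The function names are hypothetical.

```python
import numpy as np


def fit_linear_calibration(metric_scores, human_scores):
    """Least-squares fit of human ~ slope * metric + intercept; returns (slope, intercept)."""
    slope, intercept = np.polyfit(
        np.asarray(metric_scores, dtype=float),
        np.asarray(human_scores, dtype=float),
        deg=1,
    )
    return slope, intercept


def apply_calibration(metric_scores, slope, intercept, low=1.0, high=5.0):
    """Map raw metric scores onto the human scale and clip to [low, high]."""
    predicted = slope * np.asarray(metric_scores, dtype=float) + intercept
    return np.clip(predicted, low, high)


# Hypothetical calibration data: raw metric scores and human labels for five images.
train_metric = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
train_human = np.array([1, 2, 3, 4, 5])

slope, intercept = fit_linear_calibration(train_metric, train_human)
calibrated = apply_calibration(np.array([0.2, 0.6, 1.2]), slope, intercept)
print(slope, intercept, calibrated)
```

To avoid an optimistic estimate, the mapping would be fitted on one subset of annotated images and evaluated on another.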

Methodology

Data Sources

  • IQA Metrics: 15 different IQA metrics computed for 81 identity document images
  • Human Labels: Quality annotations (coherence scores 1-5) for the same 81 images
  • Correlation Analysis: Both Pearson (linear) and Spearman (rank) correlations

Statistical Analysis

  • Correlation Types: Pearson (linear) and Spearman (rank-order)
  • Significance Threshold: p < 0.05
  • Overall Score: Average of absolute Pearson and Spearman correlations
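
The per-metric analysis can be sketched with SciPy's `pearsonr` and `spearmanr`. The sketch assumes the metric scores and human labels are already aligned arrays over the same images; how the benchmark loads and aligns them is not documented here, so that step is omitted, and `evaluate_metric` is a hypothetical name.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr


def evaluate_metric(metric_scores, human_scores) -> dict:
    """Pearson/Spearman correlations with p-values, plus the Overall Score."""
    x = np.asarray(metric_scores, dtype=float)
    y = np.asarray(human_scores, dtype=float)
    pearson_r, pearson_p = pearsonr(x, y)  # linear correlation
    spearman_rho, spearman_p = spearmanr(x, y)  # rank-order correlation
    return {
        "pearson": float(pearson_r),
        "pearson_p": float(pearson_p),
        "spearman": float(spearman_rho),
        "spearman_p": float(spearman_p),
        # Overall Score: average of absolute Pearson and Spearman correlations.
        "overall": (abs(float(pearson_r)) + abs(float(spearman_rho))) / 2,
    }


# Synthetic example: a lower-is-better metric that tracks 1-5 human labels inversely.
human = np.array([1, 2, 2, 3, 3, 4, 4, 5, 5, 5])
noise = np.array([0.5, -0.3, 0.2, 0.1, -0.4, 0.3, -0.2, 0.4, -0.1, 0.0])
metric = 60.0 - 10.0 * human + noise

result = evaluate_metric(metric, human)
print(result)
```

Because the Overall Score takes absolute values, this inversely related metric still receives a high score, which is the same convention the rankings above follow.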

Files Generated

  • detailed_iqa_correlation_results.csv - Complete analysis data
  • iqa_ranking_table.csv - Performance rankings
  • detailed_data_*.csv - Individual metric data (15 files)
  • iqa_correlation_comparison.png - Visualization plots

Conclusion

DEQA emerges as the top-performing IQA metric for identity document quality assessment, with the strongest correlation with human quality judgments (Pearson 0.6185, Spearman 0.6059). Eight of the 15 metrics correlate significantly with human assessments under both tests, but apart from DEQA their absolute correlations are weak to moderate (0.22 to 0.39). The results therefore provide a foundation for automated quality evaluation, with DEQA as the most reliable single metric on this dataset.


Last updated: 2025-08-26