

# CNI Task - IQA Metrics Evaluation Results
## Overview
This document reports how well 15 Image Quality Assessment (IQA) metrics correlate with human quality annotations on 81 identity document images.
**Evaluation Date:** 2025-08-26
**Total Images:** 81
**Total Metrics Evaluated:** 15
## Key Findings
- **9/15 metrics** have statistically significant Pearson correlations (p < 0.05)
- **10/15 metrics** have statistically significant Spearman correlations (p < 0.05)
- **Best performing metric:** DEQA, with Pearson correlation 0.6185 (Spearman 0.6059)
- **Average absolute Pearson correlation across all 15 metrics:** 0.2713
## Top Performing Metrics
### 🏆 Best Overall Performer: DEQA
| Metric | Pearson Correlation | Spearman Correlation | Overall Score | Significance |
|--------|-------------------|---------------------|---------------|--------------|
| **deqa** | **0.6185** | **0.6059** | **0.6122** | Both |
| uranker | 0.3349 | 0.3909 | 0.3629 | Both |
| dbcnn | -0.3721 | -0.3489 | 0.3605 | Both |
| nrqm | -0.3493 | -0.3699 | 0.3596 | Both |
| brisque | -0.3159 | -0.3859 | 0.3509 | Both |
## Complete Rankings
| Rank | Metric | Pearson Corr | Spearman Corr | Overall Score | Significant |
|------|--------|--------------|---------------|---------------|-------------|
| 1 | **deqa** | **0.6185** | **0.6059** | **0.6122** | Both |
| 2 | uranker | 0.3349 | 0.3909 | 0.3629 | Both |
| 3 | dbcnn | -0.3721 | -0.3489 | 0.3605 | Both |
| 4 | nrqm | -0.3493 | -0.3699 | 0.3596 | Both |
| 5 | brisque | -0.3159 | -0.3859 | 0.3509 | Both |
| 6 | hyperiqa | -0.3271 | -0.3106 | 0.3189 | Both |
| 7 | nima | 0.2989 | 0.3321 | 0.3155 | Both |
| 8 | topiq_nr | -0.2244 | -0.2445 | 0.2345 | Both |
| 9 | maniqa | -0.2106 | -0.2420 | 0.2263 | Spearman only |
| 10 | musiq | -0.2013 | -0.2386 | 0.2200 | Spearman only |
| 11 | clipiqa+_vitL14_512 | -0.2259 | -0.1960 | 0.2109 | Pearson only |
| 12 | unique | -0.1875 | -0.1971 | 0.1923 | None |
| 13 | piqe | 0.1958 | 0.1763 | 0.1860 | None |
| 14 | paq2piq | -0.1445 | -0.1548 | 0.1497 | None |
| 15 | niqe | -0.0627 | 0.0745 | 0.0686 | None |
## Correlation Analysis
### Positive Correlations (Higher IQA Score = Higher Human Quality)
| Metric | Pearson | Spearman | Significance |
|--------|---------|----------|--------------|
| **deqa** | **0.6185** | **0.6059** | Both |
| uranker | 0.3349 | 0.3909 | Both |
| nima | 0.2989 | 0.3321 | Both |
| piqe | 0.1958 | 0.1763 | None |
### Negative Correlations (Lower IQA Score = Higher Human Quality)
| Metric | Pearson | Spearman | Significance |
|--------|---------|----------|--------------|
| dbcnn | -0.3721 | -0.3489 | Both |
| nrqm | -0.3493 | -0.3699 | Both |
| hyperiqa | -0.3271 | -0.3106 | Both |
| brisque | -0.3159 | -0.3859 | Both |
| clipiqa+_vitL14_512 | -0.2259 | -0.1960 | Pearson only |
| topiq_nr | -0.2244 | -0.2445 | Both |
| maniqa | -0.2106 | -0.2420 | Spearman only |
| musiq | -0.2013 | -0.2386 | Spearman only |
| unique | -0.1875 | -0.1971 | None |
| paq2piq | -0.1445 | -0.1548 | None |
| niqe | -0.0627 | 0.0745 | None |

Note: niqe is listed here by the sign of its Pearson coefficient. Its Spearman coefficient is slightly positive (0.0745), and neither value is statistically significant.
## Statistical Significance Summary
### ✅ Significant on Both (Pearson and Spearman, p < 0.05)
- deqa, uranker, dbcnn, nrqm, brisque, hyperiqa, nima, topiq_nr
### ⚠️ Partially Significant (One correlation type, p < 0.05)
- maniqa, musiq, clipiqa+_vitL14_512
### ❌ Not Significant (Both correlations, p ≥ 0.05)
- unique, piqe, paq2piq, niqe
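
The grouping above can be sanity-checked against the sample size. The sketch below assumes the p-values come from the usual t-test for a correlation coefficient with n = 81. The document does not state how p-values were computed, so treat this as an approximation, especially for Spearman. The critical t value is hard-coded from standard tables, and `significance_label` is a hypothetical helper name. Only the borderline metrics are included.

```python
from math import sqrt

N_IMAGES = 81
# Two-sided 5% critical value of Student's t with N_IMAGES - 2 = 79 degrees
# of freedom. Hard-coded from standard tables (assumption: about 1.9905).
T_CRIT = 1.9905

# Under the t-test for a correlation, |r| must exceed this for p < 0.05.
R_CRIT = T_CRIT / sqrt(T_CRIT ** 2 + (N_IMAGES - 2))

# (Pearson, Spearman) values copied from the rankings table, borderline rows only.
BORDERLINE = {
    "topiq_nr": (-0.2244, -0.2445),
    "maniqa": (-0.2106, -0.2420),
    "musiq": (-0.2013, -0.2386),
    "clipiqa+_vitL14_512": (-0.2259, -0.1960),
    "unique": (-0.1875, -0.1971),
    "piqe": (0.1958, 0.1763),
}


def significance_label(pearson, spearman, r_crit=R_CRIT):
    """Classify a metric by which coefficients exceed the critical |r|."""
    pearson_sig = abs(pearson) > r_crit
    spearman_sig = abs(spearman) > r_crit
    if pearson_sig and spearman_sig:
        return "Both"
    if pearson_sig:
        return "Pearson only"
    if spearman_sig:
        return "Spearman only"
    return "None"


labels = {name: significance_label(p, s) for name, (p, s) in BORDERLINE.items()}
print(f"critical |r| for n={N_IMAGES}: {R_CRIT:.4f}")
for name, label in labels.items():
    print(f"{name}: {label}")
```

With 81 images the cutoff sits near |r| = 0.22, which is why topiq_nr and clipiqa+_vitL14_512 narrowly pass on Pearson while maniqa and musiq do not.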
## Recommendations
### 🎯 Primary Recommendation
**Use DEQA as the primary IQA metric** for identity document quality assessment: it has the highest correlation with human quality judgments (Pearson 0.6185, Spearman 0.6059), well ahead of the next-best metric, uranker (0.3349 and 0.3909).
### 🔄 Robust Evaluation Strategy
Combine multiple metrics for comprehensive assessment:
1. **deqa** (primary) - Moderately strong positive correlation (Pearson 0.6185)
2. **uranker** (secondary) - Weak-to-moderate positive correlation (Pearson 0.3349)
3. **nima** (validation) - Weak positive correlation (Pearson 0.2989)
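
One possible way to combine the three metrics listed above is to standardize each one and average the result per image. This is only a sketch under stated assumptions: the document does not prescribe a combination rule, so the equal weighting, the z-score normalization, the function names, and the example scores below are all hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of scores to zero mean and unit variance."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant metric carries no ranking information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def combined_quality(scores_by_metric, metrics=("deqa", "uranker", "nima")):
    """Average the standardized scores of the chosen metrics per image.

    All three default metrics correlate positively with the human labels,
    so no sign flip is applied here (equal weights are an assumption).
    """
    columns = [z_scores(scores_by_metric[m]) for m in metrics]
    return [mean(per_image) for per_image in zip(*columns)]


# Hypothetical raw scores for three images, not taken from the evaluation.
example_scores = {
    "deqa": [2.0, 3.0, 4.0],
    "uranker": [0.5, 1.0, 1.5],
    "nima": [4.0, 5.0, 6.0],
}
combined = combined_quality(example_scores)
print(combined)
```

Standardizing first keeps a metric with a wide numeric range from dominating the average. Any combined score would still need to be checked against the human labels before use.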
### ⚠️ Important Notes
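- Eleven metrics have negative Pearson correlations with the human scores. For metrics conventionally defined as lower-is-better (for example brisque and niqe), negative is the expected direction. For metrics where a higher score is meant to indicate better quality, it suggests that their notion of quality differs from what annotators judged on these documents.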
- Consider dataset-specific calibration for better performance
- Results may vary with different image types or quality ranges
## Methodology
### Data Sources
- **IQA Metrics**: 15 different IQA metrics computed for 81 identity document images
- **Human Labels**: Quality annotations (coherence scores 1-5) for the same 81 images
- **Correlation Analysis**: Both Pearson (linear) and Spearman (rank) correlations
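
The two correlation types can be sketched in plain Python as follows. This is a minimal illustration, not the code used to produce the results. The function names are hypothetical and the data is a toy example. Because the human labels are integers from 1 to 5 and therefore contain many ties, the rank step assigns tied values the mean of their positions.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over every element tied with order[start].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: a monotone but non-linear relationship.
metric_scores = [1, 2, 3, 4, 5]
human_scores = [1, 4, 9, 16, 25]
print(pearson(metric_scores, human_scores), spearman(metric_scores, human_scores))
```

In the toy data the ordering of the two lists agrees perfectly while the relationship is not linear, which is the situation where Spearman exceeds Pearson.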
### Statistical Analysis
- **Correlation Types**: Pearson (linear) and Spearman (rank-order)
- **Significance Threshold**: p < 0.05
- **Overall Score**: Average of absolute Pearson and Spearman correlations
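
As a check on the definition above, the overall score and the ranking can be recomputed from the published coefficients. The sketch below copies the Pearson and Spearman values from the rankings table. The variable and function names are illustrative, and scores recomputed from the rounded coefficients may differ from the table in the fourth decimal place.

```python
# (Pearson, Spearman) values copied from the Complete Rankings table.
CORRELATIONS = {
    "deqa": (0.6185, 0.6059),
    "uranker": (0.3349, 0.3909),
    "dbcnn": (-0.3721, -0.3489),
    "nrqm": (-0.3493, -0.3699),
    "brisque": (-0.3159, -0.3859),
    "hyperiqa": (-0.3271, -0.3106),
    "nima": (0.2989, 0.3321),
    "topiq_nr": (-0.2244, -0.2445),
    "maniqa": (-0.2106, -0.2420),
    "musiq": (-0.2013, -0.2386),
    "clipiqa+_vitL14_512": (-0.2259, -0.1960),
    "unique": (-0.1875, -0.1971),
    "piqe": (0.1958, 0.1763),
    "paq2piq": (-0.1445, -0.1548),
    "niqe": (-0.0627, 0.0745),
}


def overall_score(pearson, spearman):
    """Average of the absolute Pearson and Spearman coefficients."""
    return (abs(pearson) + abs(spearman)) / 2


# Rank metrics from highest to lowest overall score.
ranking = sorted(
    CORRELATIONS,
    key=lambda name: overall_score(*CORRELATIONS[name]),
    reverse=True,
)

# Mean of |Pearson| over all metrics, the figure quoted in Key Findings.
mean_abs_pearson = sum(abs(p) for p, _ in CORRELATIONS.values()) / len(CORRELATIONS)

for rank, name in enumerate(ranking, start=1):
    print(f"{rank:2d}  {name:<22} {overall_score(*CORRELATIONS[name]):.4f}")
print(f"mean |Pearson| = {mean_abs_pearson:.4f}")
```

Because the score uses absolute values, it rewards the strength of a relationship regardless of direction, so a metric's sign still has to be read from the correlation tables.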
### Files Generated
- `detailed_iqa_correlation_results.csv` - Complete analysis data
- `iqa_ranking_table.csv` - Performance rankings
- `detailed_data_*.csv` - Individual metric data (15 files)
- `iqa_correlation_comparison.png` - Visualization plots
## Conclusion
DEQA is the top-performing IQA metric for identity document quality assessment in this evaluation, with the strongest correlation with human quality judgments (Pearson 0.6185, Spearman 0.6059). Eight of the 15 metrics correlate significantly with human assessments under both Pearson and Spearman, which gives a usable starting point for automated quality evaluation, although every metric other than DEQA has absolute correlations below 0.4.
---
*Last updated: 2025-08-26*