# CNI Task - IQA Metrics Evaluation Results

## Overview

This document presents the comprehensive evaluation results of 15 Image Quality Assessment (IQA) metrics against human quality annotations for 81 identity document images.

**Evaluation Date:** 2025-08-26

**Total Images:** 81

**Total Metrics Evaluated:** 15

## Key Findings

- **9/15 metrics** have statistically significant Pearson correlations (p < 0.05)
- **10/15 metrics** have statistically significant Spearman correlations (p < 0.05)
- **Best performing metric:** DEQA (Pearson 0.6185, Spearman 0.6059)
- **Average absolute Pearson correlation (all 15 metrics):** 0.2713
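
The average in the last bullet is the mean of the absolute Pearson values in the Complete Rankings table. The snippet below recomputes it from those 15 tabulated values (no raw per-image data is involved); taking absolute values keeps negative correlations from cancelling out positive ones.

```python
# Pearson correlations copied from the Complete Rankings table (81 images).
pearson_by_metric = {
    "deqa": 0.6185, "uranker": 0.3349, "dbcnn": -0.3721, "nrqm": -0.3493,
    "brisque": -0.3159, "hyperiqa": -0.3271, "nima": 0.2989, "topiq_nr": -0.2244,
    "maniqa": -0.2106, "musiq": -0.2013, "clipiqa+_vitL14_512": -0.2259,
    "unique": -0.1875, "piqe": 0.1958, "paq2piq": -0.1445, "niqe": -0.0627,
}

# Mean of absolute values: strength of association regardless of direction.
mean_abs_pearson = sum(abs(r) for r in pearson_by_metric.values()) / len(pearson_by_metric)
print(f"{mean_abs_pearson:.4f}")
```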

## Top Performing Metrics

### 🏆 Best Overall Performer: DEQA

DEQA ranks first by a wide margin: its overall score (0.6122) is about 0.25 higher than that of the runner-up, uranker (0.3629). The table below lists the five highest-ranked metrics.

| Metric | Pearson Correlation | Spearman Correlation | Overall Score | Significance |
|--------|-------------------|---------------------|---------------|--------------|
| **deqa** | **0.6185** | **0.6059** | **0.6122** | ✅ Both |
| uranker | 0.3349 | 0.3909 | 0.3629 | ✅ Both |
| dbcnn | -0.3721 | -0.3489 | 0.3605 | ✅ Both |
| nrqm | -0.3493 | -0.3699 | 0.3596 | ✅ Both |
| brisque | -0.3159 | -0.3859 | 0.3509 | ✅ Both |

## Complete Rankings

Metrics are ranked by overall score, the mean of the absolute Pearson and Spearman correlations.

| Rank | Metric | Pearson Corr | Spearman Corr | Overall Score | Significant (p < 0.05) |
|------|--------|--------------|---------------|---------------|-------------|
| 1 | **deqa** | **0.6185** | **0.6059** | **0.6122** | ✅ Both |
| 2 | uranker | 0.3349 | 0.3909 | 0.3629 | ✅ Both |
| 3 | dbcnn | -0.3721 | -0.3489 | 0.3605 | ✅ Both |
| 4 | nrqm | -0.3493 | -0.3699 | 0.3596 | ✅ Both |
| 5 | brisque | -0.3159 | -0.3859 | 0.3509 | ✅ Both |
| 6 | hyperiqa | -0.3271 | -0.3106 | 0.3189 | ✅ Both |
| 7 | nima | 0.2989 | 0.3321 | 0.3155 | ✅ Both |
| 8 | topiq_nr | -0.2244 | -0.2445 | 0.2345 | ✅ Both |
| 9 | maniqa | -0.2106 | -0.2420 | 0.2263 | ⚠️ Spearman only |
| 10 | musiq | -0.2013 | -0.2386 | 0.2200 | ⚠️ Spearman only |
| 11 | clipiqa+_vitL14_512 | -0.2259 | -0.1960 | 0.2109 | ⚠️ Pearson only |
| 12 | unique | -0.1875 | -0.1971 | 0.1923 | ❌ None |
| 13 | piqe | 0.1958 | 0.1763 | 0.1860 | ❌ None |
| 14 | paq2piq | -0.1445 | -0.1548 | 0.1497 | ❌ None |
| 15 | niqe | -0.0627 | 0.0745 | 0.0686 | ❌ None |
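
The ranking can be reproduced from the two correlation columns alone. This sketch recomputes each overall score as the mean of the absolute correlations and sorts in descending order; the values are copied from the table above, and the helper name `overall_score` is illustrative, not taken from the evaluation code.

```python
# (Pearson, Spearman) pairs copied from the Complete Rankings table, in rank order.
correlations = {
    "deqa": (0.6185, 0.6059),
    "uranker": (0.3349, 0.3909),
    "dbcnn": (-0.3721, -0.3489),
    "nrqm": (-0.3493, -0.3699),
    "brisque": (-0.3159, -0.3859),
    "hyperiqa": (-0.3271, -0.3106),
    "nima": (0.2989, 0.3321),
    "topiq_nr": (-0.2244, -0.2445),
    "maniqa": (-0.2106, -0.2420),
    "musiq": (-0.2013, -0.2386),
    "clipiqa+_vitL14_512": (-0.2259, -0.1960),
    "unique": (-0.1875, -0.1971),
    "piqe": (0.1958, 0.1763),
    "paq2piq": (-0.1445, -0.1548),
    "niqe": (-0.0627, 0.0745),
}


def overall_score(pearson, spearman):
    """Mean of absolute correlations: the sign is ignored, only strength counts."""
    return (abs(pearson) + abs(spearman)) / 2


# Highest overall score first.
ranking = sorted(correlations, key=lambda m: overall_score(*correlations[m]), reverse=True)

for rank, metric in enumerate(ranking, start=1):
    print(f"{rank:>2}  {metric:<20} {overall_score(*correlations[metric]):.4f}")
```

Because the sign is discarded, negatively correlated metrics such as dbcnn can outrank positively correlated ones such as nima; the Correlation Analysis section separates the two directions.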

## Correlation Analysis

Metrics are grouped by the sign of their Pearson correlation with the human labels.

### Positive Correlations (Higher IQA Score = Higher Human Quality)

| Metric | Pearson | Spearman | Significance |
|--------|---------|----------|--------------|
| **deqa** | **0.6185** | **0.6059** | ✅ Both |
| uranker | 0.3349 | 0.3909 | ✅ Both |
| nima | 0.2989 | 0.3321 | ✅ Both |
| piqe | 0.1958 | 0.1763 | ❌ None |

### Negative Correlations (Lower IQA Score = Higher Human Quality)

| Metric | Pearson | Spearman | Significance |
|--------|---------|----------|--------------|
| dbcnn | -0.3721 | -0.3489 | ✅ Both |
| nrqm | -0.3493 | -0.3699 | ✅ Both |
| hyperiqa | -0.3271 | -0.3106 | ✅ Both |
| brisque | -0.3159 | -0.3859 | ✅ Both |
| clipiqa+_vitL14_512 | -0.2259 | -0.1960 | ⚠️ Pearson only |
| topiq_nr | -0.2244 | -0.2445 | ✅ Both |
| maniqa | -0.2106 | -0.2420 | ⚠️ Spearman only |
| musiq | -0.2013 | -0.2386 | ⚠️ Spearman only |
| unique | -0.1875 | -0.1971 | ❌ None |
| paq2piq | -0.1445 | -0.1548 | ❌ None |
| niqe | -0.0627 | 0.0745 | ❌ None |

niqe is listed here by the sign of its Pearson correlation; its Spearman correlation is slightly positive (0.0745), and neither value is significant.

## Statistical Significance Summary

### ✅ Significant in Both (Pearson and Spearman, p < 0.05)

- deqa, uranker, dbcnn, nrqm, brisque, hyperiqa, nima, topiq_nr

### ⚠️ Partially Significant (only one correlation type, p < 0.05)

- maniqa (Spearman only), musiq (Spearman only), clipiqa+_vitL14_512 (Pearson only)

### ❌ Not Significant (p ≥ 0.05 for both correlations)

- unique, piqe, paq2piq, niqe
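
The significance groups above follow from the sample size. With n = 81, a correlation r is significant at p < 0.05 when the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) exceeds the critical value for 79 degrees of freedom. The sketch below assumes two-sided tests and the usual t approximation for both coefficients, and uses a critical value of about 1.990 read from a standard t table (an assumption, not a value reported by this evaluation). Under those assumptions it reproduces the three groups from the tabulated correlations.

```python
import math

N_IMAGES = 81
# Assumption: two-sided tests at alpha = 0.05 with n - 2 = 79 degrees of freedom;
# 1.990 approximates the critical t value for 79 df (standard t table).
T_CRITICAL = 1.990

# (Pearson, Spearman) pairs copied from the Complete Rankings table.
correlations = {
    "deqa": (0.6185, 0.6059), "uranker": (0.3349, 0.3909),
    "dbcnn": (-0.3721, -0.3489), "nrqm": (-0.3493, -0.3699),
    "brisque": (-0.3159, -0.3859), "hyperiqa": (-0.3271, -0.3106),
    "nima": (0.2989, 0.3321), "topiq_nr": (-0.2244, -0.2445),
    "maniqa": (-0.2106, -0.2420), "musiq": (-0.2013, -0.2386),
    "clipiqa+_vitL14_512": (-0.2259, -0.1960), "unique": (-0.1875, -0.1971),
    "piqe": (0.1958, 0.1763), "paq2piq": (-0.1445, -0.1548),
    "niqe": (-0.0627, 0.0745),
}


def is_significant(r, n=N_IMAGES, t_crit=T_CRITICAL):
    """Reject 'no correlation' when |t| > t_crit, with t = r*sqrt(n-2)/sqrt(1-r^2)."""
    t = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    return abs(t) > t_crit


def significance_group(pearson, spearman):
    """Label a metric the same way the summary above does."""
    p_sig, s_sig = is_significant(pearson), is_significant(spearman)
    if p_sig and s_sig:
        return "both"
    if p_sig:
        return "Pearson only"
    if s_sig:
        return "Spearman only"
    return "none"


groups = {metric: significance_group(*pair) for metric, pair in correlations.items()}

# Equivalent threshold on |r|: r_crit = t_crit / sqrt(n - 2 + t_crit^2).
r_crit = T_CRITICAL / math.sqrt(N_IMAGES - 2 + T_CRITICAL ** 2)
print(f"critical |r| for n = {N_IMAGES}: {r_crit:.3f}")
for metric, group in groups.items():
    print(f"{metric:<20} {group}")
```

The implied threshold is |r| ≈ 0.22, which explains the split results for maniqa, musiq and clipiqa+_vitL14_512: each has one correlation above and one below that value.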

## Recommendations

### 🎯 Primary Recommendation

**Use DEQA as the primary IQA metric** for identity document quality assessment: of the 15 metrics evaluated, it has the strongest correlation with human quality judgments (Pearson 0.6185, Spearman 0.6059).

### 🔄 Robust Evaluation Strategy

Combine several positively correlated metrics for a more robust assessment:

1. **deqa** (primary) - strongest positive correlation (Pearson 0.6185, Spearman 0.6059)
2. **uranker** (secondary) - second-strongest positive correlation (Pearson 0.3349, Spearman 0.3909)
3. **nima** (validation) - third-strongest positive correlation (Pearson 0.2989, Spearman 0.3321)
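
This report does not specify how the three metrics should be combined. One simple option, sketched below as an assumption, is to standardize each metric to z-scores across the image set and take a weighted average per image. The function names, weights and per-image scores are hypothetical placeholders, not values from this evaluation.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize scores so metrics on different scales can be averaged."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def combine(metric_scores, weights):
    """Weighted mean of per-metric z-scores, computed image by image."""
    standardized = {name: zscore(vals) for name, vals in metric_scores.items()}
    total = sum(weights.values())
    n_images = len(next(iter(metric_scores.values())))
    return [
        sum(weights[name] * standardized[name][i] for name in standardized) / total
        for i in range(n_images)
    ]


# Hypothetical raw scores for four images; real values would come from the metric outputs.
scores = {
    "deqa": [3.1, 4.2, 2.5, 3.8],
    "uranker": [1.2, 1.9, 0.7, 1.5],
    "nima": [4.4, 5.1, 4.0, 4.9],
}
# Hypothetical weights favouring the primary metric.
weights = {"deqa": 0.5, "uranker": 0.3, "nima": 0.2}
print([round(s, 3) for s in combine(scores, weights)])
```

All three metrics correlate positively with the human labels in this evaluation, so no sign flip is needed; a negatively correlated metric would need its z-scores negated before averaging. In practice the weights would be chosen against the human annotations rather than fixed by hand.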
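
### ⚠️ Important Notes

- 11 of the 15 metrics have a negative Pearson correlation with the human labels. Whether a negative sign is expected depends on each metric's orientation: for lower-is-better metrics such as BRISQUE and NIQE it is the expected direction, whereas for higher-is-better metrics it means the metric disagrees with the human judgments on this dataset. Check each metric's documented orientation before interpreting the sign.
- Consider dataset-specific calibration (for example, fitting a mapping from raw metric scores to the 1-5 human scale) to improve agreement with the human labels.
- These results come from 81 identity document images and may not generalize to other image types or quality ranges.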
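
A minimal sketch of dataset-specific calibration, assuming a simple linear mapping (one possible choice; this report does not prescribe one): fit human ≈ intercept + slope × metric by least squares on annotated images, then map new raw scores onto the 1-5 human scale. The function names and example data are hypothetical.

```python
def fit_linear_calibration(metric_scores, human_scores):
    """Ordinary least-squares fit of human ~ intercept + slope * metric."""
    n = len(metric_scores)
    mx = sum(metric_scores) / n
    my = sum(human_scores) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(metric_scores, human_scores))
    sxx = sum((x - mx) ** 2 for x in metric_scores)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def calibrate(score, intercept, slope, low=1.0, high=5.0):
    """Map a raw metric score onto the human scale, clipped to [low, high]."""
    return min(high, max(low, intercept + slope * score))


# Hypothetical data: a lower-is-better metric against 1-5 human labels.
raw = [20.0, 35.0, 50.0, 65.0]
human = [5, 4, 2, 1]
a, b = fit_linear_calibration(raw, human)
print(f"intercept={a:.3f} slope={b:.4f}")
print(calibrate(40.0, a, b))
```

A negative fitted slope absorbs the sign of a negatively correlated metric, so calibrated outputs share the human orientation. A linear map leaves the absolute Pearson correlation unchanged (apart from clipping): it makes scores interpretable on the human scale, and any gain in correlation would require a nonlinear mapping.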

## Methodology

### Data Sources

- **IQA Metrics**: 15 different IQA metrics computed for 81 identity document images
- **Human Labels**: Quality annotations (coherence scores 1-5) for the same 81 images
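
### Statistical Analysis

- **Correlation Types**: Pearson (linear) and Spearman (rank-order), each computed between a metric's scores and the human labels over the 81 images
- **Significance Threshold**: p < 0.05
- **Overall Score**: mean of the absolute Pearson and Spearman correlations, (|Pearson| + |Spearman|) / 2, so a strong negative correlation ranks as highly as an equally strong positive one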
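
The two coefficients and the overall score can be written out without external dependencies. Human labels on a 1-5 scale contain many ties, so the Spearman helper gives tied values the average of their ranks before applying Pearson's formula to the ranks. This is a sketch, not the evaluation's code; in practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same coefficients together with p-values. The example inputs are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def overall_score(x, y):
    """Mean of |Pearson| and |Spearman|, the score used for the rankings."""
    return (abs(pearson(x, y)) + abs(spearman(x, y))) / 2


# Hypothetical example: one metric's scores and 1-5 human labels for six images.
metric = [0.42, 0.55, 0.31, 0.78, 0.66, 0.50]
human = [2, 3, 2, 5, 4, 3]
print(pearson(metric, human), spearman(metric, human), overall_score(metric, human))
```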

### Files Generated

- `detailed_iqa_correlation_results.csv` - Complete analysis data
- `iqa_ranking_table.csv` - Performance rankings
- `detailed_data_*.csv` - Individual metric data (15 files)
- `iqa_correlation_comparison.png` - Visualization plots

## Conclusion

DEQA emerges as the top-performing IQA metric for identity document quality assessment, showing the strongest correlation with human quality judgments (Pearson 0.6185, Spearman 0.6059). Eight metrics are significant under both correlation types and three more under one, providing a foundation for automated quality evaluation systems; however, apart from DEQA every absolute correlation is below 0.4, so the remaining metrics are better suited to a supporting role than to standalone use.

---

*Last updated: 2025-08-26*