init structure

2025-08-26 09:35:24 +00:00
commit 42047598ae
18 changed files with 2023 additions and 0 deletions
--- a/docs/task/cni/evaluation_results.md
+++ b/docs/task/cni/evaluation_results.md
@@ -0,0 +1,128 @@
+# CNI Task - IQA Metrics Evaluation Results
+
+## Overview
+
+This document reports how well 15 Image Quality Assessment (IQA) metrics agree with human quality annotations on 81 identity document images, measured by Pearson and Spearman correlation.
+
+**Evaluation Date:** 2025-08-26  
+**Total Images:** 81  
+**Total Metrics Evaluated:** 15  
+
+## Key Findings
+
+- **9/15 metrics** have statistically significant Pearson correlations (p < 0.05)
+- **10/15 metrics** have statistically significant Spearman correlations (p < 0.05)
+- **Best performing metric:** DEQA (Pearson 0.6185, Spearman 0.6059)
+- **Average absolute Pearson correlation:** 0.2713
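
The 0.2713 average can be reproduced from the Pearson column of the Complete Rankings table. The check below is a small sketch that assumes the figure is the mean of the absolute Pearson coefficients (the report does not say which coefficient is averaged); the variable names are illustrative.

```python
# Pearson coefficients copied from the Complete Rankings table.
pearson_by_metric = {
    "deqa": 0.6185,
    "uranker": 0.3349,
    "dbcnn": -0.3721,
    "nrqm": -0.3493,
    "brisque": -0.3159,
    "hyperiqa": -0.3271,
    "nima": 0.2989,
    "topiq_nr": -0.2244,
    "maniqa": -0.2106,
    "musiq": -0.2013,
    "clipiqa+_vitL14_512": -0.2259,
    "unique": -0.1875,
    "piqe": 0.1958,
    "paq2piq": -0.1445,
    "niqe": -0.0627,
}

# Mean of the absolute values, so positive and negative correlations count equally.
mean_abs_pearson = sum(abs(r) for r in pearson_by_metric.values()) / len(pearson_by_metric)
print(round(mean_abs_pearson, 4))  # 0.2713
```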
+
+## Top Performing Metrics
+
+### 🏆 Best Overall Performer: DEQA
+
+| Metric | Pearson Correlation | Spearman Correlation | Overall Score | Significance |
+|--------|-------------------|---------------------|---------------|--------------|
+| **deqa** | **0.6185** | **0.6059** | **0.6122** | ✅ Both |
+| uranker | 0.3349 | 0.3909 | 0.3629 | ✅ Both |
+| dbcnn | -0.3721 | -0.3489 | 0.3605 | ✅ Both |
+| nrqm | -0.3493 | -0.3699 | 0.3596 | ✅ Both |
+| brisque | -0.3159 | -0.3859 | 0.3509 | ✅ Both |
+
+## Complete Rankings
+
+| Rank | Metric | Pearson Corr | Spearman Corr | Overall Score | Significant |
+|------|--------|--------------|---------------|---------------|-------------|
+| 1 | **deqa** | **0.6185** | **0.6059** | **0.6122** | ✅ Both |
+| 2 | uranker | 0.3349 | 0.3909 | 0.3629 | ✅ Both |
+| 3 | dbcnn | -0.3721 | -0.3489 | 0.3605 | ✅ Both |
+| 4 | nrqm | -0.3493 | -0.3699 | 0.3596 | ✅ Both |
+| 5 | brisque | -0.3159 | -0.3859 | 0.3509 | ✅ Both |
+| 6 | hyperiqa | -0.3271 | -0.3106 | 0.3189 | ✅ Both |
+| 7 | nima | 0.2989 | 0.3321 | 0.3155 | ✅ Both |
+| 8 | topiq_nr | -0.2244 | -0.2445 | 0.2345 | ✅ Both |
+| 9 | maniqa | -0.2106 | -0.2420 | 0.2263 | ⚠️ Spearman only |
+| 10 | musiq | -0.2013 | -0.2386 | 0.2200 | ⚠️ Spearman only |
+| 11 | clipiqa+_vitL14_512 | -0.2259 | -0.1960 | 0.2109 | ⚠️ Pearson only |
+| 12 | unique | -0.1875 | -0.1971 | 0.1923 | ❌ None |
+| 13 | piqe | 0.1958 | 0.1763 | 0.1860 | ❌ None |
+| 14 | paq2piq | -0.1445 | -0.1548 | 0.1497 | ❌ None |
+| 15 | niqe | -0.0627 | 0.0745 | 0.0686 | ❌ None |
+
+## Correlation Analysis
+
+### Positive Correlations (Higher IQA Score = Higher Human Quality)
+
+| Metric | Pearson | Spearman | Significance |
+|--------|---------|----------|--------------|
+| **deqa** | **0.6185** | **0.6059** | ✅ Both |
+| uranker | 0.3349 | 0.3909 | ✅ Both |
+| nima | 0.2989 | 0.3321 | ✅ Both |
+| piqe | 0.1958 | 0.1763 | ❌ None |
+
+### Negative Correlations (Lower IQA Score = Higher Human Quality)
+
+| Metric | Pearson | Spearman | Significance |
+|--------|---------|----------|--------------|
+| dbcnn | -0.3721 | -0.3489 | ✅ Both |
+| nrqm | -0.3493 | -0.3699 | ✅ Both |
+| hyperiqa | -0.3271 | -0.3106 | ✅ Both |
+| brisque | -0.3159 | -0.3859 | ✅ Both |
+| clipiqa+_vitL14_512 | -0.2259 | -0.1960 | ⚠️ Pearson only |
+| topiq_nr | -0.2244 | -0.2445 | ✅ Both |
+| maniqa | -0.2106 | -0.2420 | ⚠️ Spearman only |
+| musiq | -0.2013 | -0.2386 | ⚠️ Spearman only |
+| unique | -0.1875 | -0.1971 | ❌ None |
+| paq2piq | -0.1445 | -0.1548 | ❌ None |
+| niqe | -0.0627 | 0.0745 | ❌ None |
+
+## Statistical Significance Summary
+
+### ✅ Significant on Both (Pearson and Spearman, p < 0.05)
+- deqa, uranker, dbcnn, nrqm, brisque, hyperiqa, nima, topiq_nr
+
+### ⚠️ Partially Significant (One correlation type, p < 0.05)
+- maniqa, musiq (Spearman only); clipiqa+_vitL14_512 (Pearson only)
+
+### ❌ Not Significant (Both correlations, p ≥ 0.05)
+- unique, piqe, paq2piq, niqe
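
For context on the p < 0.05 cutoff: assuming the p-values come from the usual two-sided t-test on a correlation coefficient (an assumption; the report does not name the test), the test statistic for n = 81 images and the implied critical magnitude are approximately:

$$
t = r\sqrt{\frac{n-2}{1-r^{2}}}, \qquad
|r|_{\text{crit}} = \frac{t_{0.975,\,79}}{\sqrt{t_{0.975,\,79}^{2} + 79}}
\approx \frac{1.99}{\sqrt{1.99^{2} + 79}} \approx 0.22
$$

Under that assumption, coefficients whose magnitude exceeds roughly 0.22 reach significance. This is consistent with the grouping above: topiq_nr (Pearson -0.2244) is significant, while maniqa (Pearson -0.2106) is significant only on Spearman (-0.2420).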
+
+## Recommendations
+
+### 🎯 Primary Recommendation
+**Use DEQA as the primary IQA metric** for identity document quality assessment: it has the highest correlation with human quality judgments (Pearson 0.6185, Spearman 0.6059), well ahead of the next-best metric (uranker, overall score 0.3629).
+
+### 🔄 Robust Evaluation Strategy
+Combine the metrics that correlate positively and significantly with human ratings:
+1. **deqa** (primary) - Pearson 0.6185, the strongest correlation observed
+2. **uranker** (secondary) - Pearson 0.3349, weak-to-moderate  
+3. **nima** (validation) - Pearson 0.2989, weak-to-moderate
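
One possible way to combine the three metrics is a weighted mean of per-metric z-scores. This is only a sketch: the report does not prescribe a combination rule, and the helper names, weights, and raw scores below are hypothetical.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize scores to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant metric carries no ranking information.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def composite_scores(scores_by_metric, weights):
    """Weighted mean of per-metric z-scores, one composite value per image."""
    total = sum(weights.values())
    standardized = {name: zscores(vals) for name, vals in scores_by_metric.items()}
    n_images = len(next(iter(standardized.values())))
    return [
        sum(weights[name] * standardized[name][i] for name in weights) / total
        for i in range(n_images)
    ]


# Hypothetical raw scores for four images (not taken from the evaluation data).
scores = {
    "deqa": [2.1, 3.4, 4.0, 1.5],
    "uranker": [0.3, 0.9, 1.2, 0.1],
    "nima": [4.2, 5.0, 5.5, 3.9],
}
# Hypothetical weights that favor the primary metric.
weights = {"deqa": 0.6, "uranker": 0.2, "nima": 0.2}

composite = composite_scores(scores, weights)
print(composite)
```

Standardizing first keeps a metric with a wide numeric range from dominating the average. All three metrics correlate positively with the human ratings, so no sign flip is needed before averaging.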
+
+### ⚠️ Important Notes
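+- A negative correlation means the raw metric score falls as the human rating rises. For lower-is-better metrics such as BRISQUE and NIQE this is the expected direction; for higher-is-better metrics it indicates disagreement with the human labels
+- niqe is grouped with the negative correlations by its Pearson sign, but its Spearman coefficient is positive (0.0745); both are close to zero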
+- Consider dataset-specific calibration for better performance
+- Results may vary with different image types or quality ranges
+
+## Methodology
+
+### Data Sources
+- **IQA Metrics**: 15 different IQA metrics computed for 81 identity document images
+- **Human Labels**: Quality annotations (coherence scores 1-5) for the same 81 images
+- **Correlation Analysis**: Both Pearson (linear) and Spearman (rank) correlations
+
+### Statistical Analysis
+- **Correlation Types**: Pearson (linear) and Spearman (rank-order)
+- **Significance Threshold**: p < 0.05
+- **Overall Score**: Average of absolute Pearson and Spearman correlations
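
The Overall Score defined above can be sketched in plain Python. This is a minimal illustration, not the evaluation script itself: the helper names `pearson`, `average_ranks`, `spearman`, and `overall_score` and the toy data are hypothetical, and p-values are left out.

```python
import math


def pearson(x, y):
    """Pearson's r for two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def overall_score(pearson_r, spearman_rho):
    """Mean of the absolute Pearson and Spearman correlations."""
    return (abs(pearson_r) + abs(spearman_rho)) / 2


# Hypothetical toy data: one metric score and one human label (1-5) per image.
metric_scores = [0.2, 0.5, 0.4, 0.9, 0.7, 0.8]
human_labels = [1, 2, 2, 5, 3, 4]

r = pearson(metric_scores, human_labels)
rho = spearman(metric_scores, human_labels)
print(f"pearson={r:.4f} spearman={rho:.4f} overall={overall_score(r, rho):.4f}")
```

Taking absolute values before averaging lets metrics with opposite score directions be ranked on the same scale. For example, `overall_score(0.6185, 0.6059)` gives the 0.6122 reported for deqa.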
+
+### Files Generated
+- `detailed_iqa_correlation_results.csv` - Complete analysis data
+- `iqa_ranking_table.csv` - Performance rankings
+- `detailed_data_*.csv` - Individual metric data (15 files)
+- `iqa_correlation_comparison.png` - Visualization plots
+
+## Conclusion
+
+DEQA is the top-performing IQA metric for identity document quality assessment in this evaluation, with the strongest correlation with human quality judgments (overall score 0.6122). Eight of the 15 metrics correlate significantly with human ratings on both Pearson and Spearman, which gives automated quality evaluation a starting point, although apart from DEQA the correlations are modest (overall score at most 0.3629).
+
+---
+
+*Last updated: 2025-08-26*