
IQA Metric Benchmark

A comprehensive framework for evaluating Image Quality Assessment (IQA) metrics against human quality annotations.

Overview

This project evaluates various IQA metrics by comparing their scores with human quality judgments for identity document images. The goal is to identify which IQA metrics best correlate with human perceptions of image quality.

Features

📊 IQA Metrics Evaluation

  • 15 IQA Metrics evaluated against human quality annotations
  • Correlation Analysis - Both Pearson and Spearman correlations
  • Statistical Significance Testing - p-value analysis
  • Performance Rankings - Comprehensive metric comparisons

🔧 Core Components

  • IQA Score Processing - Load and parse IQA metric scores
  • Human Label Analysis - Process human quality annotations
  • Correlation Calculator - Statistical correlation analysis
  • Results Generator - Comprehensive reporting and visualization

📈 Analysis Capabilities

  • Batch Processing - Evaluate multiple IQA metrics simultaneously
  • Statistical Analysis - Correlation coefficients and significance testing
  • Visualization - Comparison plots and rankings
  • Multiple Export Formats - CSV, Markdown, and text reports

Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Setup

  1. Clone the repository:
git clone <repository-url>
cd IQA-Metric-Benchmark
  2. Install dependencies:
pip install -r requirements.txt

Usage

Quick Start

Run the IQA evaluation analysis:

python main.py

Command Line Options

# Basic evaluation
python main.py

# Custom directories
python main.py --results-dir results --human-labels data/task/cni/human-label.csv

# Different output formats
python main.py --output-format csv
python main.py --output-format txt

Project Structure

IQA-Metric-Benchmark/
├── src/                    # Source code
│   ├── __init__.py
│   ├── deqa_scorer.py     # DeQA model wrapper
│   ├── iqa_analyzer.py    # Main analysis engine
│   └── logger_config.py   # Logging configuration
├── scripts/                # Utility scripts
│   ├── env.sh             # Environment setup
│   └── cleanup_logs.py    # Log cleanup utility
├── data/                   # Data files
│   └── task/
│       └── cni/
│           ├── images/     # Image files
│           └── human-label.csv  # Human quality annotations
├── docs/                   # Documentation
│   └── task/
│       └── cni/
│           └── evaluation_results.md  # IQA evaluation results
├── logs/                   # Log files (created automatically)
├── results/                # Output directory (created automatically)
├── main.py                 # Main execution script
├── requirements.txt        # Python dependencies
└── README.md              # This file

IQA Metrics Evaluated

The framework evaluates the following IQA metrics:

No-Reference Metrics:

  • DEQA - Deep Quality Assessment
  • URanker - Underwater Ranker (ranking-based IQA)
  • DBCNN - Deep Bilinear CNN
  • HyperIQA - Hypernetwork-based IQA
  • MANIQA - Multi-dimension Attention Network
  • MUSIQ - Multi-scale Image Quality Transformer
  • NIMA - Neural Image Assessment
  • BRISQUE - Blind/Referenceless Image Spatial Quality Evaluator
  • NIQE - Natural Image Quality Evaluator
  • PIQE - Perception-based Image Quality Evaluator
  • NRQM - No-Reference Quality Metric
  • UNIQUE - Unified No-reference Image Quality and Uncertainty Evaluator
  • PaQ2PIQ - Patches-to-Pictures perceptual quality model
  • CLIPIQA+ - CLIP-based IQA
  • TopIQ - Top-down, semantics-to-distortions IQA

Output and Results

Results Directory Structure

results/
├── detailed_iqa_correlation_results.csv    # Complete correlation analysis
├── iqa_ranking_table.csv                   # Performance rankings
├── detailed_data_*.csv                     # Individual metric data
├── iqa_correlation_comparison.png          # Visualization plots
└── evaluation_summary.txt                  # Summary report

📊 IQA Metrics Evaluation Results

Evaluation of 15 IQA metrics against human quality annotations for 81 identity document images:

🏆 Top Performing Metrics

  1. DEQA (0.6185) - Best overall performer, with a strong positive correlation
  2. URanker (0.3629) - Moderate positive correlation
  3. DBCNN (0.3605) - Moderate negative correlation (scale runs opposite to human scores)
  4. NRQM (0.3596) - Moderate negative correlation
  5. BRISQUE (0.3509) - Moderate negative correlation

📈 Key Findings

  • 9/15 metrics have statistically significant Pearson correlations (p < 0.05)
  • 10/15 metrics have statistically significant Spearman correlations (p < 0.05)
  • DEQA recommended as primary metric for identity document quality assessment
  • Average absolute correlation: 0.2713

🔍 Correlation Patterns

  • Positive correlations (higher IQA score = higher human quality): DEQA, URanker, NIMA
  • Negative correlations (lower IQA score = higher human quality): DBCNN, NRQM, BRISQUE, HyperIQA

📋 Detailed results: See docs/task/cni/evaluation_results.md for complete analysis.

Methodology

Data Sources

  • IQA Metrics: 15 different IQA metrics computed for 81 identity document images
  • Human Labels: Quality annotations (coherence scores 1-5) for the same 81 images
  • Correlation Analysis: Both Pearson (linear) and Spearman (rank) correlations
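
As a rough illustration of how the two data sources line up, the snippet below joins per-image scores for one metric with the human coherence labels. The column names (filename, score, coherence) and the per-metric file name are assumptions for illustration, not the exact schema used by iqa_analyzer.py:

# Hedged sketch: column names and the per-metric CSV layout are assumed,
# not taken from the repository's actual schema.
import pandas as pd

human = pd.read_csv("data/task/cni/human-label.csv")    # human coherence scores (1-5)
deqa = pd.read_csv("results/detailed_data_deqa.csv")    # hypothetical instance of detailed_data_*.csv

merged = human.merge(deqa, on="filename", how="inner")  # align rows by image filename
print(merged[["filename", "coherence", "score"]].head())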

Statistical Analysis

  • Correlation Types: Pearson (linear) and Spearman (rank-order)
  • Significance Threshold: p < 0.05
  • Overall Score: Average of absolute Pearson and Spearman correlations
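
A minimal sketch of this analysis using scipy, reusing the merged DataFrame and the assumed column names from the loading example above:

# Pearson (linear) and Spearman (rank-order) correlations with p-values,
# significance at p < 0.05, and the overall score as the average of the
# two absolute correlations. Column names are assumptions.
from scipy import stats

pearson_r, pearson_p = stats.pearsonr(merged["score"], merged["coherence"])
spearman_r, spearman_p = stats.spearmanr(merged["score"], merged["coherence"])

overall_score = (abs(pearson_r) + abs(spearman_r)) / 2
is_significant = pearson_p < 0.05 and spearman_p < 0.05

print(f"Pearson r={pearson_r:.4f} (p={pearson_p:.4g}), "
      f"Spearman rho={spearman_r:.4f} (p={spearman_p:.4g}), "
      f"overall={overall_score:.4f}, significant={is_significant}")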

Recommendations

🎯 Primary Recommendation

Use DEQA as the primary IQA metric for identity document quality assessment due to its strong positive correlation (0.6185) with human quality judgments.

🔄 Robust Evaluation Strategy

Combine multiple metrics for comprehensive assessment (a minimal combination sketch follows the list):

  1. DEQA (primary) - Strong positive correlation
  2. URanker (secondary) - Good positive correlation
  3. NIMA (validation) - Moderate positive correlation
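
One possible way to combine the three recommended metrics (not the repository's own implementation) is to z-score each metric over the dataset and average the results; since DEQA, URanker, and NIMA all correlate positively with human quality, no sign flipping is needed. Metric column names are illustrative:

# Hedged sketch of a simple combination strategy.
import numpy as np
import pandas as pd

def combined_quality(df: pd.DataFrame, metrics=("deqa", "uranker", "nima")) -> np.ndarray:
    """Average of z-scored metric columns; higher means better quality."""
    z = [(df[m] - df[m].mean()) / df[m].std() for m in metrics]
    return np.mean(z, axis=0)

# df is assumed to hold one row per image with one column per metric:
# combined = combined_quality(df)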

⚠️ Important Notes

  • Some metrics show negative correlations, indicating different quality interpretations
  • Consider dataset-specific calibration for better performance (see the sketch after this list)
  • Results may vary with different image types or quality ranges
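
Dataset-specific calibration could be as simple as a least-squares linear fit from a metric's raw scores onto the 1-5 coherence scale. This is one illustrative approach under the same assumed column names, not something the framework ships with:

# Hedged sketch: linear calibration of raw metric scores onto the 1-5 human scale.
import numpy as np

def calibrate(raw_scores, human_scores):
    """Fit y = a*x + b and return a predictor clipped to the 1-5 label range."""
    a, b = np.polyfit(raw_scores, human_scores, deg=1)
    return lambda x: np.clip(a * np.asarray(x, dtype=float) + b, 1.0, 5.0)

# predict_coherence = calibrate(merged["score"], merged["coherence"])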

Performance Considerations

  • Efficient Processing: Optimized for batch analysis of multiple metrics
  • Memory Management: Handles large datasets efficiently
  • Error Handling: Robust error handling for missing or corrupted data
  • Scalability: Designed to accommodate additional metrics

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this framework in your research, please cite:

@software{iqa_metric_benchmark,
  title={IQA Metric Benchmark: Evaluation of IQA Metrics Against Human Quality Annotations},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/IQA-Metric-Benchmark}
}

Support

For questions, issues, or contributions:

  • Open an issue on GitHub
  • Contact the maintainers
  • Check the documentation

Happy IQA Evaluation! 📊
