IQA Metric Benchmark
A comprehensive framework for evaluating Image Quality Assessment (IQA) metrics against human quality annotations.
Overview
This project evaluates various IQA metrics by comparing their scores with human quality judgments for identity document images. The goal is to identify which IQA metrics best correlate with human perceptions of image quality.
Features
📊 IQA Metrics Evaluation
- 15 IQA Metrics evaluated against human quality annotations
- Correlation Analysis - Both Pearson and Spearman correlations
- Statistical Significance Testing - p-value analysis
- Performance Rankings - Comprehensive metric comparisons
🔧 Core Components
- IQA Score Processing - Load and parse IQA metric scores
- Human Label Analysis - Process human quality annotations
- Correlation Calculator - Statistical correlation analysis
- Results Generator - Comprehensive reporting and visualization
📈 Analysis Capabilities
- Batch Processing - Evaluate multiple IQA metrics simultaneously
- Statistical Analysis - Correlation coefficients and significance testing (a minimal sketch follows this list)
- Visualization - Comparison plots and rankings
- Multiple Export Formats - CSV, Markdown, and text reports
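The correlation step referenced above can be illustrated with a short, self-contained sketch. This is not the project's actual implementation (that lives in src/iqa_analyzer.py, per the project structure below); the CSV layout and the column names image, score, and human_score are assumptions made for illustration only.

```python
# Minimal sketch of the correlation analysis, assuming per-image IQA scores and
# human labels stored as CSVs keyed by an "image" column (column names assumed).
import pandas as pd
from scipy.stats import pearsonr, spearmanr


def correlate_metric(iqa_csv: str, human_csv: str) -> dict:
    """Join one metric's scores with human labels and compute both correlations."""
    iqa = pd.read_csv(iqa_csv)        # assumed columns: image, score
    human = pd.read_csv(human_csv)    # assumed columns: image, human_score
    merged = iqa.merge(human, on="image").dropna(subset=["score", "human_score"])

    pearson_r, pearson_p = pearsonr(merged["score"], merged["human_score"])
    spearman_r, spearman_p = spearmanr(merged["score"], merged["human_score"])
    return {
        "pearson_r": pearson_r, "pearson_p": pearson_p,
        "spearman_r": spearman_r, "spearman_p": spearman_p,
        "n_images": len(merged),
    }
```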
Installation
Prerequisites
- Python 3.8 or higher
- pip package manager
Setup
- Clone the repository:
```bash
git clone <repository-url>
cd IQA-Metric-Benchmark
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
Usage
Quick Start
Run the IQA evaluation analysis:
```bash
python main.py
```
Command Line Options
```bash
# Basic evaluation
python main.py

# Custom directories
python main.py --results-dir results --human-labels data/task/cni/human-label.csv

# Different output formats
python main.py --output-format csv
python main.py --output-format txt
```
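To clarify what each option controls, the flags above could be wired with argparse roughly as follows. This is only an assumption about main.py's interface; the defaults and the md choice are guesses based on the export formats listed under Features.

```python
# Hypothetical sketch of the CLI behind main.py; the real script may differ,
# but the flag names match those documented above.
import argparse


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Evaluate IQA metrics against human quality annotations.")
    parser.add_argument("--results-dir", default="results",
                        help="Directory where reports and plots are written.")
    parser.add_argument("--human-labels", default="data/task/cni/human-label.csv",
                        help="CSV file with human quality annotations.")
    parser.add_argument("--output-format", choices=["csv", "md", "txt"], default="md",
                        help="Report format to generate.")
    return parser.parse_args()


if __name__ == "__main__":
    print(parse_args())
```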
Project Structure
```
IQA-Metric-Benchmark/
├── src/                         # Source code
│   ├── __init__.py
│   ├── deqa_scorer.py           # DeQA model wrapper
│   ├── iqa_analyzer.py          # Main analysis engine
│   └── logger_config.py         # Logging configuration
├── scripts/                     # Utility scripts
│   ├── env.sh                   # Environment setup
│   └── cleanup_logs.py          # Log cleanup utility
├── data/                        # Data files
│   └── task/
│       └── cni/
│           ├── images/          # Image files
│           └── human-label.csv  # Human quality annotations
├── docs/                        # Documentation
│   └── task/
│       └── cni/
│           └── evaluation_results.md  # IQA evaluation results
├── logs/                        # Log files (created automatically)
├── results/                     # Output directory (created automatically)
├── main.py                      # Main execution script
├── requirements.txt             # Python dependencies
└── README.md                    # This file
```
IQA Metrics Evaluated
The framework evaluates the following IQA metrics (a short scoring sketch follows the list):
No-Reference Metrics:
- DEQA - Deep Quality Assessment
- URanker - Underwater Ranker (ranking-based IQA)
- DBCNN - Deep Bilinear CNN
- HyperIQA - Hypernetwork-based IQA
- MANIQA - Multi-dimension Attention Network
- MUSIQ - Multi-scale Image Quality Transformer
- NIMA - Neural Image Assessment
- BRISQUE - Blind/Referenceless Image Spatial Quality Evaluator
- NIQE - Natural Image Quality Evaluator
- PIQE - Perception-based Image Quality Evaluator
- NRQM - No-Reference Quality Metric
- UNIQUE - Unified No-Reference Image Quality and Uncertainty Evaluator
- PaQ2PIQ - Patch Quality to Picture Quality
- CLIPIQA+ - CLIP-based IQA
- TopIQ - Top-down Image Quality Assessment
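Most of the metric names above match the registry of the pyiqa package. Whether this project uses pyiqa internally is not stated (DEQA, for instance, is handled by the project's own src/deqa_scorer.py wrapper), so the following is only an illustrative way to score a single image, not the project's pipeline; the image path is hypothetical.

```python
# Illustrative only: scoring one image with a few of the listed metrics via pyiqa.
import pyiqa
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

for name in ["brisque", "niqe", "musiq", "clipiqa+", "topiq_nr"]:
    metric = pyiqa.create_metric(name, device=device)
    score = metric("data/task/cni/images/example.jpg")  # hypothetical filename
    print(f"{name}: {float(score):.4f}")
```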
Output and Results
Results Directory Structure
```
results/
├── detailed_iqa_correlation_results.csv   # Complete correlation analysis
├── iqa_ranking_table.csv                  # Performance rankings
├── detailed_data_*.csv                    # Individual metric data
├── iqa_correlation_comparison.png         # Visualization plots
└── evaluation_summary.txt                 # Summary report
```
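Once a run finishes, the CSV outputs can be inspected directly, for example with pandas. The file names come from the structure above, but the column names (such as overall_score) are assumptions and may differ from the actual output schema.

```python
# Quick inspection of the generated result files; column names are assumed.
import pandas as pd

ranking = pd.read_csv("results/iqa_ranking_table.csv")
print(ranking.head(15))

detailed = pd.read_csv("results/detailed_iqa_correlation_results.csv")
if "overall_score" in detailed.columns:
    # Sort by the assumed overall score to surface the best-performing metrics.
    print(detailed.sort_values("overall_score", ascending=False).head())
```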
📊 IQA Metrics Evaluation Results
Evaluation of 15 IQA metrics against human quality annotations for 81 identity document images:
🏆 Top Performing Metrics
- DEQA (0.6185) - Best overall performer with strong positive correlation
- URanker (0.3629) - Moderate positive correlation
- DBCNN (0.3605) - Moderate negative correlation (scale runs opposite to the human labels; see Correlation Patterns below)
- NRQM (0.3596) - Moderate negative correlation
- BRISQUE (0.3509) - Moderate negative correlation
📈 Key Findings
- 9/15 metrics have statistically significant Pearson correlations (p < 0.05)
- 10/15 metrics have statistically significant Spearman correlations (p < 0.05)
- DEQA recommended as primary metric for identity document quality assessment
- Average absolute correlation: 0.2713
🔍 Correlation Patterns
- Positive correlations (higher IQA score = higher human quality): DEQA, URanker, NIMA
- Negative correlations (lower IQA score = higher human quality): DBCNN, NRQM, BRISQUE, HyperIQA
📋 Detailed results: See docs/task/cni/evaluation_results.md for complete analysis.
Methodology
Data Sources
- IQA Metrics: 15 different IQA metrics computed for 81 identity document images
- Human Labels: Quality annotations (coherence scores 1-5) for the same 81 images
- Correlation Analysis: Both Pearson (linear) and Spearman (rank) correlations
Statistical Analysis
- Correlation Types: Pearson (linear) and Spearman (rank-order)
- Significance Threshold: p < 0.05
- Overall Score: Average of absolute Pearson and Spearman correlations
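The overall score defined above is simply the mean of the absolute correlation coefficients, as in this one-line illustration (the numbers are made up for the example):

```python
# Overall score as described above: mean of the absolute Pearson and Spearman
# coefficients computed for a metric.
def overall_score(pearson_r: float, spearman_r: float) -> float:
    return (abs(pearson_r) + abs(spearman_r)) / 2


print(overall_score(0.60, 0.64))  # -> 0.62 (illustrative values)
```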
Recommendations
🎯 Primary Recommendation
Use DEQA as the primary IQA metric for identity document quality assessment due to its strong positive correlation (0.6185) with human quality judgments.
🔄 Robust Evaluation Strategy
Combine multiple metrics for comprehensive assessment (one possible combination sketch follows this list):
- DEQA (primary) - Strong positive correlation
- URanker (secondary) - Good positive correlation
- NIMA (validation) - Moderate positive correlation
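One possible way to combine these metrics, not prescribed by this project, is to z-score each metric across the dataset, flip the sign of metrics that correlate negatively with the human labels, and average. The column names deqa, uranker, and nima and the sign convention below are assumptions.

```python
# Minimal sketch of a sign-aware metric combination (illustrative only).
from typing import Dict

import pandas as pd


def combined_quality(scores: pd.DataFrame, signs: Dict[str, int]) -> pd.Series:
    """scores has one column per metric and one row per image; signs maps each
    metric to +1 (higher score = better) or -1 (lower score = better here)."""
    zscored = (scores - scores.mean()) / scores.std(ddof=0)
    return (zscored * pd.Series(signs)).mean(axis=1)


# Hypothetical usage with the recommended trio:
# combined = combined_quality(df[["deqa", "uranker", "nima"]],
#                             {"deqa": +1, "uranker": +1, "nima": +1})
```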
⚠️ Important Notes
- Some metrics show negative correlations, indicating different quality interpretations
- Consider dataset-specific calibration for better performance
- Results may vary with different image types or quality ranges
Performance Considerations
- Efficient Processing: Optimized for batch analysis of multiple metrics
- Memory Management: Handles large datasets efficiently
- Error Handling - Robust error handling for missing or corrupted data (see the sketch below)
- Scalability: Designed to accommodate additional metrics
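The kind of defensive loading the error-handling point refers to can look like the following sketch; the directory layout and helper name are hypothetical, not taken from the project's code.

```python
# Illustrative batch loading that skips unreadable score files instead of aborting.
import glob
import logging

import pandas as pd


def load_metric_scores(scores_dir: str) -> dict:
    """Load every per-metric CSV found in scores_dir, skipping broken files."""
    loaded = {}
    for path in sorted(glob.glob(f"{scores_dir}/*.csv")):
        try:
            loaded[path] = pd.read_csv(path)
        except (pd.errors.ParserError, pd.errors.EmptyDataError, OSError) as exc:
            logging.warning("Skipping unreadable score file %s: %s", path, exc)
    return loaded
```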
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use this framework in your research, please cite:
```bibtex
@software{iqa_metric_benchmark,
  title={IQA Metric Benchmark: Evaluation of IQA Metrics Against Human Quality Annotations},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/IQA-Metric-Benchmark}
}
```
Support
For questions, issues, or contributions:
- Open an issue on GitHub
- Contact the maintainers
- Check the documentation
Happy IQA Evaluation! 📊✨