
IQA Metric Benchmark

A comprehensive framework for evaluating Image Quality Assessment (IQA) metrics against human quality annotations.

Overview

This project evaluates various IQA metrics by comparing their scores with human quality judgments for identity document images. The goal is to identify which IQA metrics best correlate with human perceptions of image quality.

Features

📊 IQA Metrics Evaluation

  • 15 IQA Metrics evaluated against human quality annotations
  • Correlation Analysis - Both Pearson and Spearman correlations
  • Statistical Significance Testing - p-value analysis
  • Performance Rankings - Comprehensive metric comparisons

🔧 Core Components

  • IQA Score Processing - Load and parse IQA metric scores
  • Human Label Analysis - Process human quality annotations
  • Correlation Calculator - Statistical correlation analysis
  • Results Generator - Comprehensive reporting and visualization

📈 Analysis Capabilities

  • Batch Processing - Evaluate multiple IQA metrics simultaneously
  • Statistical Analysis - Correlation coefficients and significance testing
  • Visualization - Comparison plots and rankings
  • Multiple Export Formats - CSV, Markdown, and text reports

Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Setup

  1. Clone the repository:
git clone <repository-url>
cd IQA-Metric-Benchmark
  2. Install dependencies:
pip install -r requirements.txt

Usage

Quick Start

Run the IQA evaluation analysis:

python main.py

Command Line Options

# Basic evaluation
python main.py

# Custom directories
python main.py --results-dir results --human-labels data/task/cni/human-label.csv

# Different output formats
python main.py --output-format csv
python main.py --output-format txt

Project Structure

IQA-Metric-Benchmark/
├── src/                    # Source code
│   ├── __init__.py
│   ├── deqa_scorer.py     # DeQA model wrapper
│   ├── iqa_analyzer.py    # Main analysis engine
│   └── logger_config.py   # Logging configuration
├── scripts/                # Utility scripts
│   ├── env.sh             # Environment setup
│   └── cleanup_logs.py    # Log cleanup utility
├── data/                   # Data files
│   └── task/
│       └── cni/
│           ├── images/     # Image files
│           └── human-label.csv  # Human quality annotations
├── docs/                   # Documentation
│   └── task/
│       └── cni/
│           └── evaluation_results.md  # IQA evaluation results
├── logs/                   # Log files (created automatically)
├── results/                # Output directory (created automatically)
├── main.py                 # Main execution script
├── requirements.txt        # Python dependencies
└── README.md              # This file

IQA Metrics Evaluated

The framework evaluates the following IQA metrics:

No-Reference Metrics:

  • DEQA - Deep Quality Assessment
  • URanker - Underwater Ranker (ranking-based IQA)
  • DBCNN - Deep Bilinear CNN
  • HyperIQA - Hypernetwork-based IQA
  • MANIQA - Multi-dimension Attention Network
  • MUSIQ - Multi-scale Image Quality Transformer
  • NIMA - Neural Image Assessment
  • BRISQUE - Blind/Referenceless Image Spatial Quality Evaluator
  • NIQE - Natural Image Quality Evaluator
  • PIQE - Perception-based Image Quality Evaluator
  • NRQM - No-Reference Quality Metric
  • UNIQUE - Unified No-reference Image Quality and Uncertainty Evaluator
  • PaQ2PIQ - Patches-to-Pictures perceptual quality model
  • CLIPIQA+ - CLIP-based IQA
  • TopIQ - Top-down, semantics-to-distortions IQA

Output and Results

Results Directory Structure

results/
├── detailed_iqa_correlation_results.csv    # Complete correlation analysis
├── iqa_ranking_table.csv                   # Performance rankings
├── detailed_data_*.csv                     # Individual metric data
├── iqa_correlation_comparison.png          # Visualization plots
└── evaluation_summary.txt                  # Summary report

📊 IQA Metrics Evaluation Results

Evaluation of 15 IQA metrics against human quality annotations for 81 identity document images:

🏆 Top Performing Metrics

  1. DEQA (0.6185) - Best overall performer, with a strong positive correlation
  2. URanker (0.3629) - Moderate positive correlation
  3. DBCNN (0.3605) - Moderate negative correlation (scale runs opposite to human scores)
  4. NRQM (0.3596) - Moderate negative correlation
  5. BRISQUE (0.3509) - Moderate negative correlation

📈 Key Findings

  • 9/15 metrics have statistically significant Pearson correlations (p < 0.05)
  • 10/15 metrics have statistically significant Spearman correlations (p < 0.05)
  • DEQA recommended as primary metric for identity document quality assessment
  • Average absolute correlation: 0.2713

🔍 Correlation Patterns

  • Positive correlations (higher IQA score = higher human quality): DEQA, URanker, NIMA
  • Negative correlations (lower IQA score = higher human quality): DBCNN, NRQM, BRISQUE, HyperIQA

📋 Detailed results: See docs/task/cni/evaluation_results.md for complete analysis.

Methodology

Data Sources

  • IQA Metrics: 15 different IQA metrics computed for 81 identity document images
  • Human Labels: Quality annotations (coherence scores 1-5) for the same 81 images
  • Correlation Analysis: Both Pearson (linear) and Spearman (rank) correlations
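
As a rough illustration of how the two data sources line up, the snippet below joins per-image scores for one metric with the human coherence labels. The column names (filename, score, coherence) and the per-metric file name are assumptions for illustration, not the exact schema used by iqa_analyzer.py:

# Hedged sketch: column names and the per-metric CSV layout are assumed,
# not taken from the repository's actual schema.
import pandas as pd

human = pd.read_csv("data/task/cni/human-label.csv")    # human coherence scores (1-5)
deqa = pd.read_csv("results/detailed_data_deqa.csv")    # hypothetical instance of detailed_data_*.csv

merged = human.merge(deqa, on="filename", how="inner")  # align rows by image filename
print(merged[["filename", "coherence", "score"]].head())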

Statistical Analysis

  • Correlation Types: Pearson (linear) and Spearman (rank-order)
  • Significance Threshold: p < 0.05
  • Overall Score: Average of absolute Pearson and Spearman correlations
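
A minimal sketch of this analysis using scipy, reusing the merged DataFrame and the assumed column names from the loading example above:

# Pearson (linear) and Spearman (rank-order) correlations with p-values,
# significance at p < 0.05, and the overall score as the average of the
# two absolute correlations. Column names are assumptions.
from scipy import stats

pearson_r, pearson_p = stats.pearsonr(merged["score"], merged["coherence"])
spearman_r, spearman_p = stats.spearmanr(merged["score"], merged["coherence"])

overall_score = (abs(pearson_r) + abs(spearman_r)) / 2
is_significant = pearson_p < 0.05 and spearman_p < 0.05

print(f"Pearson r={pearson_r:.4f} (p={pearson_p:.4g}), "
      f"Spearman rho={spearman_r:.4f} (p={spearman_p:.4g}), "
      f"overall={overall_score:.4f}, significant={is_significant}")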

Recommendations

🎯 Primary Recommendation

Use DEQA as the primary IQA metric for identity document quality assessment due to its strong positive correlation (0.6185) with human quality judgments.

🔄 Robust Evaluation Strategy

Combine multiple metrics for comprehensive assessment (a minimal combination sketch follows the list):

  1. DEQA (primary) - Strong positive correlation
  2. URanker (secondary) - Good positive correlation
  3. NIMA (validation) - Moderate positive correlation
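
One possible way to combine the three recommended metrics (not the repository's own implementation) is to z-score each metric over the dataset and average the results; since DEQA, URanker, and NIMA all correlate positively with human quality, no sign flipping is needed. Metric column names are illustrative:

# Hedged sketch of a simple combination strategy.
import numpy as np
import pandas as pd

def combined_quality(df: pd.DataFrame, metrics=("deqa", "uranker", "nima")) -> np.ndarray:
    """Average of z-scored metric columns; higher means better quality."""
    z = [(df[m] - df[m].mean()) / df[m].std() for m in metrics]
    return np.mean(z, axis=0)

# df is assumed to hold one row per image with one column per metric:
# combined = combined_quality(df)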

⚠️ Important Notes

  • Some metrics show negative correlations, indicating different quality interpretations
  • Consider dataset-specific calibration for better performance (see the sketch after this list)
  • Results may vary with different image types or quality ranges
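
Dataset-specific calibration could be as simple as a least-squares linear fit from a metric's raw scores onto the 1-5 coherence scale. This is one illustrative approach under the same assumed column names, not something the framework ships with:

# Hedged sketch: linear calibration of raw metric scores onto the 1-5 human scale.
import numpy as np

def calibrate(raw_scores, human_scores):
    """Fit y = a*x + b and return a predictor clipped to the 1-5 label range."""
    a, b = np.polyfit(raw_scores, human_scores, deg=1)
    return lambda x: np.clip(a * np.asarray(x, dtype=float) + b, 1.0, 5.0)

# predict_coherence = calibrate(merged["score"], merged["coherence"])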

Performance Considerations

  • Efficient Processing: Optimized for batch analysis of multiple metrics
  • Memory Management: Handles large datasets efficiently
  • Error Handling: Robust error handling for missing or corrupted data
  • Scalability: Designed to accommodate additional metrics

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this framework in your research, please cite:

@software{iqa_metric_benchmark,
  title={IQA Metric Benchmark: Evaluation of IQA Metrics Against Human Quality Annotations},
  author={Your Name},
  year={2024},
  url={https://github.com/yourusername/IQA-Metric-Benchmark}
}

Support

For questions, issues, or contributions:

  • Open an issue on GitHub
  • Contact the maintainers
  • Check the documentation

Happy IQA Evaluation! 📊
