
Evaluation Guide

Overview

This guide covers model evaluation procedures for YOLOv8 French ID Card Detection models.

🎯 Evaluation Process

1. Basic Evaluation

Evaluate the best trained model:

python eval.py

This will:

  • Automatically find the best model from runs/train/ (a minimal version of this lookup is sketched after this list)
  • Load the test dataset
  • Run evaluation on test set
  • Save results to runs/val/test_evaluation/
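
The sketch below shows how such a lookup could work; it is illustrative only and not necessarily how eval.py implements the search:

# Illustrative sketch: pick the most recently modified best.pt under runs/train/.
# Not guaranteed to match eval.py's own logic.
from pathlib import Path

def find_best_model(train_dir: str = "runs/train") -> Path:
    candidates = sorted(
        Path(train_dir).glob("*/weights/best.pt"),
        key=lambda p: p.stat().st_mtime,  # newest training run wins
        reverse=True,
    )
    if not candidates:
        raise FileNotFoundError(f"No best.pt found under {train_dir}")
    return candidates[0]

print(find_best_model())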

2. Custom Evaluation

Evaluate Specific Model

python eval.py --model runs/train/yolov8_n_french_id_card/weights/best.pt

Custom Thresholds

python eval.py --conf 0.3 --iou 0.5

Different Model Size

python eval.py --model-size m
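
If you prefer to run the same evaluation from Python instead of the eval.py wrapper, the Ultralytics API can be called directly. The checkpoint path, dataset config, split, and thresholds below are assumptions about this project's layout; this is a sketch, not the project's own entry point:

# Sketch: evaluate a trained checkpoint directly with the Ultralytics API.
# Paths, split name, and thresholds are assumptions; adapt them to your setup.
from ultralytics import YOLO

model = YOLO("runs/train/yolov8_n_french_id_card/weights/best.pt")
metrics = model.val(
    data="data/data.yaml",  # assumed dataset config
    split="test",           # evaluate on the test split
    conf=0.25,              # confidence threshold
    iou=0.7,                # NMS IoU threshold
)
print(metrics.box.map50, metrics.box.map)  # mAP50 and mAP50-95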

📊 Evaluation Metrics

Key Metrics Explained

  1. mAP50 (Mean Average Precision at IoU=0.5)

    • Measures precision across different recall levels
    • IoU threshold of 0.5 (50% overlap)
    • Range: 0-1 (higher is better)
  2. mAP50-95 (Mean Average Precision across IoU thresholds)

    • Average of mAP at IoU thresholds from 0.5 to 0.95
    • More comprehensive than mAP50
    • Range: 0-1 (higher is better)
  3. Precision

    • Ratio of correct detections to total detections
    • Measures accuracy of positive predictions
    • Range: 0-1 (higher is better)
  4. Recall

    • Ratio of correct detections to total ground truth objects
    • Measures the ability to find all ground-truth objects (how precision and recall follow from detection counts is sketched after this list)
    • Range: 0-1 (higher is better)
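
As a concrete reminder of how precision and recall relate to detection counts, here is a minimal sketch; the counts are hypothetical and only illustrate the definitions above:

# Minimal sketch: precision and recall from matched-detection counts.
# The counts are hypothetical and only illustrate the definitions above.
true_positives = 207   # detections matched to a ground-truth box (IoU above threshold)
false_positives = 3    # detections with no matching ground-truth box
false_negatives = 2    # ground-truth boxes that were never detected

precision = true_positives / (true_positives + false_positives)  # 0.986
recall = true_positives / (true_positives + false_negatives)     # 0.990
print(f"precision={precision:.3f}, recall={recall:.3f}")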

Expected Performance

For French ID Card detection (a simple pass/fail check against these targets is sketched after the table):

Metric       Target   Good    Excellent
mAP50        >0.8     >0.9    >0.95
mAP50-95     >0.6     >0.8    >0.9
Precision    >0.8     >0.9    >0.95
Recall       >0.8     >0.9    >0.95
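
A small script can turn these targets into a pass/fail check; it assumes predictions.json has the structure shown under "Key Output Files" below:

# Sketch: compare saved evaluation metrics against the minimum targets above.
# Assumes the predictions.json layout shown later in this guide.
import json

TARGETS = {
    "metrics/mAP50": 0.8,
    "metrics/mAP50-95": 0.6,
    "metrics/precision": 0.8,
    "metrics/recall": 0.8,
}

with open("runs/val/test_evaluation/predictions.json") as f:
    metrics = json.load(f)["metrics"]

for name, target in TARGETS.items():
    status = "OK  " if metrics[name] >= target else "FAIL"
    print(f"{status} {name}: {metrics[name]:.3f} (target > {target})")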

📈 Understanding Results

Sample Output

Class     Images  Instances      Box(P          R      mAP50  mAP50-95): 100%|██████████| 14/14
  all        212        209          1       0.99      0.995      0.992

Interpretation:

  • Images: 212 test images
  • Instances: 209 ground truth objects
  • Box(P): Precision = 1.0 (100% accurate detections)
  • R: Recall = 0.99 (99% of objects found)
  • mAP50: 0.995 (excellent performance)
  • mAP50-95: 0.992 (excellent across IoU thresholds)

Confidence vs IoU Thresholds

Confidence Threshold Impact

# High confidence (fewer detections, higher precision)
python eval.py --conf 0.7

# Low confidence (more detections, lower precision)
python eval.py --conf 0.1

IoU Threshold Impact

# Strict IoU (higher precision requirements)
python eval.py --iou 0.7

# Lenient IoU (easier to match detections)
python eval.py --iou 0.3
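
To see this trade-off numerically, you can sweep the confidence threshold in a loop; the sketch below uses the Ultralytics API directly, and the checkpoint and data paths are assumptions:

# Sketch: sweep the confidence threshold and watch precision rise as recall falls.
# Checkpoint, dataset config, and split are assumptions; adapt them to your setup.
from ultralytics import YOLO

model = YOLO("runs/train/yolov8_n_french_id_card/weights/best.pt")
for conf in (0.1, 0.3, 0.5, 0.7):
    metrics = model.val(data="data/data.yaml", split="test", conf=conf, verbose=False)
    print(f"conf={conf:.1f}  precision={metrics.box.mp:.3f}  recall={metrics.box.mr:.3f}")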

📁 Evaluation Outputs

Results Directory Structure

runs/val/test_evaluation/
├── predictions.json      # Detailed predictions
├── results.png          # Performance plots
├── confusion_matrix.png  # Confusion matrix
├── BoxR_curve.png      # Precision-Recall curve
├── labels/             # Predicted labels
└── images/             # Visualization images

Key Output Files

  1. predictions.json

    {
      "metrics": {
        "metrics/mAP50": 0.995,
        "metrics/mAP50-95": 0.992,
        "metrics/precision": 1.0,
        "metrics/recall": 0.99
      }
    }
    
  2. results.png

    • Training curves
    • Loss plots
    • Metric evolution
  3. confusion_matrix.png

    • True vs predicted classifications
    • Error analysis

🔍 Advanced Evaluation

Batch Evaluation

Evaluate multiple models:

# Evaluate different model sizes
for size in n s m l; do
    python eval.py --model-size $size
done

Cross-Validation

# Evaluate with different data splits
python eval.py --data data/data_val1.yaml
python eval.py --data data/data_val2.yaml

Performance Analysis

Speed vs Accuracy Trade-off

Model Size   Inference Time   mAP50   Use Case
n (nano)     ~2ms             0.995   Real-time
s (small)    ~4ms             0.998   Balanced
m (medium)   ~8ms             0.999   High accuracy
l (large)    ~12ms            0.999   Best accuracy
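
The inference times above depend heavily on hardware, so measure them on your own machine; a rough latency sketch (the image path and run count are placeholders):

# Sketch: rough single-image latency for a trained checkpoint.
# Image path and iteration count are placeholders; the warm-up run avoids counting model setup.
import time
from ultralytics import YOLO

model = YOLO("runs/train/yolov8_n_french_id_card/weights/best.pt")
image = "data/test/images/sample.jpg"  # hypothetical test image

model.predict(image, verbose=False)  # warm-up
start = time.perf_counter()
runs = 50
for _ in range(runs):
    model.predict(image, verbose=False)
elapsed_ms = (time.perf_counter() - start) * 1000 / runs
print(f"average inference time: {elapsed_ms:.1f} ms")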

📊 Visualization

Generated Plots

  1. Precision-Recall Curve

    • Shows precision vs recall at different thresholds
    • Area under curve = mAP
  2. Confusion Matrix

    • True positives, false positives, false negatives
    • Helps identify error patterns
  3. Training Curves

    • Loss evolution during training
    • Metric progression

Custom Visualizations

# Load evaluation results
import json
with open('runs/val/test_evaluation/predictions.json', 'r') as f:
    results = json.load(f)

# Analyze specific metrics
mAP50 = results['metrics']['metrics/mAP50']
precision = results['metrics']['metrics/precision']
recall = results['metrics']['metrics/recall']
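
Building on that, a quick bar chart makes the headline metrics easy to compare at a glance; matplotlib is assumed to be installed:

# Sketch: plot the headline metrics as a bar chart (requires matplotlib).
import json
import matplotlib.pyplot as plt

with open('runs/val/test_evaluation/predictions.json', 'r') as f:
    results = json.load(f)

names = ["mAP50", "mAP50-95", "precision", "recall"]
values = [results["metrics"][f"metrics/{name}"] for name in names]

plt.bar(names, values)
plt.ylim(0, 1)
plt.ylabel("score")
plt.title("Test evaluation metrics")
plt.savefig("runs/val/test_evaluation/metrics_bar.png")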

🔧 Troubleshooting

Common Evaluation Issues

1. Model Not Found

# Check available models
ls runs/train/*/weights/

# Specify model path explicitly
python eval.py --model path/to/model.pt

2. Test Data Not Found

# Validate data structure
python train.py --validate-only

# Check data.yaml paths
cat data/data.yaml

3. Memory Issues

# Reduce batch size
python eval.py --batch-size 8

# Use smaller model
python eval.py --model-size n

Debug Commands

# Check model file
python -c "import torch; model = torch.load('model.pt'); print(model.keys())"

# Validate data paths
python -c "import yaml; data = yaml.safe_load(open('data/data.yaml')); print(data)"

# Test GPU availability
python -c "import torch; print(torch.cuda.is_available())"

📋 Evaluation Checklist

  • Model trained successfully
  • Test dataset available
  • GPU memory sufficient
  • Correct model path
  • Appropriate thresholds set
  • Results directory writable

🎯 Best Practices

1. Threshold Selection

# Start with default thresholds
python eval.py

# Adjust based on use case
python eval.py --conf 0.5 --iou 0.5  # Balanced
python eval.py --conf 0.7 --iou 0.7  # High precision
python eval.py --conf 0.3 --iou 0.3  # High recall

2. Model Comparison

# Compare different models
python eval.py --model-size n
python eval.py --model-size s
python eval.py --model-size m

# Compare results
diff runs/val/test_evaluation_n/predictions.json \
     runs/val/test_evaluation_s/predictions.json
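
A diff of raw JSON is hard to read; a short script can print the two result files side by side instead. The per-size output directory names below are assumptions and should match however your runs are actually named:

# Sketch: side-by-side comparison of two evaluation result files.
# The directory names are assumptions; point them at your actual runs.
import json

def load_metrics(path):
    with open(path) as f:
        return json.load(f)["metrics"]

nano = load_metrics("runs/val/test_evaluation_n/predictions.json")
small = load_metrics("runs/val/test_evaluation_s/predictions.json")

for key in sorted(nano):
    print(f"{key:25s} n={nano[key]:.3f}  s={small[key]:.3f}  diff={small[key] - nano[key]:+.3f}")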

3. Performance Monitoring

# Regular evaluation
python eval.py --model-size n

# Log results
echo "$(date): mAP50=$(grep 'mAP50' runs/val/test_evaluation/predictions.json)" >> eval_log.txt

📈 Continuous Evaluation

Automated Evaluation

#!/bin/bash
# eval_script.sh

MODEL_SIZE=${1:-n}
THRESHOLD=${2:-0.25}

echo "Evaluating model size: $MODEL_SIZE"
python eval.py --model-size $MODEL_SIZE --conf $THRESHOLD

# Save results
cp runs/val/test_evaluation/predictions.json \
   results/eval_${MODEL_SIZE}_$(date +%Y%m%d).json

Integration with CI/CD

# .github/workflows/evaluate.yml
name: Model Evaluation
on: [push, pull_request]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Evaluate Model
        run: |
          pip install -r requirements.txt
          python eval.py --model-size n

Note: Regular evaluation helps ensure model performance remains consistent over time.