update ID_cards_detector

src/model/ID_cards_detector/docs/evaluation.md (new file)
# Evaluation Guide

## Overview

This guide covers model evaluation procedures for YOLOv8 French ID Card Detection models.

## 🎯 Evaluation Process

### 1. Basic Evaluation

Evaluate the best trained model:

```bash
python eval.py
```

This will:

- Automatically find the best model from `runs/train/` (a possible lookup is sketched after this list)
- Load the test dataset
- Run evaluation on the test set
- Save results to `runs/val/test_evaluation/`
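The discovery step is simple file lookup; `eval.py`'s exact logic may differ, but a minimal sketch of picking the most recently trained `best.pt` under `runs/train/` looks like this (the helper name is illustrative, not part of the project API):

```python
from pathlib import Path

def find_best_model(train_dir: str = "runs/train") -> Path:
    """Return the most recently modified best.pt under runs/train/*/weights/."""
    candidates = sorted(
        Path(train_dir).glob("*/weights/best.pt"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    if not candidates:
        raise FileNotFoundError(f"No best.pt found under {train_dir}")
    return candidates[0]

print(find_best_model())
```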
### 2. Custom Evaluation

#### Evaluate Specific Model

```bash
python eval.py --model runs/train/yolov8_n_french_id_card/weights/best.pt
```

#### Custom Thresholds

```bash
python eval.py --conf 0.3 --iou 0.5
```

#### Different Model Size

```bash
python eval.py --model-size m
```
## 📊 Evaluation Metrics

### Key Metrics Explained

1. **mAP50 (Mean Average Precision at IoU = 0.5)**
   - Measures precision averaged across recall levels
   - Uses an IoU threshold of 0.5 (50% overlap)
   - Range: 0-1 (higher is better)

2. **mAP50-95 (Mean Average Precision across IoU thresholds)**
   - Average of mAP at IoU thresholds from 0.5 to 0.95
   - More comprehensive than mAP50
   - Range: 0-1 (higher is better)

3. **Precision**
   - Ratio of correct detections to total detections
   - Measures the accuracy of positive predictions
   - Range: 0-1 (higher is better)

4. **Recall**
   - Ratio of correct detections to total ground-truth objects
   - Measures the ability to find all objects
   - Range: 0-1 (higher is better)

A short worked example of these quantities is sketched below.
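For concreteness, here is a minimal, illustrative sketch of how these quantities are computed from raw boxes and counts. It is independent of `eval.py` (Ultralytics computes all of this internally), and the box coordinates and counts below are made up:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as a true positive when IoU >= 0.5 (the "50" in mAP50).
pred = (48, 30, 210, 140)   # hypothetical predicted box
gt = (50, 32, 205, 138)     # hypothetical ground-truth box
print(f"IoU: {iou(pred, gt):.3f}")

# Precision / recall from detection counts.
tp, fp, fn = 207, 0, 2
precision = tp / (tp + fp)  # correct detections / all detections
recall = tp / (tp + fn)     # correct detections / all ground-truth objects
print(f"Precision: {precision:.3f}, Recall: {recall:.3f}")
```

Averaging precision over recall levels (per class) at IoU 0.5 gives mAP50; repeating the computation at IoU thresholds 0.50, 0.55, …, 0.95 and averaging gives mAP50-95.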
### Expected Performance

For French ID Card detection:

| Metric    | Target | Good | Excellent |
|-----------|--------|------|-----------|
| mAP50     | >0.8   | >0.9 | >0.95     |
| mAP50-95  | >0.6   | >0.8 | >0.9      |
| Precision | >0.8   | >0.9 | >0.95     |
| Recall    | >0.8   | >0.9 | >0.95     |
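To turn these targets into an automated check, a short script along the following lines can run after each evaluation. It assumes `predictions.json` has the structure shown later in this guide and uses the "Target" column above as the pass/fail bar:

```python
import json
import sys

# Minimum acceptable values, taken from the "Target" column above.
TARGETS = {
    "metrics/mAP50": 0.8,
    "metrics/mAP50-95": 0.6,
    "metrics/precision": 0.8,
    "metrics/recall": 0.8,
}

with open("runs/val/test_evaluation/predictions.json") as f:
    metrics = json.load(f)["metrics"]

failed = {k: metrics[k] for k, target in TARGETS.items() if metrics[k] < target}
if failed:
    print(f"Below target: {failed}")
    sys.exit(1)
print("All metrics meet their targets.")
```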
## 📈 Understanding Results

### Sample Output

```
Class   Images  Instances  Box(P    R     mAP50  mAP50-95): 100%|██████████| 14/14
all     212     209        1        0.99  0.995  0.992
```

**Interpretation:**

- **Images**: 212 test images
- **Instances**: 209 ground-truth objects
- **Box(P)**: Precision = 1.0 (no false positives)
- **R**: Recall = 0.99 (99% of objects found)
- **mAP50**: 0.995 (excellent performance)
- **mAP50-95**: 0.992 (excellent across IoU thresholds)
### Confidence vs IoU Thresholds

#### Confidence Threshold Impact

```bash
# High confidence (fewer detections, higher precision)
python eval.py --conf 0.7

# Low confidence (more detections, lower precision)
python eval.py --conf 0.1
```

#### IoU Threshold Impact

```bash
# Strict IoU (higher precision requirements)
python eval.py --iou 0.7

# Lenient IoU (easier to match detections)
python eval.py --iou 0.3
```
## 📁 Evaluation Outputs

### Results Directory Structure

```
runs/val/test_evaluation/
├── predictions.json       # Detailed predictions
├── results.png            # Performance plots
├── confusion_matrix.png   # Confusion matrix
├── BoxR_curve.png         # Recall curve
├── labels/                # Predicted labels
└── images/                # Visualization images
```
### Key Output Files

1. **predictions.json**

   ```json
   {
     "metrics": {
       "metrics/mAP50": 0.995,
       "metrics/mAP50-95": 0.992,
       "metrics/precision": 1.0,
       "metrics/recall": 0.99
     }
   }
   ```

2. **results.png**
   - Training curves
   - Loss plots
   - Metric evolution

3. **confusion_matrix.png**
   - True vs predicted classifications
   - Error analysis
## 🔍 Advanced Evaluation

### Batch Evaluation

Evaluate multiple models:

```bash
# Evaluate different model sizes
for size in n s m l; do
    python eval.py --model-size $size
done
```
### Cross-Validation

```bash
# Evaluate with different data splits
python eval.py --data data/data_val1.yaml
python eval.py --data data/data_val2.yaml
```
### Performance Analysis

#### Speed vs Accuracy Trade-off

| Model Size | Inference Time | mAP50 | Use Case      |
|------------|----------------|-------|---------------|
| n (nano)   | ~2ms           | 0.995 | Real-time     |
| s (small)  | ~4ms           | 0.998 | Balanced      |
| m (medium) | ~8ms           | 0.999 | High accuracy |
| l (large)  | ~12ms          | 0.999 | Best accuracy |
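The timings above are indicative and depend on hardware. A minimal sketch for measuring average inference time on your own machine is shown below; it assumes the `ultralytics` package is installed, and the image path is a hypothetical placeholder, so point it at a real test image:

```python
import time

from ultralytics import YOLO

# Weights path from the example above; the image path is a placeholder.
model = YOLO("runs/train/yolov8_n_french_id_card/weights/best.pt")
image = "data/test/images/sample.jpg"

# Warm-up run so model loading / CUDA initialization is not counted.
model.predict(image, verbose=False)

runs = 50
start = time.perf_counter()
for _ in range(runs):
    model.predict(image, verbose=False)
elapsed_ms = (time.perf_counter() - start) * 1000 / runs
print(f"Average inference time: {elapsed_ms:.1f} ms per image")
```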
## 📊 Visualization

### Generated Plots

1. **Precision-Recall Curve**
   - Shows precision vs recall at different confidence thresholds
   - The area under the curve is the average precision (averaged over classes to give mAP)

2. **Confusion Matrix**
   - True positives, false positives, false negatives
   - Helps identify error patterns

3. **Training Curves**
   - Loss evolution during training
   - Metric progression

### Custom Visualizations

```python
import json

# Load evaluation results written by eval.py
with open('runs/val/test_evaluation/predictions.json', 'r') as f:
    results = json.load(f)

# Analyze specific metrics (keys match the predictions.json structure shown above)
mAP50 = results['metrics']['metrics/mAP50']
precision = results['metrics']['metrics/precision']
recall = results['metrics']['metrics/recall']
print(f"mAP50={mAP50}, precision={precision}, recall={recall}")
```
## 🔧 Troubleshooting

### Common Evaluation Issues

**1. Model Not Found**

```bash
# Check available models
ls runs/train/*/weights/

# Specify model path explicitly
python eval.py --model path/to/model.pt
```

**2. Test Data Not Found**

```bash
# Validate data structure
python train.py --validate-only

# Check data.yaml paths
cat data/data.yaml
```

**3. Memory Issues**

```bash
# Reduce batch size
python eval.py --batch-size 8

# Use smaller model
python eval.py --model-size n
```
### Debug Commands

```bash
# Check model file
python -c "import torch; model = torch.load('model.pt'); print(model.keys())"

# Validate data paths
python -c "import yaml; data = yaml.safe_load(open('data/data.yaml')); print(data)"

# Test GPU availability
python -c "import torch; print(torch.cuda.is_available())"
```
## 📋 Evaluation Checklist

- [ ] Model trained successfully
- [ ] Test dataset available
- [ ] GPU memory sufficient
- [ ] Correct model path
- [ ] Appropriate thresholds set
- [ ] Results directory writable
## 🎯 Best Practices

### 1. Threshold Selection

```bash
# Start with default thresholds
python eval.py

# Adjust based on use case
python eval.py --conf 0.5 --iou 0.5   # Balanced
python eval.py --conf 0.7 --iou 0.7   # High precision
python eval.py --conf 0.3 --iou 0.3   # High recall
```
### 2. Model Comparison

```bash
# Compare different models
python eval.py --model-size n
python eval.py --model-size s
python eval.py --model-size m

# Compare results
diff runs/val/test_evaluation_n/predictions.json \
     runs/val/test_evaluation_s/predictions.json
```
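A raw `diff` of the JSON files can be noisy. For a side-by-side view of the headline metrics, a small script along these lines works, assuming each run writes a `predictions.json` with the structure shown earlier; the per-size directory names mirror the `diff` example above and may differ in your setup:

```python
import json

# Hypothetical result paths, one per model size; adjust to your runs.
runs = {
    "n": "runs/val/test_evaluation_n/predictions.json",
    "s": "runs/val/test_evaluation_s/predictions.json",
}

for size, path in runs.items():
    with open(path) as f:
        metrics = json.load(f)["metrics"]
    print(
        f"{size}: mAP50={metrics['metrics/mAP50']:.3f} "
        f"mAP50-95={metrics['metrics/mAP50-95']:.3f} "
        f"precision={metrics['metrics/precision']:.3f} "
        f"recall={metrics['metrics/recall']:.3f}"
    )
```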
### 3. Performance Monitoring

```bash
# Regular evaluation
python eval.py --model-size n

# Log results
echo "$(date): mAP50=$(grep 'mAP50' runs/val/test_evaluation/predictions.json)" >> eval_log.txt
```
## 📈 Continuous Evaluation

### Automated Evaluation

```bash
#!/bin/bash
# eval_script.sh

MODEL_SIZE=${1:-n}
THRESHOLD=${2:-0.25}

echo "Evaluating model size: $MODEL_SIZE"
python eval.py --model-size $MODEL_SIZE --conf $THRESHOLD

# Save results
cp runs/val/test_evaluation/predictions.json \
   results/eval_${MODEL_SIZE}_$(date +%Y%m%d).json
```
### Integration with CI/CD

```yaml
# .github/workflows/evaluate.yml
name: Model Evaluation
on: [push, pull_request]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Evaluate Model
        run: |
          pip install -r requirements.txt
          python eval.py --model-size n
```

---

**Note**: Regular evaluation helps ensure model performance remains consistent over time.