update ID_cards_detector

src/model/ID_cards_detector/docs/evaluation.md (new file)
# Evaluation Guide

## Overview

This guide covers model evaluation procedures for YOLOv8 French ID Card Detection models.

## 🎯 Evaluation Process

### 1. Basic Evaluation

Evaluate the best trained model:

```bash
python eval.py
```

This will:

- Automatically find the best model from `runs/train/` (a possible lookup is sketched after this list)
- Load the test dataset
- Run evaluation on the test set
- Save results to `runs/val/test_evaluation/`
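The discovery step is simple file lookup; `eval.py`'s exact logic may differ, but a minimal sketch of picking the most recently trained `best.pt` under `runs/train/` looks like this (the helper name is illustrative, not part of the project API):

```python
from pathlib import Path

def find_best_model(train_dir: str = "runs/train") -> Path:
    """Return the most recently modified best.pt under runs/train/*/weights/."""
    candidates = sorted(
        Path(train_dir).glob("*/weights/best.pt"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    if not candidates:
        raise FileNotFoundError(f"No best.pt found under {train_dir}")
    return candidates[0]

print(find_best_model())
```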
### 2. Custom Evaluation

#### Evaluate Specific Model

```bash
python eval.py --model runs/train/yolov8_n_french_id_card/weights/best.pt
```

#### Custom Thresholds

```bash
python eval.py --conf 0.3 --iou 0.5
```

#### Different Model Size

```bash
python eval.py --model-size m
```
## 📊 Evaluation Metrics

### Key Metrics Explained

1. **mAP50 (Mean Average Precision at IoU = 0.5)**
   - Measures precision averaged across recall levels
   - Uses an IoU threshold of 0.5 (50% overlap)
   - Range: 0-1 (higher is better)

2. **mAP50-95 (Mean Average Precision across IoU thresholds)**
   - Average of mAP at IoU thresholds from 0.5 to 0.95
   - More comprehensive than mAP50
   - Range: 0-1 (higher is better)

3. **Precision**
   - Ratio of correct detections to total detections
   - Measures the accuracy of positive predictions
   - Range: 0-1 (higher is better)

4. **Recall**
   - Ratio of correct detections to total ground-truth objects
   - Measures the ability to find all objects
   - Range: 0-1 (higher is better)

A short worked example of these quantities is sketched below.
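For concreteness, here is a minimal, illustrative sketch of how these quantities are computed from raw boxes and counts. It is independent of `eval.py` (Ultralytics computes all of this internally), and the box coordinates and counts below are made up:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as a true positive when IoU >= 0.5 (the "50" in mAP50).
pred = (48, 30, 210, 140)   # hypothetical predicted box
gt = (50, 32, 205, 138)     # hypothetical ground-truth box
print(f"IoU: {iou(pred, gt):.3f}")

# Precision / recall from detection counts.
tp, fp, fn = 207, 0, 2
precision = tp / (tp + fp)  # correct detections / all detections
recall = tp / (tp + fn)     # correct detections / all ground-truth objects
print(f"Precision: {precision:.3f}, Recall: {recall:.3f}")
```

Averaging precision over recall levels (per class) at IoU 0.5 gives mAP50; repeating the computation at IoU thresholds 0.50, 0.55, …, 0.95 and averaging gives mAP50-95.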
### Expected Performance

For French ID Card detection:

| Metric    | Target | Good | Excellent |
|-----------|--------|------|-----------|
| mAP50     | >0.8   | >0.9 | >0.95     |
| mAP50-95  | >0.6   | >0.8 | >0.9      |
| Precision | >0.8   | >0.9 | >0.95     |
| Recall    | >0.8   | >0.9 | >0.95     |
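To turn these targets into an automated check, a short script along the following lines can run after each evaluation. It assumes `predictions.json` has the structure shown later in this guide and uses the "Target" column above as the pass/fail bar:

```python
import json
import sys

# Minimum acceptable values, taken from the "Target" column above.
TARGETS = {
    "metrics/mAP50": 0.8,
    "metrics/mAP50-95": 0.6,
    "metrics/precision": 0.8,
    "metrics/recall": 0.8,
}

with open("runs/val/test_evaluation/predictions.json") as f:
    metrics = json.load(f)["metrics"]

failed = {k: metrics[k] for k, target in TARGETS.items() if metrics[k] < target}
if failed:
    print(f"Below target: {failed}")
    sys.exit(1)
print("All metrics meet their targets.")
```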
## 📈 Understanding Results

### Sample Output

```
Class   Images  Instances  Box(P    R     mAP50  mAP50-95): 100%|██████████| 14/14
all     212     209        1        0.99  0.995  0.992
```

**Interpretation:**

- **Images**: 212 test images
- **Instances**: 209 ground-truth objects
- **Box(P)**: Precision = 1.0 (no false positives)
- **R**: Recall = 0.99 (99% of objects found)
- **mAP50**: 0.995 (excellent performance)
- **mAP50-95**: 0.992 (excellent across IoU thresholds)
### Confidence vs IoU Thresholds

#### Confidence Threshold Impact

```bash
# High confidence (fewer detections, higher precision)
python eval.py --conf 0.7

# Low confidence (more detections, lower precision)
python eval.py --conf 0.1
```

#### IoU Threshold Impact

```bash
# Strict IoU (higher precision requirements)
python eval.py --iou 0.7

# Lenient IoU (easier to match detections)
python eval.py --iou 0.3
```
## 📁 Evaluation Outputs

### Results Directory Structure

```
runs/val/test_evaluation/
├── predictions.json       # Detailed predictions
├── results.png            # Performance plots
├── confusion_matrix.png   # Confusion matrix
├── BoxR_curve.png         # Recall curve
├── labels/                # Predicted labels
└── images/                # Visualization images
```
### Key Output Files

1. **predictions.json**

   ```json
   {
     "metrics": {
       "metrics/mAP50": 0.995,
       "metrics/mAP50-95": 0.992,
       "metrics/precision": 1.0,
       "metrics/recall": 0.99
     }
   }
   ```

2. **results.png**
   - Training curves
   - Loss plots
   - Metric evolution

3. **confusion_matrix.png**
   - True vs predicted classifications
   - Error analysis
## 🔍 Advanced Evaluation

### Batch Evaluation

Evaluate multiple models:

```bash
# Evaluate different model sizes
for size in n s m l; do
    python eval.py --model-size $size
done
```
### Cross-Validation

```bash
# Evaluate with different data splits
python eval.py --data data/data_val1.yaml
python eval.py --data data/data_val2.yaml
```
### Performance Analysis

#### Speed vs Accuracy Trade-off

| Model Size | Inference Time | mAP50 | Use Case      |
|------------|----------------|-------|---------------|
| n (nano)   | ~2ms           | 0.995 | Real-time     |
| s (small)  | ~4ms           | 0.998 | Balanced      |
| m (medium) | ~8ms           | 0.999 | High accuracy |
| l (large)  | ~12ms          | 0.999 | Best accuracy |
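The timings above are indicative and depend on hardware. A minimal sketch for measuring average inference time on your own machine is shown below; it assumes the `ultralytics` package is installed, and the image path is a hypothetical placeholder, so point it at a real test image:

```python
import time

from ultralytics import YOLO

# Weights path from the example above; the image path is a placeholder.
model = YOLO("runs/train/yolov8_n_french_id_card/weights/best.pt")
image = "data/test/images/sample.jpg"

# Warm-up run so model loading / CUDA initialization is not counted.
model.predict(image, verbose=False)

runs = 50
start = time.perf_counter()
for _ in range(runs):
    model.predict(image, verbose=False)
elapsed_ms = (time.perf_counter() - start) * 1000 / runs
print(f"Average inference time: {elapsed_ms:.1f} ms per image")
```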
## 📊 Visualization

### Generated Plots

1. **Precision-Recall Curve**
   - Shows precision vs recall at different confidence thresholds
   - The area under the curve is the average precision (averaged over classes to give mAP)

2. **Confusion Matrix**
   - True positives, false positives, false negatives
   - Helps identify error patterns

3. **Training Curves**
   - Loss evolution during training
   - Metric progression

### Custom Visualizations

```python
import json

# Load evaluation results written by eval.py
with open('runs/val/test_evaluation/predictions.json', 'r') as f:
    results = json.load(f)

# Analyze specific metrics (keys match the predictions.json structure shown above)
mAP50 = results['metrics']['metrics/mAP50']
precision = results['metrics']['metrics/precision']
recall = results['metrics']['metrics/recall']
print(f"mAP50={mAP50}, precision={precision}, recall={recall}")
```
## 🔧 Troubleshooting

### Common Evaluation Issues

**1. Model Not Found**

```bash
# Check available models
ls runs/train/*/weights/

# Specify model path explicitly
python eval.py --model path/to/model.pt
```

**2. Test Data Not Found**

```bash
# Validate data structure
python train.py --validate-only

# Check data.yaml paths
cat data/data.yaml
```

**3. Memory Issues**

```bash
# Reduce batch size
python eval.py --batch-size 8

# Use smaller model
python eval.py --model-size n
```
### Debug Commands

```bash
# Check model file
python -c "import torch; model = torch.load('model.pt'); print(model.keys())"

# Validate data paths
python -c "import yaml; data = yaml.safe_load(open('data/data.yaml')); print(data)"

# Test GPU availability
python -c "import torch; print(torch.cuda.is_available())"
```
## 📋 Evaluation Checklist

- [ ] Model trained successfully
- [ ] Test dataset available
- [ ] GPU memory sufficient
- [ ] Correct model path
- [ ] Appropriate thresholds set
- [ ] Results directory writable
## 🎯 Best Practices

### 1. Threshold Selection

```bash
# Start with default thresholds
python eval.py

# Adjust based on use case
python eval.py --conf 0.5 --iou 0.5   # Balanced
python eval.py --conf 0.7 --iou 0.7   # High precision
python eval.py --conf 0.3 --iou 0.3   # High recall
```
### 2. Model Comparison

```bash
# Compare different models
python eval.py --model-size n
python eval.py --model-size s
python eval.py --model-size m

# Compare results
diff runs/val/test_evaluation_n/predictions.json \
     runs/val/test_evaluation_s/predictions.json
```
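A raw `diff` of the JSON files can be noisy. For a side-by-side view of the headline metrics, a small script along these lines works, assuming each run writes a `predictions.json` with the structure shown earlier; the per-size directory names mirror the `diff` example above and may differ in your setup:

```python
import json

# Hypothetical result paths, one per model size; adjust to your runs.
runs = {
    "n": "runs/val/test_evaluation_n/predictions.json",
    "s": "runs/val/test_evaluation_s/predictions.json",
}

for size, path in runs.items():
    with open(path) as f:
        metrics = json.load(f)["metrics"]
    print(
        f"{size}: mAP50={metrics['metrics/mAP50']:.3f} "
        f"mAP50-95={metrics['metrics/mAP50-95']:.3f} "
        f"precision={metrics['metrics/precision']:.3f} "
        f"recall={metrics['metrics/recall']:.3f}"
    )
```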
### 3. Performance Monitoring

```bash
# Regular evaluation
python eval.py --model-size n

# Log results
echo "$(date): mAP50=$(grep 'mAP50' runs/val/test_evaluation/predictions.json)" >> eval_log.txt
```
## 📈 Continuous Evaluation

### Automated Evaluation

```bash
#!/bin/bash
# eval_script.sh

MODEL_SIZE=${1:-n}
THRESHOLD=${2:-0.25}

echo "Evaluating model size: $MODEL_SIZE"
python eval.py --model-size $MODEL_SIZE --conf $THRESHOLD

# Save results
cp runs/val/test_evaluation/predictions.json \
   results/eval_${MODEL_SIZE}_$(date +%Y%m%d).json
```
### Integration with CI/CD

```yaml
# .github/workflows/evaluate.yml
name: Model Evaluation
on: [push, pull_request]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Evaluate Model
        run: |
          pip install -r requirements.txt
          python eval.py --model-size n
```

---

**Note**: Regular evaluation helps ensure model performance remains consistent over time.