update augment + YOLO pipeline

Nguyễn Phước Thành
2025-08-06 20:52:39 +07:00
parent 4ee14f17d3
commit 51d3a66cc4
9 changed files with 989 additions and 407 deletions

9
.gitignore vendored

@@ -16,4 +16,11 @@
*.pt
*.ipynb
*.pyc
*.log
*.log
!docs/
!docs/**/*.png
!docs/**/*.jpg
!docs/**/*.jpeg
!docs/**/*.gif
!docs/**/*.svg

423
README.md

@@ -1,132 +1,148 @@
# ID Cards Data Augmentation Tool
# ID Card Data Augmentation Pipeline
A comprehensive data augmentation tool specifically designed for ID card images, implementing 7 different augmentation techniques to simulate real-world scenarios.
A comprehensive data augmentation pipeline for ID card images with YOLO-based detection and advanced augmentation techniques.
## 🎯 Overview
![Pipeline Overview](docs/images/yolov8_pipeline.png)
This tool provides data augmentation capabilities for ID card images, implementing various transformation techniques that mimic real-world conditions such as worn-out cards, partial occlusion, different lighting conditions, and more.
## 🚀 Features
## ✨ Features
### **YOLO-based ID Card Detection**
- Automatic detection and cropping of ID cards from large images
- Configurable confidence and IoU thresholds
- Multiple cropping modes (bbox, square, aspect_ratio)
- Padding and target size customization
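The padding and `square` crop mode listed above come down to a little box arithmetic. The sketch below mirrors the clamping logic in `crop_id_card` from `src/id_card_detector.py`, but the helper names `pad_and_clamp` and `square_crop` are hypothetical and only the geometry is shown, with no image data involved.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]


def pad_and_clamp(bbox: List[int], padding: int, width: int, height: int) -> Box:
    """Grow the box by `padding` pixels on every side, then clamp to the image."""
    x1, y1, x2, y2 = bbox
    return (max(0, x1 - padding), max(0, y1 - padding),
            min(width, x2 + padding), min(height, y2 + padding))


def square_crop(box: Box, width: int, height: int) -> Box:
    """Expand the box to a square around its centre, clamped to the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
    half = max(x2 - x1, y2 - y1) // 2
    return (max(0, cx - half), max(0, cy - half),
            min(width, cx + half), min(height, cy + half))


# A detection close to the left edge of a 640x480 image.
padded = pad_and_clamp([5, 40, 205, 140], padding=10, width=640, height=480)
print(padded)                         # (0, 30, 215, 150)
print(square_crop(padded, 640, 480))  # (0, 0, 214, 197)
```

Note that the second result is not actually square: clamping at the image border trims the crop, so a fixed `target_size` may still be needed when square inputs are required.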
### 7 Augmentation Techniques
### **Advanced Data Augmentation**
- **Geometric Transformations**: Rotation with multiple angles
- **Random Cropping**: Simulates partially visible cards
- **Noise Addition**: Simulates worn-out cards
- **Partial Blockage**: Simulates occluded card details
- **Blurring**: Simulates blurred but readable images
- **Brightness/Contrast**: Mimics different lighting conditions
- **Grayscale Conversion**: Final preprocessing step for all images
1. **Rotation** - Simulates cards at different angles
2. **Random Cropping** - Simulates partially visible cards
3. **Random Noise** - Simulates worn-out cards
4. **Horizontal Blockage** - Simulates occluded card details
5. **Grayscale Transformation** - Simulates Xerox/scan copies
6. **Blurring** - Simulates blurred but readable cards
7. **Brightness & Contrast** - Simulates different lighting conditions
### **Flexible Configuration**
- YAML-based configuration system
- Command-line argument overrides
- Environment-specific settings
- Comprehensive logging
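As a rough illustration of how command-line overrides can sit on top of the YAML settings, the sketch below merges a few of the detection flags defined in `main.py` into a config dictionary. The function `apply_overrides` is a hypothetical helper, not part of the repository, and it compares against `None` so that an explicit `--confidence 0` is still honoured.

```python
import argparse
from typing import Any, Dict, List


def build_parser() -> argparse.ArgumentParser:
    """Subset of the detection flags, for illustration only."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--enable-id-detection", action="store_true")
    parser.add_argument("--model-path", type=str)
    parser.add_argument("--confidence", type=float)
    parser.add_argument("--crop-mode", choices=["bbox", "square", "aspect_ratio"])
    return parser


def apply_overrides(config: Dict[str, Any], argv: List[str]) -> Dict[str, Any]:
    """Return a copy of `config` with any CLI values layered on top."""
    args = build_parser().parse_args(argv)
    merged = dict(config)
    if args.enable_id_detection:
        merged["enabled"] = True
    if args.model_path is not None:
        merged["model_path"] = args.model_path
    if args.confidence is not None:
        merged["confidence_threshold"] = args.confidence
    if args.crop_mode is not None:
        merged["crop_mode"] = args.crop_mode
    return merged


defaults = {"enabled": False, "confidence_threshold": 0.25, "crop_mode": "bbox"}
overridden = apply_overrides(defaults, ["--enable-id-detection", "--confidence", "0.3"])
print(overridden)
```

Values that are not passed on the command line keep whatever the configuration file provides.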
### Key Features
## 📋 Requirements
- **Separate Methods**: Each augmentation technique is applied independently
- **Quality Preservation**: Maintains image quality with white background preservation
- **OpenCV Integration**: Uses OpenCV functions for reliable image processing
- **Configurable**: Easy configuration through YAML files
- **Progress Tracking**: Real-time progress monitoring
- **Batch Processing**: Process multiple images efficiently
```bash
# Python 3.8+
conda create -n gpu python=3.8
conda activate gpu
## 🚀 Installation
# Install dependencies
pip install -r requirements.txt
```
### Prerequisites
### Dependencies
- `opencv-python>=4.5.0`
- `numpy>=1.21.0`
- `Pillow>=8.3.0`
- `PyYAML>=5.4.0`
- `ultralytics>=8.0.0` (for YOLO models)
- Python 3.7+
- OpenCV
- NumPy
- PyYAML
- PIL (Pillow)
## 🛠️ Installation
### Setup
1. **Clone the repository**:
1. **Clone the repository**
```bash
git clone <repository-url>
cd IDcardsGenerator
```
2. **Install dependencies**:
2. **Install dependencies**
```bash
pip install opencv-python numpy pyyaml pillow
pip install -r requirements.txt
```
3. **Activate conda environment** (if using GPU):
3. **Prepare YOLO model** (optional)
```bash
conda activate gpu
# Place your trained YOLO model at:
data/weights/id_cards_yolov8n.pt
```
## 📁 Project Structure
## 📖 Usage
```
IDcardsGenerator/
├── config/
│ └── config.yaml # Main configuration file
├── data/
│ └── IDcards/
│ └── processed/ # Input images directory
├── src/
│ ├── data_augmentation.py # Core augmentation logic
│ ├── config_manager.py # Configuration management
│ ├── image_processor.py # Image processing utilities
│ └── utils.py # Utility functions
├── logs/ # Log files
├── out/ # Output directory
└── main.py # Main script
### **Basic Usage**
```bash
# Run with default configuration
python main.py
# Run with ID card detection enabled
python main.py --enable-id-detection
# Run with custom input/output directories
python main.py --input-dir "path/to/input" --output-dir "path/to/output"
```
## ⚙️ Configuration
### **Configuration Options**
### Main Configuration (`config/config.yaml`)
#### **ID Card Detection**
```bash
# Enable detection with custom model
python main.py --enable-id-detection --model-path "path/to/model.pt"
# Adjust detection parameters
python main.py --enable-id-detection --confidence 0.3 --crop-mode square
# Set target size for cropped cards
python main.py --enable-id-detection --crop-target-size "640,640"
```
#### **Data Augmentation**
```bash
# Customize augmentation parameters
python main.py --num-augmentations 5 --target-size "512,512"
# Preview augmentation results
python main.py --preview
```
### **Configuration File**
Edit `config/config.yaml` for persistent settings:
```yaml
# Data augmentation parameters
# ID Card Detection
id_card_detection:
enabled: false # Enable/disable YOLO detection
model_path: "data/weights/id_cards_yolov8n.pt"
confidence_threshold: 0.25
iou_threshold: 0.45
padding: 10
crop_mode: "bbox"
target_size: null
# Data Augmentation
augmentation:
# Rotation
rotation:
enabled: true
angles: [30, 60, 120, 150, 180, 210, 240, 300, 330]
probability: 1.0
# Random cropping
random_cropping:
enabled: true
ratio_range: [0.7, 1.0]
probability: 1.0
# Random noise
random_noise:
enabled: true
mean_range: [0.0, 0.7]
variance_range: [0.0, 0.1]
probability: 1.0
# Partial blockage
partial_blockage:
enabled: true
num_occlusions_range: [1, 100]
coverage_range: [0.0, 0.25]
variance_range: [0.0, 0.1]
probability: 1.0
# Grayscale transformation
grayscale:
enabled: true
probability: 1.0
# Blurring
blurring:
enabled: true
kernel_ratio_range: [0.0, 0.0084]
probability: 1.0
# Brightness and contrast
brightness_contrast:
enabled: true
alpha_range: [0.4, 3.0]
beta_range: [1, 100]
probability: 1.0
grayscale:
enabled: true # Applied as final step
# Processing configuration
# Processing
processing:
target_size: [640, 640]
num_augmentations: 3
@@ -134,156 +150,139 @@ processing:
quality: 95
```
## 🎮 Usage
## 🔄 Workflow
### Basic Usage
### **Two-Step Processing Pipeline**
#### **Step 1: ID Card Detection (Optional)**
When `id_card_detection.enabled: true`:
1. **Input**: Large images containing multiple ID cards
2. **YOLO Detection**: Locate and detect ID cards
3. **Cropping**: Extract individual ID cards with padding
4. **Output**: Cropped ID cards saved to `out/processed/`
#### **Step 2: Data Augmentation**
1. **Input**: Original images OR cropped ID cards
2. **Augmentation**: Apply 6 augmentation methods:
- Rotation (9 different angles)
- Random cropping (70-100% ratio)
- Random noise (simulate wear)
- Partial blockage (simulate occlusion)
- Blurring (simulate motion blur)
- Brightness/Contrast adjustment
3. **Grayscale**: Convert all images to grayscale (final step)
4. **Output**: Augmented images in main output directory
### **Direct Augmentation Mode**
When `id_card_detection.enabled: false`:
- Skips YOLO detection
- Applies augmentation directly to input images
- All images are converted to grayscale
## 📊 Output Structure
```
output_directory/
├── processed/ # Cropped ID cards (if detection enabled)
│ ├── id_card_001.jpg
│ ├── id_card_002.jpg
│ └── processing_summary.json
├── im1__rotation_01.png # Augmented images
├── im1__cropping_01.png
├── im1__noise_01.png
├── im1__blockage_01.png
├── im1__blurring_01.png
├── im1__brightness_contrast_01.png
└── augmentation_summary.json
```
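The file names in the tree above follow a `<stem>__<method>_<index>.png` pattern. A minimal sketch of that naming scheme, assuming the six per-image methods and a two-digit index; `augmented_names` is a hypothetical helper, since the signature of the project's own `create_augmented_filename` is not shown here:

```python
from pathlib import Path
from typing import List

# Per-image methods; grayscale is applied to every output rather than saved separately.
METHODS = ["rotation", "cropping", "noise", "blockage", "blurring", "brightness_contrast"]


def augmented_names(source: Path, num_augmentations: int) -> List[str]:
    """List the output file names expected for one input image."""
    return [
        f"{source.stem}__{method}_{index:02d}.png"
        for method in METHODS
        for index in range(1, num_augmentations + 1)
    ]


names = augmented_names(Path("data/im1.jpg"), 2)
print(names[0])    # im1__rotation_01.png
print(len(names))  # 12
```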
## 🎯 Use Cases
### **Training Data Generation**
```bash
python main.py --input-dir data/IDcards/processed --output-dir out
# Generate diverse training data
python main.py --enable-id-detection --num-augmentations 10
```
### Command Line Options
### **Quality Control**
```bash
python main.py [OPTIONS]
Options:
--config CONFIG Path to configuration file (default: config/config.yaml)
--input-dir INPUT_DIR Input directory containing images
--output-dir OUTPUT_DIR Output directory for augmented images
--num-augmentations N Number of augmented versions per image (default: 3)
--target-size SIZE Target size for images (width x height)
--preview Preview augmentation on first image only
--info Show information about images in input directory
--list-presets List available presets and exit
--log-level LEVEL Logging level (DEBUG, INFO, WARNING, ERROR)
# Preview results before processing
python main.py --preview
```
### Examples
1. **Preview augmentation**:
### **Batch Processing**
```bash
python main.py --preview --input-dir data/IDcards/processed --output-dir test_output
# Process large datasets
python main.py --input-dir "large_dataset/" --output-dir "augmented_dataset/"
```
2. **Show image information**:
```bash
python main.py --info --input-dir data/IDcards/processed
## ⚙️ Advanced Configuration
### **Custom Augmentation Parameters**
```yaml
augmentation:
rotation:
angles: [45, 90, 135, 180, 225, 270, 315] # Custom angles
random_cropping:
ratio_range: [0.8, 0.95] # Tighter cropping
random_noise:
mean_range: [0.1, 0.5] # More noise
variance_range: [0.05, 0.15]
```
3. **Custom number of augmentations**:
```bash
python main.py --input-dir data/IDcards/processed --output-dir out --num-augmentations 5
### **Performance Optimization**
```yaml
performance:
num_workers: 4
prefetch_factor: 2
pin_memory: true
use_gpu: false
```
4. **Custom target size**:
```bash
python main.py --input-dir data/IDcards/processed --output-dir out --target-size 512x512
```
## 📊 Output
### File Naming Convention
The tool creates separate files for each augmentation method:
```
im1_rotation_01.png # Rotation method
im1_cropping_01.png # Random cropping method
im1_noise_01.png # Random noise method
im1_blockage_01.png # Partial blockage method
im1_grayscale_01.png # Grayscale method
im1_blurring_01.png # Blurring method
im1_brightness_contrast_01.png # Brightness/contrast method
```
### Output Summary
After processing, you'll see a summary like:
```
==================================================
AUGMENTATION SUMMARY
==================================================
Original images: 106
Augmented images: 2226
Augmentation ratio: 21.00
Successful augmentations: 106
Output directory: out
==================================================
```
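The figures in this sample summary are consistent with seven enabled methods and the default of three augmentations per method. A quick check, assuming every input image is processed successfully (`expected_outputs` is a hypothetical helper):

```python
def expected_outputs(original_images: int, enabled_methods: int, num_augmentations: int):
    """Return (total augmented images, augmented images per original)."""
    per_image = enabled_methods * num_augmentations
    return original_images * per_image, per_image


total, ratio = expected_outputs(original_images=106, enabled_methods=7, num_augmentations=3)
print(total, ratio)  # 2226 21
```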
## 🔧 Augmentation Techniques Details
### 1. Rotation
- **Purpose**: Simulates cards at different angles
- **Angles**: 30°, 60°, 120°, 150°, 180°, 210°, 240°, 300°, 330°
- **Method**: OpenCV rotation with white background preservation
### 2. Random Cropping
- **Purpose**: Simulates partially visible ID cards
- **Ratio Range**: 0.7 to 1.0 (70% to 100% of original size)
- **Method**: Random crop with white background preservation
### 3. Random Noise
- **Purpose**: Simulates worn-out cards
- **Mean Range**: 0.0 to 0.7
- **Variance Range**: 0.0 to 0.1
- **Method**: Gaussian noise addition
### 4. Horizontal Blockage
- **Purpose**: Simulates occluded card details
- **Lines**: 1 to 100 horizontal lines
- **Coverage**: 0% to 25% of image area
- **Colors**: Multiple colors to simulate various objects
### 5. Grayscale Transformation
- **Purpose**: Simulates Xerox/scan copies
- **Method**: OpenCV `cv2.cvtColor()` function
- **Output**: 3-channel grayscale image
### 6. Blurring
- **Purpose**: Simulates blurred but readable cards
- **Kernel Ratio**: 0.0 to 0.0084
- **Method**: OpenCV `cv2.filter2D()` with Gaussian kernel
### 7. Brightness & Contrast
- **Purpose**: Simulates different lighting conditions
- **Alpha Range**: 0.4 to 3.0 (contrast)
- **Beta Range**: 1 to 100 (brightness)
- **Method**: OpenCV `cv2.convertScaleAbs()`
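For intuition, `cv2.convertScaleAbs()` computes `|alpha * pixel + beta|` per pixel and saturates the result to the 8-bit range. The pure-Python sketch below reproduces that arithmetic on a handful of pixel values; `scale_abs` is a hypothetical stand-in used only for illustration, not the OpenCV call itself.

```python
from typing import List


def scale_abs(pixels: List[int], alpha: float, beta: float) -> List[int]:
    """Scale by alpha (contrast), shift by beta (brightness), clip to 0..255."""
    return [min(255, abs(round(alpha * p + beta))) for p in pixels]


row = [0, 100, 200]
brighter = scale_abs(row, alpha=1.5, beta=10)
darker = scale_abs(row, alpha=0.5, beta=1)
print(brighter)  # [10, 160, 255]
print(darker)    # [1, 51, 101]
```

With `alpha` near the top of the configured range, bright regions saturate at 255, which is how overexposed lighting is imitated.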
## 🛠️ Development
### Adding New Augmentation Methods
1. Add the method to `src/data_augmentation.py`
2. Update configuration in `config/config.yaml`
3. Update default config in `src/config_manager.py`
4. Test with preview mode
### Code Structure
- **`main.py`**: Entry point and command-line interface
- **`src/data_augmentation.py`**: Core augmentation logic
- **`src/config_manager.py`**: Configuration management
- **`src/image_processor.py`**: Image processing utilities
- **`src/utils.py`**: Utility functions
## 📝 Logging
The tool provides comprehensive logging:
The system provides comprehensive logging:
- **File**: `logs/data_augmentation.log`
- **Console**: Real-time progress updates
- **Summary**: JSON files with processing statistics
- **File logging**: `logs/data_augmentation.log`
- **Console logging**: Real-time progress updates
- **Log levels**: DEBUG, INFO, WARNING, ERROR
### **Log Levels**
- `INFO`: General processing information
- `WARNING`: Non-critical issues (e.g., no cards detected)
- `ERROR`: Critical errors
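A minimal sketch of how file and console logging with a configurable level can be wired up using the standard library. The function `build_logger` and the logger name are assumptions for illustration; the project's own `setup_logging` in `src/utils.py` may differ.

```python
import logging
import tempfile
from pathlib import Path


def build_logger(log_file: Path, level: str = "INFO") -> logging.Logger:
    """Create a logger that writes to both a file and the console."""
    logger = logging.getLogger("data_augmentation_demo")
    logger.setLevel(getattr(logging, level.upper()))
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    log_file.parent.mkdir(parents=True, exist_ok=True)
    for handler in (logging.FileHandler(log_file, encoding="utf-8"), logging.StreamHandler()):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


# A temporary directory stands in for the project's logs/ folder.
log_path = Path(tempfile.mkdtemp()) / "logs" / "data_augmentation.log"
log = build_logger(log_path, "WARNING")
log.info("Processing started")                      # filtered out at WARNING level
log.warning("No ID cards detected in sample.jpg")   # written to file and console
```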
## 🔧 Troubleshooting
### **Common Issues**
1. **No images detected**
- Check input directory path
- Verify image formats (jpg, png, bmp, tiff)
- Ensure images are not corrupted
2. **YOLO model not found**
- Place model file at `data/weights/id_cards_yolov8n.pt`
- Or specify custom path with `--model-path`
3. **Memory issues**
- Reduce `num_augmentations`
- Use smaller `target_size`
- Enable GPU if available
### **Performance Tips**
- **GPU Acceleration**: Set `use_gpu: true` in config
- **Batch Processing**: Use multiple workers for large datasets
- **Memory Management**: Process in smaller batches
## 🤝 Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
4. Add tests if applicable
5. Submit a pull request
## 📄 License
@@ -292,18 +291,10 @@ This project is licensed under the MIT License - see the LICENSE file for detail
## 🙏 Acknowledgments
- OpenCV for image processing capabilities
- NumPy for numerical operations
- PyYAML for configuration management
## 📞 Support
For issues and questions:
1. Check the logs in `logs/data_augmentation.log`
2. Review the configuration in `config/config.yaml`
3. Test with preview mode first
4. Create an issue with detailed information
- **YOLOv8**: Ultralytics for the detection framework
- **OpenCV**: Computer vision operations
- **NumPy**: Numerical computations
---
**Note**: This tool is specifically designed for ID card augmentation and may need adjustments for other image types.
**For questions and support, please open an issue on GitHub.**


@@ -7,6 +7,17 @@ paths:
output_dir: "out1"
log_file: "logs/data_augmentation.log"
# ID Card Detection configuration
id_card_detection:
  enabled: false # Enable/disable ID card detection and cropping
  model_path: "data/weights/id_cards_yolov8n.pt" # Path to the YOLO model
  confidence_threshold: 0.25 # Confidence threshold for detection
  iou_threshold: 0.45 # IoU threshold for NMS
  padding: 10 # Extra padding around the bbox
  crop_mode: "bbox" # Crop mode: bbox, square, aspect_ratio
  target_size: null # Target size (width, height) or null
  save_original_crops: true # Whether to save the original cropped images
# Data augmentation parameters - ROTATION and RANDOM CROPPING
augmentation:
# Geometric transformations
@@ -36,11 +47,6 @@ augmentation:
variance_range: [0.0, 0.1] # Line thickness variance (min, max)
probability: 1.0 # Always apply blockage
# Grayscale transformation to mimic Xerox/scan copies
grayscale:
enabled: true
probability: 1.0 # Always apply grayscale
# Blurring to simulate blurred card images that are still readable
blurring:
enabled: true
@@ -53,6 +59,11 @@ augmentation:
alpha_range: [0.4, 3.0] # Contrast range (min, max)
beta_range: [1, 100] # Brightness range (min, max)
probability: 1.0 # Always apply brightness/contrast adjustment
# Grayscale transformation as final step (applied to all augmented images)
grayscale:
enabled: true
probability: 1.0 # Always apply grayscale as final step
# Processing configuration
processing:

Binary file not shown. Size: 580 KiB

170
main.py

@@ -12,6 +12,7 @@ sys.path.append(str(Path(__file__).parent / "src"))
from src.config_manager import ConfigManager
from src.data_augmentation import DataAugmentation
from src.image_processor import ImageProcessor
from src.id_card_detector import IDCardDetector
from src.utils import setup_logging, get_image_files, print_progress
def parse_arguments():
@@ -83,6 +84,38 @@ def parse_arguments():
help="Logging level"
)
# ID Card Detection arguments
parser.add_argument(
"--enable-id-detection",
action="store_true",
help="Enable ID card detection and cropping before augmentation"
)
parser.add_argument(
"--model-path",
type=str,
help="Path to YOLO model for ID card detection (overrides config)"
)
parser.add_argument(
"--confidence",
type=float,
help="Confidence threshold for ID card detection (overrides config)"
)
parser.add_argument(
"--crop-mode",
type=str,
choices=["bbox", "square", "aspect_ratio"],
help="Crop mode for ID cards (overrides config)"
)
parser.add_argument(
"--crop-target-size",
type=str,
help="Target size for cropped ID cards (widthxheight) (overrides config)"
)
return parser.parse_args()
def parse_range(range_str: str) -> tuple:
@@ -134,7 +167,8 @@ def show_image_info(input_dir: Path):
print(f"\nTotal file size: {total_size:.2f} MB")
print(f"Average file size: {total_size/len(image_files):.2f} MB")
def preview_augmentation(input_dir: Path, output_dir: Path, config: Dict[str, Any]):
def preview_augmentation(input_dir: Path, output_dir: Path, config: Dict[str, Any],
id_detection_config: Dict[str, Any] = None):
"""Preview augmentation on first image"""
image_files = get_image_files(input_dir)
@@ -147,7 +181,40 @@ def preview_augmentation(input_dir: Path, output_dir: Path, config: Dict[str, An
# Create augmentation instance
augmenter = DataAugmentation(config)
# Augment first image
# Process with ID detection if enabled
if id_detection_config and id_detection_config.get('enabled', False):
print("🔍 ID Card Detection enabled - processing with YOLO model...")
# Initialize ID card detector
detector = IDCardDetector(
model_path=id_detection_config.get('model_path'),
config=config
)
if not detector.model:
print("❌ Failed to load YOLO model, proceeding with normal augmentation")
else:
# Process single image with ID detection
# process_single_image() does not accept apply_augmentation or save_original
result = detector.process_single_image(
image_path=image_files[0],
output_dir=output_dir,
confidence=id_detection_config.get('confidence_threshold', 0.25),
iou_threshold=id_detection_config.get('iou_threshold', 0.45),
crop_mode=id_detection_config.get('crop_mode', 'bbox'),
target_size=id_detection_config.get('target_size'),
padding=id_detection_config.get('padding', 10)
)
if result and result.get('detections'):
print(f"✅ Detected {len(result['detections'])} ID cards")
print(f"💾 Saved {len(result['processed_cards'])} processed cards")
return
else:
print("⚠️ No ID cards detected, proceeding with normal augmentation")
# Normal augmentation (fallback)
augmented_paths = augmenter.augment_image_file(
image_files[0],
output_dir,
@@ -225,9 +292,29 @@ def main():
show_image_info(input_dir)
return
# Get ID detection config
id_detection_config = config.get('id_card_detection', {})
# Override ID detection config with command line arguments
if args.enable_id_detection:
id_detection_config['enabled'] = True
if args.model_path:
id_detection_config['model_path'] = args.model_path
if args.confidence is not None:
id_detection_config['confidence_threshold'] = args.confidence
if args.crop_mode:
id_detection_config['crop_mode'] = args.crop_mode
if args.crop_target_size:
target_size = parse_size(args.crop_target_size)
id_detection_config['target_size'] = list(target_size)
# Preview augmentation if requested
if args.preview:
preview_augmentation(input_dir, output_dir, augmentation_config)
preview_augmentation(input_dir, output_dir, augmentation_config, id_detection_config)
return
# Get image files
@@ -242,35 +329,56 @@ def main():
logger.info(f"Number of augmentations per image: {processing_config.get('num_augmentations', 3)}")
logger.info(f"Target size: {processing_config.get('target_size', [224, 224])}")
# Create augmentation instance with new config
augmenter = DataAugmentation(augmentation_config)
# Process with ID detection if enabled
if id_detection_config.get('enabled', False):
logger.info("ID Card Detection enabled - processing with YOLO model...")
# Initialize ID card detector
detector = IDCardDetector(
model_path=id_detection_config.get('model_path'),
config=config
)
if not detector.model:
logger.error("Failed to load YOLO model")
sys.exit(1)
logger.info(f"YOLO model loaded: {detector.model_path}")
logger.info(f"Confidence threshold: {id_detection_config.get('confidence_threshold', 0.25)}")
logger.info(f"Crop mode: {id_detection_config.get('crop_mode', 'bbox')}")
# Step 1: Detect and crop ID cards into the processed directory
processed_dir = output_dir / "processed"
processed_dir.mkdir(parents=True, exist_ok=True)
logger.info("Step 1: Detect and crop ID cards...")
detector.batch_process(
input_dir=input_dir,
output_dir=processed_dir,
confidence=id_detection_config.get('confidence_threshold', 0.25),
iou_threshold=id_detection_config.get('iou_threshold', 0.45),
crop_mode=id_detection_config.get('crop_mode', 'bbox'),
target_size=id_detection_config.get('target_size'),
padding=id_detection_config.get('padding', 10)
)
# Step 2: Augment the cropped cards
logger.info("Step 2: Augment cropped ID cards...")
augmenter = DataAugmentation(augmentation_config)
augmenter.batch_augment(
processed_dir,
output_dir,
num_augmentations=processing_config.get("num_augmentations", 3)
)
else:
# Augment the original images directly
logger.info("Starting normal batch augmentation (direct augmentation)...")
augmenter = DataAugmentation(augmentation_config)
augmenter.batch_augment(
input_dir,
output_dir,
num_augmentations=processing_config.get("num_augmentations", 3)
)
# Update target size
target_size = tuple(processing_config.get("target_size", [224, 224]))
augmenter.image_processor.target_size = target_size
# Perform batch augmentation
logger.info("Starting batch augmentation...")
results = augmenter.batch_augment(
input_dir,
output_dir,
num_augmentations=processing_config.get("num_augmentations", 3)
)
# Get and display summary
summary = augmenter.get_augmentation_summary(results)
print("\n" + "="*50)
print("AUGMENTATION SUMMARY")
print("="*50)
print(f"Original images: {summary['total_original_images']}")
print(f"Augmented images: {summary['total_augmented_images']}")
print(f"Augmentation ratio: {summary['augmentation_ratio']:.2f}")
print(f"Successful augmentations: {summary['successful_augmentations']}")
print(f"Output directory: {output_dir}")
print("="*50)
logger.info("Data augmentation completed successfully")
logger.info("Data processing completed successfully")
if __name__ == "__main__":
main()


@@ -1,133 +0,0 @@
#!/usr/bin/env python3
"""
Simple ID Card Cropper using Roboflow API
Input: folder containing images
Output: folder with cropped ID cards
"""
import sys
import yaml
from pathlib import Path
import logging
import argparse
# Add src to path
sys.path.append(str(Path(__file__).parent / "src"))
from model.roboflow_id_detector import RoboflowIDDetector
def setup_logging():
"""Setup basic logging"""
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(levelname)s - %(message)s'
)
def crop_id_cards(input_folder: str, output_folder: str, api_key: str = "Pkz4puRA0Cy3xMOuNoNr"):
"""
Crop ID cards from all images in input folder
Args:
input_folder: Path to input folder containing images
output_folder: Path to output folder for cropped ID cards
api_key: Roboflow API key
"""
logger = logging.getLogger(__name__)
# Convert to Path objects
input_path = Path(input_folder)
output_path = Path(output_folder)
# Check if input folder exists
if not input_path.exists():
logger.error(f"Input folder not found: {input_folder}")
return False
# Create output folder
output_path.mkdir(parents=True, exist_ok=True)
# Initialize detector
detector = RoboflowIDDetector(
api_key=api_key,
model_id="french-card-id-detect",
version=3,
confidence=0.5
)
# Get all image files
image_extensions = {'.jpg', '.jpeg', '.png', '.bmp', '.tiff'}
image_files = []
for file_path in input_path.rglob('*'):
if file_path.is_file() and file_path.suffix.lower() in image_extensions:
image_files.append(file_path)
if not image_files:
logger.error(f"No images found in {input_folder}")
return False
logger.info(f"Found {len(image_files)} images to process")
# Process each image
total_cropped = 0
for i, image_path in enumerate(image_files, 1):
logger.info(f"Processing {i}/{len(image_files)}: {image_path.name}")
# Detect ID cards
detections = detector.detect_id_cards(image_path)
if not detections:
logger.warning(f"No ID cards detected in {image_path.name}")
continue
# Crop each detected ID card
for j, detection in enumerate(detections):
bbox = detection['bbox']
# Create output filename
stem = image_path.stem
suffix = f"_card_{j+1}.jpg"
output_file = output_path / f"{stem}{suffix}"
# Crop ID card
cropped = detector.crop_id_card(image_path, bbox, output_file)
if cropped is not None:
total_cropped += 1
logger.info(f" ✓ Cropped card {j+1} to {output_file.name}")
# Add delay between requests
if i < len(image_files):
import time
time.sleep(1.0)
logger.info(f"Processing completed! Total ID cards cropped: {total_cropped}")
return True
def main():
"""Main function"""
parser = argparse.ArgumentParser(description='Crop ID cards from images using Roboflow API')
parser.add_argument('input_folder', help='Input folder containing images')
parser.add_argument('output_folder', help='Output folder for cropped ID cards')
parser.add_argument('--api-key', default="Pkz4puRA0Cy3xMOuNoNr",
help='Roboflow API key (default: demo key)')
args = parser.parse_args()
# Setup logging
setup_logging()
# Process images
success = crop_id_cards(args.input_folder, args.output_folder, args.api_key)
if success:
print(f"\n✓ Successfully processed images from '{args.input_folder}'")
print(f"✓ Cropped ID cards saved to '{args.output_folder}'")
else:
print(f"\n✗ Failed to process images")
return 1
return 0
if __name__ == "__main__":
exit(main())


@@ -363,8 +363,6 @@ class DataAugmentation:
return result
def augment_single_image(self, image: np.ndarray, num_augmentations: int = None) -> List[np.ndarray]:
"""
Apply each augmentation method separately to create independent augmented versions
@@ -455,20 +453,7 @@ class DataAugmentation:
augmented_images.append(augmented)
# 5. Grayscale only
if grayscale_config.get("enabled", False):
for i in range(num_augmentations):
augmented = image.copy()
augmented = self.convert_to_grayscale_preserve_quality(augmented)
# Resize preserving aspect ratio
target_size = self.image_processor.target_size
if target_size:
augmented = self.resize_preserve_aspect(augmented, target_size)
augmented_images.append(augmented)
# 6. Blurring only
# 5. Blurring only
if blurring_config.get("enabled", False):
for i in range(num_augmentations):
augmented = image.copy()
@@ -481,7 +466,7 @@ class DataAugmentation:
augmented_images.append(augmented)
# 7. Brightness and contrast only
# 6. Brightness/Contrast only
if brightness_contrast_config.get("enabled", False):
for i in range(num_augmentations):
augmented = image.copy()
@@ -494,6 +479,11 @@ class DataAugmentation:
augmented_images.append(augmented)
# 7. Apply grayscale as final step to ALL augmented images
if grayscale_config.get("enabled", False):
for i in range(len(augmented_images)):
augmented_images[i] = self.convert_to_grayscale_preserve_quality(augmented_images[i])
return augmented_images
def augment_image_file(self, image_path: Path, output_dir: Path, num_augmentations: int = None) -> List[Path]:
@@ -518,7 +508,7 @@ class DataAugmentation:
# Save augmented images with method names
saved_paths = []
method_names = ["rotation", "cropping", "noise", "blockage", "grayscale", "blurring", "brightness_contrast"]
method_names = ["rotation", "cropping", "noise", "blockage", "blurring", "brightness_contrast", "grayscale"]
method_index = 0
for i, aug_image in enumerate(augmented_images):

611
src/id_card_detector.py Normal file

@@ -0,0 +1,611 @@
"""
ID Card Detector Module
Uses YOLO to detect and crop ID cards from large images, combined with data augmentation
Integrates with the YOLOv8 French ID Card Detection model
"""
import cv2
import numpy as np
from pathlib import Path
from typing import List, Tuple, Optional, Dict, Any, Union
import torch
import torch.nn as nn
from ultralytics import YOLO
import logging
from data_augmentation import DataAugmentation
from utils import load_image, save_image, create_augmented_filename, print_progress
import os
import json
import yaml

class IDCardDetector:
    """Detects and crops ID cards from large images using YOLO"""

    def __init__(self, model_path: str = None, config: Dict[str, Any] = None):
        """
        Initialize ID Card Detector

        Args:
            model_path: Path to the trained YOLO model
            config: Configuration dictionary
        """
        self.config = config or {}
        self.model_path = model_path
        self.model = None
        self.data_augmentation = DataAugmentation(config)
        self.logger = self._setup_logger()

        # Fall back to the default model path if none is provided
        if not model_path:
            default_model_path = "data/weights/id_cards_yolov8n.pt"
            if os.path.exists(default_model_path):
                model_path = default_model_path
                self.model_path = model_path

        # Load the YOLO model if one is available
        if model_path and os.path.exists(model_path):
            self.load_model(model_path)

    def _setup_logger(self) -> logging.Logger:
        """Set up the logger for this module"""
        logger = logging.getLogger(__name__)
        logger.setLevel(logging.INFO)

        if not logger.handlers:
            handler = logging.StreamHandler()
            formatter = logging.Formatter(
                '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
            )
            handler.setFormatter(formatter)
            logger.addHandler(handler)

        return logger

    def load_model(self, model_path: str) -> bool:
        """
        Load the YOLO model from a file

        Args:
            model_path: Path to the model file

        Returns:
            True if the model loaded successfully, False otherwise
        """
        try:
            self.model = YOLO(model_path)
            self.logger.info(f"Loaded YOLO model from: {model_path}")
            return True
        except Exception as e:
            self.logger.error(f"Failed to load model: {e}")
            return False

    def detect_id_cards(self, image: np.ndarray, confidence: float = 0.5, iou_threshold: float = 0.45) -> List[Dict[str, Any]]:
        """
        Detect ID cards in an image using YOLO

        Args:
            image: Input image
            confidence: Confidence threshold
            iou_threshold: IoU threshold for NMS

        Returns:
            List of detection results, each in the format:
            {
                'bbox': [x1, y1, x2, y2],
                'confidence': float,
                'class_id': int,
                'class_name': str
            }
        """
        if self.model is None:
            self.logger.error("Model has not been loaded!")
            return []

        try:
            # Run inference
            results = self.model(image, conf=confidence, iou=float(iou_threshold), verbose=False)

            detections = []
            for result in results:
                boxes = result.boxes
                if boxes is not None:
                    for box in boxes:
                        # Get bbox coordinates
                        x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()

                        # Get confidence and class
                        confidence_score = float(box.conf[0].cpu().numpy())
                        class_id = int(box.cls[0].cpu().numpy())
                        class_name = self.model.names[class_id] if hasattr(self.model, 'names') else f"class_{class_id}"

                        detection = {
                            'bbox': [int(x1), int(y1), int(x2), int(y2)],
                            'confidence': confidence_score,
                            'class_id': class_id,
                            'class_name': class_name
                        }
                        detections.append(detection)

            self.logger.info(f"Detected {len(detections)} ID cards")
            return detections

        except Exception as e:
            self.logger.error(f"Error during detection: {e}")
            return []

    def crop_id_card(self, image: np.ndarray, bbox: List[int], padding: int = 10,
                     crop_mode: str = "bbox", target_size: Tuple[int, int] = None) -> np.ndarray:
        """
        Crop an ID card from the source image based on its bbox, with several options

        Args:
            image: Input image
            bbox: Bounding box [x1, y1, x2, y2]
            padding: Extra padding around the bbox
            crop_mode: Crop mode ("bbox", "square", "aspect_ratio")
            target_size: Target size (width, height) if resizing is wanted

        Returns:
            Cropped ID card image
        """
        x1, y1, x2, y2 = bbox

        # Add padding, clamped to the image bounds
        height, width = image.shape[:2]
        x1 = max(0, x1 - padding)
        y1 = max(0, y1 - padding)
        x2 = min(width, x2 + padding)
        y2 = min(height, y2 + padding)

        # Crop according to the selected mode
        if crop_mode == "bbox":
            # Crop to the padded bbox
            cropped = image[y1:y2, x1:x2]
        elif crop_mode == "square":
            # Crop to a square
            center_x = (x1 + x2) // 2
            center_y = (y1 + y2) // 2
            size = max(x2 - x1, y2 - y1)
            half_size = size // 2

            x1 = max(0, center_x - half_size)
            y1 = max(0, center_y - half_size)
            x2 = min(width, center_x + half_size)
            y2 = min(height, center_y + half_size)

            cropped = image[y1:y2, x1:x2]
        elif crop_mode == "aspect_ratio":
            # Crop to a standard aspect ratio (3:4 for ID cards)
            bbox_width = x2 - x1
            bbox_height = y2 - y1
            center_x = (x1 + x2) // 2
            center_y = (y1 + y2) // 2

            # 3:4 ratio for ID cards
            target_ratio = 3 / 4
            current_ratio = bbox_width / bbox_height

            if current_ratio > target_ratio:
                # Bbox is too wide, keep the height
                new_width = int(bbox_height * target_ratio)
                half_width = new_width // 2
                x1 = max(0, center_x - half_width)
                x2 = min(width, center_x + half_width)
            else:
                # Bbox is too tall, keep the width
                new_height = int(bbox_width / target_ratio)
                half_height = new_height // 2
                y1 = max(0, center_y - half_height)
                y2 = min(height, center_y + half_height)

            cropped = image[y1:y2, x1:x2]
        else:
            # Default: crop to the bbox
            cropped = image[y1:y2, x1:x2]

        # Resize if a target_size is given
        if target_size:
            cropped = cv2.resize(cropped, target_size, interpolation=cv2.INTER_AREA)

        return cropped

    def process_single_image(self, image_path: Union[str, Path], output_dir: Path,
                             confidence: float = 0.5, iou_threshold: float = 0.45,
                             crop_mode: str = "bbox", target_size: Tuple[int, int] = None,
                             padding: int = 10, card_counter: int = 0) -> Dict[str, Any]:
        """
        Process a single image: detect ID cards, crop them, and save the crops.
        No augmentation is applied here.

        Args:
            image_path: Path to the input image
            output_dir: Output directory
            confidence: Confidence threshold
            iou_threshold: IoU threshold
            crop_mode: Crop mode ("bbox", "square", "aspect_ratio")
            target_size: Target size (width, height) or None
            padding: Extra padding around the bbox
            card_counter: Number of cards already saved; used to build unique file names

        Returns:
            Dictionary containing the processing results
        """
        image_path = Path(image_path)
        if not image_path.exists():
            self.logger.error(f"Image not found: {image_path}")
            return {}
        # Load the image
        image = load_image(str(image_path))
        if image is None:
            self.logger.error(f"Failed to load image: {image_path}")
            return {}
        # Detect ID cards
        detections = self.detect_id_cards(image, confidence, float(iou_threshold))
        if not detections:
            self.logger.warning(f"No ID cards detected in: {image_path}")
            return {
                'image_path': str(image_path),
                'detections': [],
                'processed_cards': []
            }
        # Create the output directory
        output_dir.mkdir(parents=True, exist_ok=True)
        processed_cards = []
        current_card_counter = card_counter
        for detection in detections:
            # Crop the ID card with the selected options
            cropped_card = self.crop_id_card(
                image,
                detection['bbox'],
                padding=padding,
                crop_mode=crop_mode,
                target_size=target_size
            )
            # Build a unique file name for each ID card
            current_card_counter += 1
            card_filename = f"id_card_{current_card_counter:03d}.jpg"
            card_path = output_dir / card_filename
            # Save the cropped (un-augmented) card
            save_image(cropped_card, card_path)
            processed_cards.append({
                'original_path': str(card_path),
                'detection_info': detection,
                'crop_info': {
                    'mode': crop_mode,
                    'target_size': target_size,
                    'padding': padding
                }
            })
        result = {
            'image_path': str(image_path),
            'detections': detections,
            'processed_cards': processed_cards,
            'total_cards': len(processed_cards),
            'crop_settings': {
                'mode': crop_mode,
                'target_size': target_size,
                'padding': padding
            }
        }
        self.logger.info(f"Processed {len(processed_cards)} cards from {image_path.name}")
        return result

    def batch_process(self, input_dir: Union[str, Path], output_dir: Union[str, Path],
                      confidence: float = 0.5, iou_threshold: float = 0.45,
                      crop_mode: str = "bbox", target_size: Tuple[int, int] = None,
                      padding: int = 10) -> Dict[str, Any]:
        """
        Process a batch of images: detect and crop ID cards in every image of a directory.

        Args:
            input_dir: Directory containing the input images
            output_dir: Output directory
            confidence: Confidence threshold
            iou_threshold: IoU threshold
            crop_mode: Crop mode ("bbox", "square", "aspect_ratio")
            target_size: Target size (width, height) or None
            padding: Extra padding around the bbox

        Returns:
            Dictionary containing the batch processing results
        """
        input_dir = Path(input_dir)
        output_dir = Path(output_dir)
        if not input_dir.exists():
            self.logger.error(f"Input directory not found: {input_dir}")
            return {}
        # Create the output directory
        output_dir.mkdir(parents=True, exist_ok=True)
        # Collect all supported images
        supported_formats = self.config.get('supported_formats', ['.jpg', '.jpeg', '.png', '.bmp', '.tiff'])
        image_files = []
        for fmt in supported_formats:
            image_files.extend(input_dir.glob(f"*{fmt}"))
            image_files.extend(input_dir.glob(f"*{fmt.upper()}"))
        # De-duplicate: on case-insensitive filesystems both patterns match the same file
        image_files = sorted(set(image_files))
        if not image_files:
            self.logger.warning(f"No supported images found in: {input_dir}")
            return {}
        self.logger.info(f"Found {len(image_files)} images to process")
        results = {}
        total_cards = 0
        global_card_counter = 0  # Counter used to build unique file names
        for i, image_path in enumerate(image_files):
            self.logger.info(f"Processing {i+1}/{len(image_files)}: {image_path.name}")
            # Process the image: detect and crop only, no augmentation
            result = self.process_single_image(
                image_path,
                output_dir,
                confidence,
                iou_threshold,
                crop_mode,
                target_size,
                padding,
                global_card_counter
            )
            # Update the counter
            global_card_counter += len(result.get('detections', []))
            results[image_path.name] = result
            total_cards += len(result.get('detections', []))  # Actual number of ID cards detected
            # Print progress
            print_progress(i + 1, len(image_files), f"Processed {image_path.name}")
        # Build the summary
        summary = {
            'total_images': len(image_files),
            'total_cards_detected': total_cards,
            'images_with_cards': len([r for r in results.values() if r.get('detections')]),
            'images_without_cards': len([r for r in results.values() if not r.get('detections')]),
            'output_directory': str(output_dir),
            'crop_settings': {
                'mode': crop_mode,
                'target_size': target_size,
                'padding': padding
            },
            'results': results
        }
        # Save the summary
        summary_path = output_dir / "processing_summary.json"
        with open(summary_path, 'w', encoding='utf-8') as f:
            json.dump(summary, f, indent=2, ensure_ascii=False)
        self.logger.info(f"Batch processing completed. Summary saved to: {summary_path}")
        return summary

    def get_detection_statistics(self, results: Dict[str, Any]) -> Dict[str, Any]:
        """
        Compute statistics from detection results.

        Args:
            results: Summary returned by batch_process

        Returns:
            Dictionary containing the statistics
        """
        if not results:
            return {}
        total_images = results.get('total_images', 0)
        total_cards = results.get('total_cards_detected', 0)
        images_with_cards = results.get('images_with_cards', 0)
        # Collect confidence values for the statistics
        all_confidences = []
        for image_result in results.get('results', {}).values():
            for detection in image_result.get('detections', []):
                all_confidences.append(detection.get('confidence', 0))
        stats = {
            'total_images_processed': total_images,
            'total_cards_detected': total_cards,
            'images_with_cards': images_with_cards,
            'images_without_cards': total_images - images_with_cards,
            'average_cards_per_image': total_cards / total_images if total_images > 0 else 0,
            'detection_rate': images_with_cards / total_images if total_images > 0 else 0,
            'confidence_statistics': {
                'min': min(all_confidences) if all_confidences else 0,
                'max': max(all_confidences) if all_confidences else 0,
                'mean': np.mean(all_confidences) if all_confidences else 0,
                'std': np.std(all_confidences) if all_confidences else 0
            }
        }
        return stats

    def augment_cropped_cards(self, input_dir: Union[str, Path], output_dir: Union[str, Path],
                              num_augmentations: int = 3) -> Dict[str, Any]:
        """
        Augment all cropped ID cards found in the input directory.

        Args:
            input_dir: Directory containing the cropped ID cards
            output_dir: Output directory for the augmented images
            num_augmentations: Number of augmentations per card

        Returns:
            Dictionary containing the augmentation results
        """
        input_dir = Path(input_dir)
        output_dir = Path(output_dir)
        if not input_dir.exists():
            self.logger.error(f"Input directory not found: {input_dir}")
            return {}
        # Create the output directory
        output_dir.mkdir(parents=True, exist_ok=True)
        # Find all cropped ID cards
        card_files = list(input_dir.glob("id_card_*.jpg"))
        if not card_files:
            self.logger.warning(f"No ID card files found in: {input_dir}")
            return {}
        self.logger.info(f"Found {len(card_files)} ID cards to augment")
        results = {}
        total_augmented = 0
        for i, card_path in enumerate(card_files):
            self.logger.info(f"Augmenting {i+1}/{len(card_files)}: {card_path.name}")
            # Load the ID card
            card_image = load_image(str(card_path))
            if card_image is None:
                self.logger.error(f"Failed to load card: {card_path}")
                continue
            # Augment the card
            try:
                augmented_cards = self.data_augmentation.augment_single_image(
                    card_image,
                    num_augmentations=num_augmentations
                )
                # Debug: check the number of augmented cards
                self.logger.info(f"Generated {len(augmented_cards)} augmented cards for {card_path.name}")
                # Debug: check the config
                self.logger.info(f"DataAugmentation config: {self.data_augmentation.config}")
            except Exception as e:
                self.logger.error(f"Error during augmentation: {e}")
                augmented_cards = []
            # Save the augmented cards
            card_results = []
            for j, aug_card in enumerate(augmented_cards):
                aug_filename = f"{card_path.stem}_aug_{j+1}.jpg"
                aug_path = output_dir / aug_filename
                save_image(aug_card, aug_path)
                card_results.append({
                    'augmented_path': str(aug_path),
                    'augmentation_index': j+1
                })
            results[card_path.name] = {
                'original_path': str(card_path),
                'augmented_cards': card_results,
                'total_augmented': len(card_results)
            }
            total_augmented += len(card_results)
            # Print progress
            print_progress(i + 1, len(card_files), f"Augmented {card_path.name}")
        # Build the summary
        summary = {
            'total_cards': len(card_files),
            'total_augmented': total_augmented,
            'output_directory': str(output_dir),
            'results': results
        }
        # Save the summary
        summary_path = output_dir / "augmentation_summary.json"
        with open(summary_path, 'w', encoding='utf-8') as f:
            json.dump(summary, f, indent=2, ensure_ascii=False)
        self.logger.info(f"Augmentation completed. Summary saved to: {summary_path}")
        return summary

    def load_yolo_config(self, config_path: str = None) -> Dict[str, Any]:
        """
        Load config from the YOLO detector.

        Args:
            config_path: Path to the config file (resolved to the default location
                when None; the settings themselves are imported from the detector's
                config module below)

        Returns:
            Config dictionary
        """
        if config_path is None:
            # Look for the default config
            default_config_path = "src/model/ID_cards_detector/config.py"
            if os.path.exists(default_config_path):
                config_path = default_config_path
        config = {}
        try:
            # Import the config from the YOLO detector
            import sys
            sys.path.append(str(Path("src/model/ID_cards_detector")))
            from config import DEFAULT_TRAINING_CONFIG, DEFAULT_INFERENCE_CONFIG
            config.update({
                'yolo_training_config': DEFAULT_TRAINING_CONFIG,
                'yolo_inference_config': DEFAULT_INFERENCE_CONFIG,
                'detection': {
                    'confidence_threshold': DEFAULT_INFERENCE_CONFIG.get('conf_threshold', 0.25),
                    'iou_threshold': DEFAULT_INFERENCE_CONFIG.get('iou_threshold', 0.45),
                    'padding': 10
                },
                'processing': {
                    'apply_augmentation': True,
                    'save_original': True,
                    'num_augmentations': 3,
                    'save_format': "jpg",
                    'quality': 95,
                    'target_size': [640, 640]
                },
                'crop_options': {
                    'crop_mode': 'bbox',  # bbox, square, aspect_ratio
                    'target_size': None,  # (width, height) or None
                    'padding': 10
                }
            })
            self.logger.info("Loaded YOLO config successfully")
        except Exception as e:
            self.logger.warning(f"Failed to load YOLO config: {e}")
            # Fallback config
            config = {
                'detection': {
                    'confidence_threshold': 0.25,
                    'iou_threshold': 0.45,
                    'padding': 10
                },
                'processing': {
                    'apply_augmentation': True,
                    'save_original': True,
                    'num_augmentations': 3,
                    'save_format': "jpg",
                    'quality': 95,
                    'target_size': [640, 640]
                },
                'crop_options': {
                    'crop_mode': 'bbox',
                    'target_size': None,
                    'padding': 10
                }
            }
        return config
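
The crop-box arithmetic in `crop_id_card` can be checked without OpenCV or a YOLO model. The following is a minimal sketch, assuming a hypothetical standalone helper `compute_crop_box` (not part of this commit) that mirrors the padding, `square`, and `aspect_ratio` branches and returns only the coordinates. The `target_ratio` parameter is also an assumption; the method itself hard-codes 3:4.

```python
from typing import Tuple


def compute_crop_box(bbox: Tuple[int, int, int, int], image_size: Tuple[int, int],
                     padding: int = 10, crop_mode: str = "bbox",
                     target_ratio: float = 3 / 4) -> Tuple[int, int, int, int]:
    """Hypothetical helper: return the clamped (x1, y1, x2, y2) crop box.

    image_size is (width, height); target_ratio is width / height.
    """
    width, height = image_size
    x1, y1, x2, y2 = bbox
    # Pad the box, then clamp it to the image borders.
    x1, y1 = max(0, x1 - padding), max(0, y1 - padding)
    x2, y2 = min(width, x2 + padding), min(height, y2 + padding)
    center_x, center_y = (x1 + x2) // 2, (y1 + y2) // 2

    if crop_mode == "square":
        # Side length is the longer edge of the padded box.
        half = max(x2 - x1, y2 - y1) // 2
        x1, x2 = max(0, center_x - half), min(width, center_x + half)
        y1, y2 = max(0, center_y - half), min(height, center_y + half)
    elif crop_mode == "aspect_ratio":
        box_w, box_h = x2 - x1, y2 - y1
        if box_w / box_h > target_ratio:
            # Too wide: keep the height, shrink the width.
            half_w = int(box_h * target_ratio) // 2
            x1, x2 = max(0, center_x - half_w), min(width, center_x + half_w)
        else:
            # Too tall: keep the width, recompute the height.
            half_h = int(box_w / target_ratio) // 2
            y1, y2 = max(0, center_y - half_h), min(height, center_y + half_h)
    # Any other mode falls back to the padded bbox, as in crop_id_card.
    return x1, y1, x2, y2


if __name__ == "__main__":
    print(compute_crop_box((100, 200, 500, 400), (1000, 800), crop_mode="square"))
    # (90, 90, 510, 510)
```

Because every branch clamps to the image borders, a `square` or `aspect_ratio` crop near an edge can come out smaller than requested, which is one reason to pass `target_size` when a uniform output shape matters.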


@@ -41,14 +41,11 @@ def load_image(image_path: Path, target_size: Tuple[int, int] = None) -> Optiona
        image = cv2.imread(str(image_path))
        if image is None:
            return None
        # Convert BGR to RGB
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # Resize if target_size is provided
        if target_size:
            image = cv2.resize(image, target_size, interpolation=cv2.INTER_AREA)
        return image
    except Exception as e:
        print(f"Error loading image {image_path}: {e}")