combine augment

This commit is contained in:
Nguyễn Phước Thành
2025-08-06 21:44:39 +07:00
parent 51d3a66cc4
commit f63589a10a
4 changed files with 851 additions and 355 deletions

README.md

@@ -1,10 +1,24 @@
# ID Card Data Augmentation Pipeline

A comprehensive data augmentation pipeline for ID card images with YOLO-based detection, smart sampling strategies, and advanced augmentation techniques.

![Pipeline Overview](docs/images/yolov8_pipeline.png)

## 🚀 New Features v2.0

### **Smart Data Strategy**
- **Sampling Mode** (`factor < 1.0`): Process only a percentage of the input data
- **Multiplication Mode** (`factor >= 1.0`): Multiply the total dataset size
- **Balanced Output**: Includes both raw and augmented images
- **Configurable Sampling**: Random, stratified, or uniform selection

### **Enhanced Augmentation**
- **Random Method Combination**: Mix and match augmentation techniques
- **Method Probability Weights**: Control the frequency of each augmentation
- **Raw Image Preservation**: Always includes the original processed images
- **Flexible Processing Modes**: Individual, sequential, or random combination

## 🎯 Key Features

### **YOLO-based ID Card Detection**
- Automatic detection and cropping of ID cards from large images
@@ -17,15 +31,17 @@ A comprehensive data augmentation pipeline for ID card images with YOLO-based de
- **Random Cropping**: Simulates partially visible cards
- **Noise Addition**: Simulates worn-out cards
- **Partial Blockage**: Simulates occluded card details
- **Blurring**: Simulates motion blur while keeping readability
- **Brightness/Contrast**: Mimics different lighting conditions
- **Color Jittering**: HSV adjustments for color variations
- **Perspective Transform**: Simulates viewing angle changes
- **Grayscale Conversion**: Final preprocessing step for all images

### **Flexible Configuration**
- YAML-based configuration system
- Command-line argument overrides
- Smart data strategy configuration
- Comprehensive logging and statistics
## 📋 Requirements
@@ -44,6 +60,7 @@ pip install -r requirements.txt
- `Pillow>=8.3.0`
- `PyYAML>=5.4.0`
- `ultralytics>=8.0.0` (for YOLO models)
- `torch>=1.12.0` (for GPU acceleration)

## 🛠️ Installation
@@ -69,115 +86,80 @@ data/weights/id_cards_yolov8n.pt
### **Basic Usage**

```bash
# Run with default configuration (3x multiplication)
python main.py

# Run with sampling mode (30% of input data)
python main.py   # Set multiplication_factor: 0.3 in config

# Run with ID card detection enabled
python main.py --enable-id-detection

# Run with custom input/output directories
python main.py --input-dir "path/to/input" --output-dir "path/to/output"
```
### **Data Strategy Examples**

#### **Sampling Mode** (factor < 1.0)
```yaml
data_strategy:
  multiplication_factor: 0.3   # Process 30% of input images
  sampling:
    method: "random"           # random, stratified, uniform
    preserve_distribution: true
```
- Input: 100 images → Select 30 images → Output: 100 images total
- Each selected image generates ~3-4 versions (including raw)

#### **Multiplication Mode** (factor >= 1.0)
```yaml
data_strategy:
  multiplication_factor: 3.0   # 3x dataset size
```
- Input: 100 images → Process all → Output: 300 images total
- Each image generates 3 versions (1 raw + 2 augmented)
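
The arithmetic behind the two modes can be sketched in a few lines (a minimal sketch; the helper name `plan_outputs` is illustrative, not part of the pipeline's API):

```python
import math

def plan_outputs(num_input: int, factor: float) -> tuple:
    """Return (images_to_process, versions_per_image) for a given factor.

    Sampling mode (factor < 1.0): select a subset but keep the original
    dataset size, so each selected image must yield more versions.
    Multiplication mode (factor >= 1.0): process everything and grow the
    dataset by the factor.
    """
    if factor < 1.0:
        selected = max(1, round(num_input * factor))
        versions = math.ceil(num_input / selected)  # ~3-4 versions at factor 0.3
        return selected, versions
    versions = round(factor)  # 1 raw + (factor - 1) augmented
    return num_input, versions

# 100 inputs at factor 0.3 → 30 selected, 4 versions each (≈100 outputs)
# 100 inputs at factor 3.0 → all 100 processed, 3 versions each (300 outputs)
print(plan_outputs(100, 0.3), plan_outputs(100, 3.0))
```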
### **Augmentation Strategy**
```yaml
augmentation:
  strategy:
    mode: "random_combine"   # random_combine, sequential, individual
    min_methods: 2           # Min augmentation methods per image
    max_methods: 4           # Max augmentation methods per image

  methods:
    rotation:
      enabled: true
      probability: 0.8       # 80% chance to be selected
      angles: [30, 60, 120, 150, 180, 210, 240, 300, 330]

    random_cropping:
      enabled: true
      probability: 0.7
      ratio_range: [0.7, 1.0]

    # ... other methods with probabilities
```
## 🔄 Workflow

### **Smart Processing Pipeline**

#### **Step 1: Data Selection**
- **Sampling Mode**: Randomly select a subset of input images
- **Multiplication Mode**: Process all input images
- **Stratified Sampling**: Preserve file type distribution

#### **Step 2: ID Card Detection** (Optional)
When `id_card_detection.enabled: true`:
1. **YOLO Detection**: Locate ID cards in large images
2. **Cropping**: Extract individual ID cards with padding
3. **Output**: Cropped ID cards saved to `out/processed/`

#### **Step 3: Smart Augmentation**
1. **Raw Processing**: Always include the original (resized + grayscale)
2. **Random Combination**: Select 2-4 augmentation methods randomly
3. **Method Application**: Apply selected methods with probability weights
4. **Final Processing**: Grayscale conversion for all outputs
## 📊 Output Structure
@@ -187,103 +169,144 @@ output_directory/
│   ├── id_card_001.jpg
│   ├── id_card_002.jpg
│   └── processing_summary.json
├── im1__raw_001.jpg       # Raw processed images
├── im1__aug_001.jpg       # Augmented images (random combinations)
├── im1__aug_002.jpg
├── im2__raw_001.jpg
├── im2__aug_001.jpg
└── processing_summary.json
```

### **File Naming Convention**
- `{basename}_raw_001.jpg`: Original image (resized + grayscale)
- `{basename}_aug_001.jpg`: Augmented version 1 (random methods)
- `{basename}_aug_002.jpg`: Augmented version 2 (different methods)
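
A hypothetical helper matching this convention (the name `make_output_name` is illustrative and not defined in the pipeline):

```python
def make_output_name(basename: str, kind: str, index: int, ext: str = "jpg") -> str:
    """Build an output filename like 'im1_raw_001.jpg' or 'im1_aug_002.jpg'."""
    if kind not in ("raw", "aug"):
        raise ValueError(f"unknown kind: {kind}")
    # Three-digit zero-padded index keeps outputs lexicographically sorted
    return f"{basename}_{kind}_{index:03d}.{ext}"

print(make_output_name("im1", "raw", 1))  # im1_raw_001.jpg
print(make_output_name("im1", "aug", 2))  # im1_aug_002.jpg
```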
## 🎯 Use Cases

### **Dataset Expansion**
```yaml
# Triple your dataset size with balanced augmentation
data_strategy:
  multiplication_factor: 3.0
```

### **Smart Sampling for Large Datasets**
```yaml
# Process only 20% of the input but maintain the original dataset size
data_strategy:
  multiplication_factor: 0.2
  sampling:
    method: "stratified"   # Preserve file type distribution
```

### **Quality Control**
```bash
# Preview results before full processing
python main.py --preview
```
## ⚙️ Advanced Configuration

### **Augmentation Strategy Modes**

#### **Random Combination** (Recommended)
```yaml
augmentation:
  strategy:
    mode: "random_combine"
    min_methods: 2
    max_methods: 4
```
Each image gets 2-4 randomly selected augmentation methods.

#### **Sequential Application**
```yaml
augmentation:
  strategy:
    mode: "sequential"
```
All enabled methods are applied to each image in sequence.

#### **Individual Methods**
```yaml
augmentation:
  strategy:
    mode: "individual"
```
Legacy mode: each method creates separate output images.

### **Method Probability Tuning**
```yaml
methods:
  rotation:
    probability: 0.9         # High chance - common transformation
  perspective:
    probability: 0.2         # Low chance - subtle effect
  partial_blockage:
    probability: 0.3         # Medium chance - specific use case
```
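
Probability weights act as relative selection weights: when the pipeline draws the methods for an image, higher-weighted methods come out first more often. A minimal sketch of weighted sampling without replacement, assuming this interpretation (illustrative, not the pipeline's exact code):

```python
import random

def pick_methods(weights: dict, k: int, seed=None) -> list:
    """Draw up to k distinct method names, weighted by their probabilities."""
    rng = random.Random(seed)
    remaining = dict(weights)
    chosen = []
    while remaining and len(chosen) < k:
        total = sum(remaining.values())
        if total == 0:
            # All weights zero: fall back to uniform choice
            name = rng.choice(list(remaining))
        else:
            # Roulette-wheel selection over the remaining weights
            r = rng.uniform(0, total)
            acc = 0.0
            for name, w in remaining.items():
                acc += w
                if r <= acc:
                    break
        chosen.append(name)
        del remaining[name]  # without replacement: no duplicates
    return chosen

weights = {"rotation": 0.9, "perspective": 0.2, "partial_blockage": 0.3}
print(pick_methods(weights, 2, seed=42))  # two distinct names; rotation is most likely
```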
## 📊 Performance Statistics

The system provides detailed statistics:

```json
{
  "input_images": 100,
  "selected_images": 30,
  "target_total": 100,
  "actual_generated": 98,
  "multiplication_factor": 0.3,
  "mode": "sampling",
  "efficiency": 0.98
}
```

`selected_images` differs from `input_images` only in sampling mode; `efficiency` is the fraction of the target actually generated (98% here).
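
The efficiency figure is simply the ratio of generated to targeted images; a one-line sketch (the function name is illustrative):

```python
def efficiency(actual_generated: int, target_total: int) -> float:
    """Fraction of the target output actually produced (0.0-1.0)."""
    return actual_generated / target_total if target_total else 0.0

print(f"{efficiency(98, 100):.0%}")  # 98%
```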
## 🔧 Troubleshooting

### **Common Issues**

1. **Low efficiency in sampling mode**
   - Increase `min_methods` or adjust `target_size`
   - Check the available augmentation methods

2. **Memory issues with large datasets**
   - Use sampling mode with a lower factor
   - Reduce the `target_size` resolution
   - Enable `memory_efficient` mode

3. **Inconsistent augmentation results**
   - Set `random_seed` for reproducibility
   - Adjust method probabilities
   - Check the `min_methods`/`max_methods` balance

### **Performance Tips**
- **Sampling Mode**: Use for large datasets (>1000 images)
- **GPU Acceleration**: Enable for YOLO detection
- **Batch Processing**: Process in chunks for memory efficiency
- **Probability Tuning**: Use higher probabilities for stable methods
## 📈 Benchmarks
### **Processing Speed**
- **Direct Mode**: ~2-3 images/second
- **YOLO + Augmentation**: ~1-2 images/second
- **Memory Usage**: ~2-4GB for 1000 images
### **Output Quality**
- **Raw Images**: 100% preserved quality
- **Augmented Images**: Balanced realism vs. diversity
- **Grayscale Conversion**: Consistent preprocessing
## 🤝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## 📄 License
@@ -294,6 +317,7 @@ This project is licensed under the MIT License - see the LICENSE file for detail
- **YOLOv8**: Ultralytics for the detection framework
- **OpenCV**: Computer vision operations
- **NumPy**: Numerical computations
- **PyTorch**: Deep learning backend

---

@@ -1,5 +1,5 @@
# ID Card Data Augmentation Configuration v2.0
# Enhanced configuration with smart sampling, multiplication, and random method combination

# Paths configuration
paths:
@@ -7,72 +7,123 @@ paths:
  output_dir: "out1"
  log_file: "logs/data_augmentation.log"
# Data Sampling and Multiplication Strategy
data_strategy:
  # Multiplication/Sampling factor:
  # - If < 1.0 (e.g. 0.3): Randomly sample 30% of the input data to augment
  # - If >= 1.0 (e.g. 2.0, 3.0): Multiply the dataset size by 2x, 3x, etc.
  multiplication_factor: 0.3

  # Random seed for reproducibility (null = random each run)
  random_seed: null

  # Sampling strategy for factor < 1.0
  sampling:
    method: "random"             # random, stratified, uniform
    preserve_distribution: true  # Maintain file type distribution
# ID Card Detection configuration
id_card_detection:
  enabled: false                 # Enable/disable YOLO detection and cropping
  model_path: "data/weights/id_cards_yolov8n.pt"  # Path to YOLO model
  confidence_threshold: 0.25     # Detection confidence threshold
  iou_threshold: 0.45            # IoU threshold for NMS
  padding: 10                    # Extra padding around bounding box
  crop_mode: "bbox"              # Cropping mode: bbox, square, aspect_ratio
  target_size: null              # Target size (width, height) or null
  save_original_crops: true      # Save original cropped images
# Augmentation Strategy - Random Combination of Methods
augmentation:
  # Strategy for combining augmentation methods
  strategy:
    mode: "random_combine"       # random_combine, sequential, individual
    min_methods: 2               # Minimum methods applied per image
    max_methods: 4               # Maximum methods applied per image
    allow_duplicates: false      # Allow the same method multiple times with different params

  # Available augmentation methods with selection probabilities
  methods:
    # Geometric transformations
    rotation:
      enabled: true
      probability: 0.8           # Selection probability for this method
      angles: [30, 60, 120, 150, 180, 210, 240, 300, 330]

    # Random cropping to simulate partially visible ID cards
    random_cropping:
      enabled: true
      probability: 0.7
      ratio_range: [0.7, 1.0]

    # Random noise to simulate worn-out ID cards
    random_noise:
      enabled: true
      probability: 0.6
      mean_range: [0.0, 0.7]
      variance_range: [0.0, 0.1]

    # Partial blockage to simulate occluded card details
    partial_blockage:
      enabled: true
      probability: 0.5
      num_occlusions_range: [1, 100]
      coverage_range: [0.0, 0.25]
      variance_range: [0.0, 0.1]

    # Blurring to simulate motion blur while keeping readability
    blurring:
      enabled: true
      probability: 0.6
      kernel_ratio_range: [0.0, 0.0084]

    # Brightness and contrast adjustment for lighting variations
    brightness_contrast:
      enabled: true
      probability: 0.7
      alpha_range: [0.4, 3.0]
      beta_range: [1, 100]

    # Color space transformations
    color_jitter:
      enabled: true
      probability: 0.4
      brightness_range: [0.8, 1.2]
      contrast_range: [0.8, 1.2]
      saturation_range: [0.8, 1.2]
      hue_range: [-0.1, 0.1]

    # Perspective transformation for viewing angle simulation
    perspective:
      enabled: false
      probability: 0.3
      distortion_scale: 0.2

  # Final processing (always applied to all outputs)
  final_processing:
    # Grayscale transformation as final preprocessing step
    grayscale:
      enabled: true
      probability: 1.0           # Always apply to ensure consistency

    # Quality enhancement (future feature)
    quality_enhancement:
      enabled: false
      sharpen: 0.1
      denoise: false
# Processing configuration
processing:
  target_size: [640, 640]        # [width, height] - Target resolution
  batch_size: 32
  save_format: "jpg"
  quality: 95

  # Advanced processing options
  preserve_original: false       # Whether to save original images
  parallel_processing: true      # Enable parallel processing
  memory_efficient: true         # Optimize memory usage
# Supported image formats
supported_formats:
  - ".jpg"
@@ -83,7 +134,7 @@ supported_formats:
# Logging configuration
logging:
  level: "INFO"                  # Available levels: DEBUG, INFO, WARNING, ERROR
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
  handlers:
    - type: "file"
@@ -92,7 +143,7 @@ logging:
# Performance settings
performance:
  num_workers: 4                 # Number of parallel workers
  prefetch_factor: 2             # Data prefetching factor
  pin_memory: true               # Pin memory for GPU transfer
  use_gpu: false                 # Enable GPU acceleration
main.py

@@ -214,11 +214,11 @@ def preview_augmentation(input_dir: Path, output_dir: Path, config: Dict[str, An
else:
    print("⚠️ No ID cards detected, proceeding with normal augmentation")

# Normal augmentation (fallback) with new logic
augmented_paths = augmenter.augment_image_file(
    image_files[0],
    output_dir,
    num_target_images=3
)

if augmented_paths:
@@ -270,6 +270,7 @@ def main():
processing_config = config_manager.get_processing_config()
augmentation_config = config_manager.get_augmentation_config()
logging_config = config_manager.get_logging_config()
data_strategy_config = config.get("data_strategy", {})

# Setup logging
logger = setup_logging(logging_config.get("level", "INFO"))
@@ -324,10 +325,20 @@ def main():
    logger.error(f"No images found in {input_dir}")
    sys.exit(1)

# Get data strategy parameters
multiplication_factor = data_strategy_config.get("multiplication_factor", 3.0)
random_seed = data_strategy_config.get("random_seed")

logger.info(f"Found {len(image_files)} images to process")
logger.info(f"Output directory: {output_dir}")
logger.info(f"Data strategy: multiplication_factor = {multiplication_factor}")
if multiplication_factor < 1.0:
    logger.info(f"SAMPLING MODE: Will process {multiplication_factor*100:.1f}% of input images")
else:
    logger.info(f"MULTIPLICATION MODE: Target {multiplication_factor}x dataset size")
logger.info(f"Target size: {processing_config.get('target_size', [224, 224])}")
if random_seed is not None:  # explicit None check so a seed of 0 is still logged
    logger.info(f"Random seed: {random_seed}")
# Process with ID detection if enabled
if id_detection_config.get('enabled', False):
@@ -360,24 +371,52 @@ def main():
    target_size=id_detection_config.get('target_size'),
    padding=id_detection_config.get('padding', 10)
)

# Step 2: Augment the cropped cards with the new strategy
logger.info("Step 2: Augment cropped ID cards with smart strategy...")
augmenter = DataAugmentation(augmentation_config)

# Pass the full config so the augmenter can access data_strategy
augmenter.config.update({"data_strategy": data_strategy_config})
augment_results = augmenter.batch_augment(
    processed_dir,
    output_dir,
    multiplication_factor=multiplication_factor,
    random_seed=random_seed
)

# Log results
if augment_results:
    logger.info("Augmentation Summary:")
    logger.info(f"  Input images: {augment_results.get('input_images', 0)}")
    logger.info(f"  Selected for processing: {augment_results.get('selected_images', 0)}")
    logger.info(f"  Target total: {augment_results.get('target_total', 0)}")
    logger.info(f"  Actually generated: {augment_results.get('actual_generated', 0)}")
    logger.info(f"  Efficiency: {augment_results.get('efficiency', 0):.1%}")
else: else:
# Augment trực tiếp ảnh gốc # Augment trực tiếp ảnh gốc với strategy mới
logger.info("Starting normal batch augmentation (direct augmentation)...") logger.info("Starting smart batch augmentation (direct augmentation)...")
augmenter = DataAugmentation(augmentation_config) augmenter = DataAugmentation(augmentation_config)
augmenter.batch_augment(
# Truyền full config để augmenter có thể access data_strategy
augmenter.config.update({"data_strategy": data_strategy_config})
augment_results = augmenter.batch_augment(
input_dir, input_dir,
output_dir, output_dir,
num_augmentations=processing_config.get("num_augmentations", 3) multiplication_factor=multiplication_factor,
random_seed=random_seed
) )
# Log results
if augment_results:
logger.info(f"Augmentation Summary:")
logger.info(f" Input images: {augment_results.get('input_images', 0)}")
logger.info(f" Selected for processing: {augment_results.get('selected_images', 0)}")
logger.info(f" Target total: {augment_results.get('target_total', 0)}")
logger.info(f" Actually generated: {augment_results.get('actual_generated', 0)}")
logger.info(f" Efficiency: {augment_results.get('efficiency', 0):.1%}")
logger.info("Data processing completed successfully") logger.info("Data processing completed successfully")
if __name__ == "__main__":

@@ -7,6 +7,7 @@ from pathlib import Path
from typing import List, Tuple, Optional, Dict, Any
import random
import math
import logging

from image_processor import ImageProcessor
from utils import load_image, save_image, create_augmented_filename, print_progress
@@ -22,6 +23,7 @@ class DataAugmentation:
    """
    self.config = config or {}
    self.image_processor = ImageProcessor()
    self.logger = logging.getLogger(__name__)

def random_crop_preserve_quality(self, image: np.ndarray, crop_ratio_range: Tuple[float, float] = (0.7, 1.0)) -> np.ndarray:
    """
@@ -363,21 +365,306 @@ class DataAugmentation:
    return result
def augment_single_image(self, image: np.ndarray, num_target_images: int = None) -> List[np.ndarray]:
    """
    Apply a random combination of augmentation methods to create diverse augmented versions

    Args:
        image: Input image
        num_target_images: Number of target augmented images to generate

    Returns:
        List of augmented images with random method combinations
    """
    num_target_images = num_target_images or 3  # Default value
    # Get strategy config
    strategy_config = self.config.get("strategy", {})
    methods_config = self.config.get("methods", {})
    final_config = self.config.get("final_processing", {})

    mode = strategy_config.get("mode", "random_combine")
    min_methods = strategy_config.get("min_methods", 2)
    max_methods = strategy_config.get("max_methods", 4)

    if mode == "random_combine":
        return self._augment_random_combine(image, num_target_images, methods_config, final_config, min_methods, max_methods)
    elif mode == "sequential":
        return self._augment_sequential(image, num_target_images, methods_config, final_config)
    elif mode == "individual":
        return self._augment_individual_legacy(image, num_target_images)
    else:
        # Fall back to the legacy method
        return self._augment_individual_legacy(image, num_target_images)
def _augment_random_combine(self, image: np.ndarray, num_target_images: int,
                            methods_config: dict, final_config: dict,
                            min_methods: int, max_methods: int) -> List[np.ndarray]:
    """Apply a random combination of methods"""
    augmented_images = []
    # Get enabled methods with their probabilities
    available_methods = []
    for method_name, method_config in methods_config.items():
        if method_config.get("enabled", False):
            available_methods.append((method_name, method_config))

    if not available_methods:
        self.logger.warning("No augmentation methods enabled!")
        return [image.copy() for _ in range(num_target_images)]

    for i in range(num_target_images):
        # Decide the number of methods for this image
        num_methods = random.randint(min_methods, min(max_methods, len(available_methods)))

        # Select methods based on probability
        selected_methods = self._select_methods_by_probability(available_methods, num_methods)

        # Apply the selected methods in sequence
        augmented = image.copy()
        method_names = []
        for method_name, method_config in selected_methods:
            if random.random() < method_config.get("probability", 0.5):
                augmented = self._apply_single_method(augmented, method_name, method_config)
                method_names.append(method_name)

        # Apply final processing
        augmented = self._apply_final_processing(augmented, final_config)

        # Resize preserving aspect ratio
        target_size = self.image_processor.target_size
        if target_size:
            augmented = self.resize_preserve_aspect(augmented, target_size)

        augmented_images.append(augmented)

    return augmented_images
def _select_methods_by_probability(self, available_methods: List[Tuple], num_methods: int) -> List[Tuple]:
    """Select methods based on their probability weights"""
    # Build a weighted list of (name, config, weight)
    weighted_methods = []
    for method_name, method_config in available_methods:
        probability = method_config.get("probability", 0.5)
        weighted_methods.append((method_name, method_config, probability))

    # Weighted random selection without replacement (roulette-wheel draw)
    selected = []
    remaining_methods = weighted_methods.copy()

    for _ in range(num_methods):
        if not remaining_methods:
            break

        total_prob = sum(method[2] for method in remaining_methods)
        if total_prob == 0:
            # If all probabilities are 0, select uniformly at random
            selected_method = random.choice(remaining_methods)
        else:
            rand_val = random.uniform(0, total_prob)
            cumulative_prob = 0
            selected_method = None
            for method in remaining_methods:
                cumulative_prob += method[2]
                if rand_val <= cumulative_prob:
                    selected_method = method
                    break
            if selected_method is None:
                selected_method = remaining_methods[-1]

        selected.append((selected_method[0], selected_method[1]))
        remaining_methods.remove(selected_method)

    return selected
def _apply_single_method(self, image: np.ndarray, method_name: str, method_config: dict) -> np.ndarray:
    """Apply a single augmentation method"""
    try:
        if method_name == "rotation":
            angles = method_config.get("angles", [30, 60, 90, 120, 150, 180, 210, 240, 300, 330])
            angle = random.choice(angles)
            return self.rotate_image_preserve_quality(image, angle)
        elif method_name == "random_cropping":
            ratio_range = method_config.get("ratio_range", (0.7, 1.0))
            return self.random_crop_preserve_quality(image, ratio_range)
        elif method_name == "random_noise":
            mean_range = method_config.get("mean_range", (0.0, 0.7))
            variance_range = method_config.get("variance_range", (0.0, 0.1))
            return self.add_random_noise_preserve_quality(image, mean_range, variance_range)
        elif method_name == "partial_blockage":
            num_range = method_config.get("num_occlusions_range", (1, 100))
            coverage_range = method_config.get("coverage_range", (0.0, 0.25))
            variance_range = method_config.get("variance_range", (0.0, 0.1))
            return self.add_partial_blockage_preserve_quality(image, num_range, coverage_range, variance_range)
        elif method_name == "blurring":
            kernel_range = method_config.get("kernel_ratio_range", (0.0, 0.0084))
            return self.apply_blurring_preserve_quality(image, kernel_range)
        elif method_name == "brightness_contrast":
            alpha_range = method_config.get("alpha_range", (0.4, 3.0))
            beta_range = method_config.get("beta_range", (1, 100))
            return self.adjust_brightness_contrast_preserve_quality(image, alpha_range, beta_range)
        elif method_name == "color_jitter":
            return self.apply_color_jitter(image, method_config)
        elif method_name == "perspective":
            distortion_scale = method_config.get("distortion_scale", 0.2)
            return self.apply_perspective_transform(image, distortion_scale)
        else:
            return image
    except Exception as e:
        # Use the class logger rather than print for consistent logging
        self.logger.error(f"Error applying method {method_name}: {e}")
        return image
def _apply_final_processing(self, image: np.ndarray, final_config: dict) -> np.ndarray:
"""Apply final processing steps - ALWAYS applied to all outputs"""
# Grayscale conversion - ALWAYS applied if enabled
grayscale_config = final_config.get("grayscale", {})
if grayscale_config.get("enabled", False):
# Always apply grayscale, no random check
image = self.convert_to_grayscale_preserve_quality(image)
# Quality enhancement (future feature)
quality_config = final_config.get("quality_enhancement", {})
if quality_config.get("enabled", False):
# TODO: Implement quality enhancement
pass
return image
def apply_color_jitter(self, image: np.ndarray, config: dict) -> np.ndarray:
"""
Apply color jittering (brightness, contrast, saturation, hue adjustments)
Args:
image: Input image
config: Color jitter configuration
Returns:
Color-jittered image
"""
# Get parameters
brightness_range = config.get("brightness_range", [0.8, 1.2])
contrast_range = config.get("contrast_range", [0.8, 1.2])
saturation_range = config.get("saturation_range", [0.8, 1.2])
hue_range = config.get("hue_range", [-0.1, 0.1])
# Convert to HSV for saturation and hue adjustments
hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV).astype(np.float32)
# Apply brightness (adjust V channel)
brightness_factor = random.uniform(brightness_range[0], brightness_range[1])
hsv[:, :, 2] = np.clip(hsv[:, :, 2] * brightness_factor, 0, 255)
# Apply saturation (adjust S channel)
saturation_factor = random.uniform(saturation_range[0], saturation_range[1])
hsv[:, :, 1] = np.clip(hsv[:, :, 1] * saturation_factor, 0, 255)
# Apply hue shift (adjust H channel)
hue_shift = random.uniform(hue_range[0], hue_range[1]) * 179 # OpenCV hue range is 0-179
hsv[:, :, 0] = (hsv[:, :, 0] + hue_shift) % 180
# Convert back to RGB
result = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2RGB)
# Apply contrast (after converting back to RGB)
contrast_factor = random.uniform(contrast_range[0], contrast_range[1])
result = cv2.convertScaleAbs(result, alpha=contrast_factor, beta=0)
return result
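    # Note on ranges (OpenCV convention): for uint8 images H spans 0-179 while
    # S and V span 0-255, which is why the hue shift above is scaled by 179 and
    # wrapped modulo 180. For example, hue_range=(-0.1, 0.1) shifts hue by at
    # most ~18 HSV units, i.e. roughly 36 degrees on a full color wheel.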
def apply_perspective_transform(self, image: np.ndarray, distortion_scale: float = 0.2) -> np.ndarray:
"""
Apply perspective transformation to simulate viewing angle changes
Args:
image: Input image
distortion_scale: Scale of perspective distortion (0.0 to 1.0)
Returns:
Perspective-transformed image
"""
height, width = image.shape[:2]
# Define source points (corners of original image)
src_points = np.float32([
[0, 0],
[width-1, 0],
[width-1, height-1],
[0, height-1]
])
# Add random distortion to destination points
max_distortion = min(width, height) * distortion_scale
dst_points = np.float32([
[random.uniform(0, max_distortion), random.uniform(0, max_distortion)],
[width-1-random.uniform(0, max_distortion), random.uniform(0, max_distortion)],
[width-1-random.uniform(0, max_distortion), height-1-random.uniform(0, max_distortion)],
[random.uniform(0, max_distortion), height-1-random.uniform(0, max_distortion)]
])
# Calculate perspective transformation matrix
matrix = cv2.getPerspectiveTransform(src_points, dst_points)
# Apply transformation
result = cv2.warpPerspective(image, matrix, (width, height),
borderMode=cv2.BORDER_CONSTANT,
borderValue=(255, 255, 255))
return result
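    # Worked example (illustrative numbers): for a 600x400 image with
    # distortion_scale=0.2, each corner may move inward by up to
    # min(600, 400) * 0.2 = 80 px; regions left uncovered by the warp are
    # filled white via BORDER_CONSTANT, matching a typical scan background.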
def _augment_sequential(self, image: np.ndarray, num_target_images: int,
methods_config: dict, final_config: dict) -> List[np.ndarray]:
"""Apply methods in sequence (pipeline style)"""
augmented_images = []
# Get enabled methods
enabled_methods = [
(name, config) for name, config in methods_config.items()
if config.get("enabled", False)
]
for i in range(num_target_images):
augmented = image.copy()
# Apply all enabled methods in sequence
for method_name, method_config in enabled_methods:
if random.random() < method_config.get("probability", 0.5):
augmented = self._apply_single_method(augmented, method_name, method_config)
# Apply final processing
augmented = self._apply_final_processing(augmented, final_config)
# Resize preserving aspect ratio
target_size = self.image_processor.target_size
if target_size:
augmented = self.resize_preserve_aspect(augmented, target_size)
augmented_images.append(augmented)
return augmented_images
def _augment_individual_legacy(self, image: np.ndarray, num_target_images: int) -> List[np.ndarray]:
"""Legacy individual method application (backward compatibility)"""
# This is the old implementation for backward compatibility
augmented_images = []
# Get old-style configuration
        rotation_config = self.config.get("rotation", {})
        cropping_config = self.config.get("random_cropping", {})
        noise_config = self.config.get("random_noise", {})
        blockage_config = self.config.get("partial_blockage", {})
        blurring_config = self.config.get("blurring", {})
        brightness_contrast_config = self.config.get("brightness_contrast", {})
        grayscale_config = self.config.get("grayscale", {})
        # Apply individual methods (old logic)
        methods = [
            ("rotation", rotation_config, self.rotate_image_preserve_quality),
            ("cropping", cropping_config, self.random_crop_preserve_quality),
            ("noise", noise_config, self.add_random_noise_preserve_quality),
            ("blockage", blockage_config, self.add_partial_blockage_preserve_quality),
            ("blurring", blurring_config, self.apply_blurring_preserve_quality),
            ("brightness_contrast", brightness_contrast_config, self.adjust_brightness_contrast_preserve_quality)
        ]
        for method_name, method_config, method_func in methods:
            if method_config.get("enabled", False):
                for i in range(num_target_images):
                    augmented = image.copy()
                    # Apply the single method with its configured parameters
                    if method_name == "rotation":
                        angles = method_config.get("angles", [30, 60, 90, 120, 150, 180, 210, 240, 300, 330])
                        angle = random.choice(angles)
                        augmented = method_func(augmented, angle)
                    elif method_name == "cropping":
                        ratio_range = method_config.get("ratio_range", (0.7, 1.0))
                        augmented = method_func(augmented, ratio_range)
                    elif method_name == "noise":
                        mean_range = method_config.get("mean_range", (0.0, 0.7))
                        variance_range = method_config.get("variance_range", (0.0, 0.1))
                        augmented = method_func(augmented, mean_range, variance_range)
                    elif method_name == "blockage":
                        num_range = method_config.get("num_occlusions_range", (1, 100))
                        coverage_range = method_config.get("coverage_range", (0.0, 0.25))
                        variance_range = method_config.get("variance_range", (0.0, 0.1))
                        augmented = method_func(augmented, num_range, coverage_range, variance_range)
                    elif method_name == "blurring":
                        kernel_range = method_config.get("kernel_ratio_range", (0.0, 0.0084))
                        augmented = method_func(augmented, kernel_range)
                    elif method_name == "brightness_contrast":
                        alpha_range = method_config.get("alpha_range", (0.4, 3.0))
                        beta_range = method_config.get("beta_range", (1, 100))
                        augmented = method_func(augmented, alpha_range, beta_range)
                    # Resize preserving aspect ratio
                    target_size = self.image_processor.target_size
                    if target_size:
                        augmented = self.resize_preserve_aspect(augmented, target_size)
                    augmented_images.append(augmented)
        # Apply grayscale as a final step to all augmented images
        if grayscale_config.get("enabled", False):
            for i in range(len(augmented_images)):
                augmented_images[i] = self.convert_to_grayscale_preserve_quality(augmented_images[i])
        return augmented_images
    def augment_image_file(self, image_path: Path, output_dir: Path, num_target_images: int = None) -> List[Path]:
        """
        Augment a single image file and save results with quality preservation
        Args:
            image_path: Path to input image
            output_dir: Output directory for augmented images
            num_target_images: Number of target augmented images to generate
        Returns:
            List of paths to saved augmented images
        """
        # Load image without resizing to preserve original quality
        image = load_image(image_path, None)
        if image is None:
            return []
        # Apply augmentations
        augmented_images = self.augment_single_image(image, num_target_images)
        # Save augmented images
        saved_paths = []
        for i, aug_image in enumerate(augmented_images):
            base_name = image_path.stem
            output_filename = f"{base_name}_aug_{i+1:03d}.jpg"
            output_path = output_dir / output_filename
            if save_image(aug_image, output_path):
                saved_paths.append(output_path)
        return saved_paths
def augment_image_file_with_raw(self, image_path: Path, output_dir: Path,
num_total_versions: int = None) -> List[Path]:
"""
Augment a single image file including raw/original version
Args:
image_path: Path to input image
output_dir: Output directory for all image versions
num_total_versions: Total number of versions (including raw)
Returns:
List of paths to saved images (raw + augmented)
"""
# Load original image
image = load_image(image_path, None)
if image is None:
return []
saved_paths = []
base_name = image_path.stem
# Always save raw version first (resized but not augmented)
if num_total_versions > 0:
raw_image = image.copy()
# Apply final processing (grayscale) but no augmentation
final_config = self.config.get("final_processing", {})
raw_image = self._apply_final_processing(raw_image, final_config)
# Resize to target size
target_size = self.image_processor.target_size
if target_size:
raw_image = self.resize_preserve_aspect(raw_image, target_size)
# Save raw version
raw_filename = f"{base_name}_raw_001.jpg"
raw_path = output_dir / raw_filename
if save_image(raw_image, raw_path):
saved_paths.append(raw_path)
# Generate augmented versions for remaining slots
num_augmented = max(0, num_total_versions - 1)
if num_augmented > 0:
augmented_images = self.augment_single_image(image, num_augmented)
for i, aug_image in enumerate(augmented_images):
aug_filename = f"{base_name}_aug_{i+1:03d}.jpg"
aug_path = output_dir / aug_filename
if save_image(aug_image, aug_path):
saved_paths.append(aug_path)
        return saved_paths
    def batch_augment(self, input_dir: Path, output_dir: Path,
                      multiplication_factor: float = None, random_seed: int = None) -> Dict[str, List[Path]]:
        """
        Augment images in a directory with smart sampling and multiplication strategy
        Args:
            input_dir: Input directory containing images
            output_dir: Output directory for augmented images
            multiplication_factor:
                - If < 1.0: Sample percentage of input data to augment
                - If >= 1.0: Target multiplication factor for output data size
            random_seed: Random seed for reproducibility
        Returns:
            Dictionary containing results and statistics
        """
        from utils import get_image_files
        # Set random seed for reproducibility
if random_seed is not None:
random.seed(random_seed)
np.random.seed(random_seed)
# Get all input images
all_image_files = get_image_files(input_dir)
if not all_image_files:
print("No images found in input directory")
return {}
# Get multiplication factor from config if not provided
if multiplication_factor is None:
data_strategy = self.config.get("data_strategy", {})
multiplication_factor = data_strategy.get("multiplication_factor", 3.0)
print(f"Found {len(all_image_files)} total images")
print(f"Multiplication factor: {multiplication_factor}")
# Determine sampling strategy
if multiplication_factor < 1.0:
# Sampling mode: Take a percentage of input data
num_selected = int(len(all_image_files) * multiplication_factor)
selected_images = self._sample_images(all_image_files, num_selected)
target_total_images = len(all_image_files) # Keep original dataset size
images_per_input = max(1, target_total_images // len(selected_images))
print(f"SAMPLING MODE: Selected {len(selected_images)} images ({multiplication_factor*100:.1f}%)")
print(f"Target: {target_total_images} total images, {images_per_input} per selected image")
else:
# Multiplication mode: Multiply dataset size
selected_images = all_image_files
target_total_images = int(len(all_image_files) * multiplication_factor)
images_per_input = max(1, target_total_images // len(selected_images))
print(f"MULTIPLICATION MODE: Processing all {len(selected_images)} images")
print(f"Target: {target_total_images} total images ({multiplication_factor}x original), {images_per_input} per image")
# Process selected images
        results = {}
        total_generated = 0
        for i, image_path in enumerate(selected_images):
            print_progress(i + 1, len(selected_images), f"Processing {image_path.name}")
            # Calculate the number of versions for this image (including the raw one)
            remaining_images = target_total_images - total_generated
            total_versions_needed = min(images_per_input, remaining_images)
            # Always include the raw image, then the augmented ones
            augmented_paths = self.augment_image_file_with_raw(
                image_path, output_dir, total_versions_needed
            )
            if augmented_paths:
                results[str(image_path)] = augmented_paths
                total_generated += len(augmented_paths)
        # Generate summary
        summary = {
"input_images": len(all_image_files),
"selected_images": len(selected_images),
"target_total": target_total_images,
"actual_generated": total_generated,
"multiplication_factor": multiplication_factor,
"mode": "sampling" if multiplication_factor < 1.0 else "multiplication",
"results": results,
"efficiency": total_generated / target_total_images if target_total_images > 0 else 0
}
print(f"\n✅ Augmentation completed!")
print(f"Generated {total_generated} images from {len(selected_images)} selected images")
        print(f"Target vs Actual: {target_total_images} -> {total_generated} ({summary['efficiency']:.1%} efficiency)")
return summary
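    # Hypothetical usage sketch (paths, factor, and seed are illustrative only):
    #   aug = DataAugmentation(config)
    #   # multiplication mode: target ~3x the input count, raw images included
    #   summary = aug.batch_augment(Path("data/in"), Path("data/out"),
    #                               multiplication_factor=3.0, random_seed=42)
    #   # sampling mode: augment ~25% of inputs back up to the original count
    #   summary = aug.batch_augment(Path("data/in"), Path("data/out"),
    #                               multiplication_factor=0.25)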
def _sample_images(self, image_files: List[Path], num_selected: int) -> List[Path]:
"""Sample images from the input list based on strategy"""
data_strategy = self.config.get("data_strategy", {})
sampling_config = data_strategy.get("sampling", {})
method = sampling_config.get("method", "random")
preserve_distribution = sampling_config.get("preserve_distribution", True)
if method == "random":
# Simple random sampling
return random.sample(image_files, min(num_selected, len(image_files)))
elif method == "stratified" and preserve_distribution:
# Stratified sampling by file extension
extension_groups = {}
for img_file in image_files:
ext = img_file.suffix.lower()
if ext not in extension_groups:
extension_groups[ext] = []
extension_groups[ext].append(img_file)
selected = []
for ext, files in extension_groups.items():
# Sample proportionally from each extension group
group_size = max(1, int(num_selected * len(files) / len(image_files)))
group_selected = random.sample(files, min(group_size, len(files)))
selected.extend(group_selected)
# If we have too few, add more randomly
if len(selected) < num_selected:
remaining = [f for f in image_files if f not in selected]
additional = random.sample(remaining,
min(num_selected - len(selected), len(remaining)))
selected.extend(additional)
return selected[:num_selected]
elif method == "uniform":
# Uniform sampling - evenly spaced
if num_selected >= len(image_files):
return image_files
step = len(image_files) / num_selected
indices = [int(i * step) for i in range(num_selected)]
return [image_files[i] for i in indices]
else:
# Fallback to random
return random.sample(image_files, min(num_selected, len(image_files)))
    def get_augmentation_summary(self, results: Dict[str, List[Path]]) -> Dict[str, Any]:
        """