init
This commit is contained in:
17
.gitignore
vendored
Normal file
17
.gitignore
vendored
Normal file
@@ -0,0 +1,17 @@
|
||||
*.png
|
||||
*.json
|
||||
*.jpg
|
||||
*.zip
|
||||
*.rar
|
||||
*.pdf
|
||||
*.docx
|
||||
*.doc
|
||||
*.xls
|
||||
*.xlsx
|
||||
*.ppt
|
||||
*.pptx
|
||||
*.txt
|
||||
*.csv
|
||||
*.json
|
||||
*.pt
|
||||
*.ipynb
|
183
README_ID_Card_Processing.md
Normal file
183
README_ID_Card_Processing.md
Normal file
@@ -0,0 +1,183 @@
|
||||
# ID Card Processing with YOLO Detection
|
||||
|
||||
Hệ thống xử lý ID cards sử dụng YOLO để detect và crop, kết hợp với các phương pháp tiền xử lý để clean background và enhance chất lượng ảnh.
|
||||
|
||||
## Tính năng chính
|
||||
|
||||
- **YOLO Detection**: Detect và crop ID cards từ ảnh gốc
|
||||
- **Background Removal**: 3 phương pháp loại bỏ background (GrabCut, Threshold, Contour)
|
||||
- **Image Enhancement**: Cải thiện chất lượng ảnh cho OCR
|
||||
- **Batch Processing**: Xử lý hàng loạt ảnh
|
||||
- **Flexible Pipeline**: Có thể chạy từng bước riêng biệt
|
||||
|
||||
## Cài đặt
|
||||
|
||||
1. Cài đặt dependencies:
|
||||
```bash
|
||||
pip install -r requirements.txt
|
||||
```
|
||||
|
||||
2. Cấu trúc thư mục:
|
||||
```
|
||||
OCR/
|
||||
├── src/
|
||||
│ ├── model/
|
||||
│ │ ├── __init__.py
|
||||
│ │ ├── yolo_detector.py
|
||||
│ │ └── id_card_processor.py
|
||||
│ └── ...
|
||||
├── data/
|
||||
│ ├── IDcards/ # Thư mục chứa ảnh ID cards gốc
|
||||
│ └── processed_id_cards/ # Thư mục output
|
||||
├── id_card_processor_main.py
|
||||
└── requirements.txt
|
||||
```
|
||||
|
||||
## Sử dụng
|
||||
|
||||
### 1. Full Pipeline (Detect + Preprocess)
|
||||
|
||||
```bash
|
||||
python id_card_processor_main.py \
|
||||
--input-dir "data/IDcards" \
|
||||
--output-dir "data/processed_id_cards" \
|
||||
--confidence 0.5 \
|
||||
--bg-removal grabcut \
|
||||
--target-size 800x600 \
|
||||
--save-annotated
|
||||
```
|
||||
|
||||
### 2. Chỉ Detect và Crop
|
||||
|
||||
```bash
|
||||
python id_card_processor_main.py \
|
||||
--input-dir "data/IDcards" \
|
||||
--output-dir "data/processed_id_cards" \
|
||||
--detect-only \
|
||||
--save-annotated
|
||||
```
|
||||
|
||||
### 3. Chỉ Preprocess (bỏ qua detection)
|
||||
|
||||
```bash
|
||||
python id_card_processor_main.py \
|
||||
--input-dir "data/IDcards" \
|
||||
--output-dir "data/processed_id_cards" \
|
||||
--preprocess-only \
|
||||
--bg-removal threshold \
|
||||
--target-size 800x600
|
||||
```
|
||||
|
||||
## Các tham số
|
||||
|
||||
### Detection Parameters
|
||||
- `--model-path`: Đường dẫn đến custom YOLO model (.pt file)
|
||||
- `--confidence`: Ngưỡng confidence cho detection (default: 0.5)
|
||||
|
||||
### Preprocessing Parameters
|
||||
- `--bg-removal`: Phương pháp loại bỏ background
|
||||
- `grabcut`: Sử dụng GrabCut algorithm (recommended)
|
||||
- `threshold`: Sử dụng thresholding
|
||||
- `contour`: Sử dụng contour detection
|
||||
- `none`: Không loại bỏ background
|
||||
- `--target-size`: Kích thước chuẩn hóa (width x height)
|
||||
|
||||
### Output Options
|
||||
- `--save-annotated`: Lưu ảnh với bounding boxes
|
||||
- `--detect-only`: Chỉ chạy detection
|
||||
- `--preprocess-only`: Chỉ chạy preprocessing
|
||||
|
||||
## Output Structure
|
||||
|
||||
```
|
||||
data/processed_id_cards/
|
||||
├── cropped/ # Ảnh đã được crop từ YOLO
|
||||
│ ├── image1_card_1.jpg
|
||||
│ ├── image1_card_2.jpg
|
||||
│ └── ...
|
||||
├── processed/ # Ảnh đã được preprocess
|
||||
│ ├── image1_card_1_processed.jpg
|
||||
│ ├── image1_card_2_processed.jpg
|
||||
│ └── ...
|
||||
└── annotated/ # Ảnh với bounding boxes (nếu có)
|
||||
├── image1_annotated.jpg
|
||||
└── ...
|
||||
```
|
||||
|
||||
## Ví dụ sử dụng
|
||||
|
||||
### Ví dụ 1: Xử lý toàn bộ dataset
|
||||
```bash
|
||||
# Xử lý tất cả ảnh trong thư mục IDcards
|
||||
python id_card_processor_main.py \
|
||||
--input-dir "data/IDcards" \
|
||||
--output-dir "data/processed_id_cards" \
|
||||
--confidence 0.6 \
|
||||
--bg-removal grabcut \
|
||||
--target-size 1024x768 \
|
||||
--save-annotated
|
||||
```
|
||||
|
||||
### Ví dụ 2: Test với một vài ảnh
|
||||
```bash
|
||||
# Tạo thư mục test với một vài ảnh
|
||||
mkdir -p data/test_images
|
||||
# Copy một vài ảnh vào test_images
|
||||
|
||||
# Chạy detection
|
||||
python id_card_processor_main.py \
|
||||
--input-dir "data/test_images" \
|
||||
--output-dir "data/test_output" \
|
||||
--detect-only \
|
||||
--save-annotated
|
||||
```
|
||||
|
||||
### Ví dụ 3: Sử dụng custom model
|
||||
```bash
|
||||
# Nếu bạn có custom YOLO model đã train
|
||||
python id_card_processor_main.py \
|
||||
--input-dir "data/IDcards" \
|
||||
--output-dir "data/processed_id_cards" \
|
||||
--model-path "models/custom_id_card_model.pt" \
|
||||
--confidence 0.7
|
||||
```
|
||||
|
||||
## Lưu ý
|
||||
|
||||
1. **YOLO Model**: Mặc định sử dụng YOLOv8n pre-trained. Nếu có custom model tốt hơn, hãy sử dụng `--model-path`
|
||||
|
||||
2. **Background Removal**:
|
||||
- `grabcut`: Tốt nhất cho ID cards có background phức tạp
|
||||
- `threshold`: Nhanh, phù hợp với background đơn giản
|
||||
- `contour`: Phù hợp với ID cards có viền rõ ràng
|
||||
|
||||
3. **Performance**:
|
||||
- Sử dụng GPU nếu có thể để tăng tốc độ detection
|
||||
- Có thể điều chỉnh `--confidence` để cân bằng giữa precision và recall
|
||||
|
||||
4. **Memory**: Với dataset lớn, có thể cần tăng memory hoặc xử lý theo batch nhỏ hơn
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Lỗi thường gặp
|
||||
|
||||
1. **No detections found**:
|
||||
- Giảm `--confidence` xuống 0.3-0.4
|
||||
- Kiểm tra chất lượng ảnh input
|
||||
|
||||
2. **Memory error**:
|
||||
- Giảm batch size hoặc xử lý từng ảnh một
|
||||
- Sử dụng CPU thay vì GPU
|
||||
|
||||
3. **Poor background removal**:
|
||||
- Thử các phương pháp khác nhau: `grabcut`, `threshold`, `contour`
|
||||
- Điều chỉnh parameters trong code
|
||||
|
||||
### Debug mode
|
||||
|
||||
```bash
|
||||
python id_card_processor_main.py \
|
||||
--input-dir "data/IDcards" \
|
||||
--output-dir "data/processed_id_cards" \
|
||||
--log-level DEBUG
|
||||
```
|
48
config/config.yaml
Normal file
48
config/config.yaml
Normal file
@@ -0,0 +1,48 @@
|
||||
# Data Augmentation Configuration
|
||||
# Main configuration file for image data augmentation
|
||||
|
||||
# Paths configuration
|
||||
paths:
|
||||
input_dir: "data/Archive"
|
||||
output_dir: "out"
|
||||
log_file: "logs/data_augmentation.log"
|
||||
|
||||
# Data augmentation parameters - ONLY ROTATION
|
||||
augmentation:
|
||||
# Geometric transformations - ONLY ROTATION
|
||||
rotation:
|
||||
enabled: true
|
||||
angles: [30, 60, 120, 150, 180, 210, 240, 300, 330] # Specific rotation angles
|
||||
probability: 1.0 # Always apply rotation
|
||||
|
||||
# Processing configuration
|
||||
processing:
|
||||
target_size: [224, 224] # [width, height]
|
||||
batch_size: 32
|
||||
num_augmentations: 3 # number of augmented versions per image
|
||||
save_format: "jpg"
|
||||
quality: 95
|
||||
|
||||
# Supported image formats
|
||||
supported_formats:
|
||||
- ".jpg"
|
||||
- ".jpeg"
|
||||
- ".png"
|
||||
- ".bmp"
|
||||
- ".tiff"
|
||||
|
||||
# Logging configuration
|
||||
logging:
|
||||
level: "INFO" # DEBUG, INFO, WARNING, ERROR
|
||||
format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
|
||||
handlers:
|
||||
- type: "file"
|
||||
filename: "logs/data_augmentation.log"
|
||||
- type: "console"
|
||||
|
||||
# Performance settings
|
||||
performance:
|
||||
num_workers: 4
|
||||
prefetch_factor: 2
|
||||
pin_memory: true
|
||||
use_gpu: false
|
222
data_augmentation.log
Normal file
222
data_augmentation.log
Normal file
@@ -0,0 +1,222 @@
|
||||
2025-08-05 18:53:06,981 - src.model.yolo_detector - INFO - Using pre-trained YOLOv8n model
|
||||
2025-08-05 18:53:07,004 - src.model.yolo_detector - INFO - Using device: cuda
|
||||
2025-08-05 18:53:07,038 - src.model.yolo_detector - INFO - Using pre-trained YOLOv8n model
|
||||
2025-08-05 18:53:07,038 - src.model.yolo_detector - INFO - Using device: cuda
|
||||
2025-08-05 18:53:07,361 - src.model.yolo_detector - INFO - Using pre-trained YOLOv8n model
|
||||
2025-08-05 18:53:07,362 - src.model.yolo_detector - INFO - Using device: cuda
|
||||
2025-08-05 18:53:07,363 - src.model.id_card_processor - INFO - Detecting and cropping ID cards...
|
||||
2025-08-05 18:53:07,363 - src.model.yolo_detector - ERROR - No images found in data\IDcards
|
||||
2025-08-05 18:53:07,364 - src.model.id_card_processor - INFO - Processing cropped ID cards...
|
||||
2025-08-05 18:53:07,364 - src.model.id_card_processor - ERROR - No images found in data\test_output\cropped
|
||||
2025-08-05 19:04:14,903 - src.model.yolo_detector - INFO - Using pre-trained YOLOv8n model
|
||||
2025-08-05 19:04:14,995 - src.model.yolo_detector - INFO - Using device: cuda
|
||||
2025-08-05 19:04:14,996 - src.model.id_card_processor - INFO - Detecting and cropping ID cards...
|
||||
2025-08-05 19:04:14,997 - src.model.yolo_detector - INFO - Processing 29 images from data\IDcards and subdirectories
|
||||
2025-08-05 19:04:14,998 - src.model.yolo_detector - INFO - Processing 1/29: im10.png
|
||||
2025-08-05 19:04:19,785 - src.model.yolo_detector - INFO - Found 1 detections in im10.png
|
||||
2025-08-05 19:04:19,813 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im10_card_1.jpg
|
||||
2025-08-05 19:04:19,813 - src.model.yolo_detector - INFO - Processed im10.png: 1 cards cropped
|
||||
2025-08-05 19:04:19,814 - src.model.yolo_detector - INFO - Processing 2/29: im11.png
|
||||
2025-08-05 19:04:19,926 - src.model.yolo_detector - INFO - Found 2 detections in im11.png
|
||||
2025-08-05 19:04:19,937 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im11_card_1.jpg
|
||||
2025-08-05 19:04:19,946 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im11_card_2.jpg
|
||||
2025-08-05 19:04:19,946 - src.model.yolo_detector - INFO - Processed im11.png: 2 cards cropped
|
||||
2025-08-05 19:04:19,946 - src.model.yolo_detector - INFO - Processing 3/29: im12.png
|
||||
2025-08-05 19:04:20,056 - src.model.yolo_detector - INFO - Found 2 detections in im12.png
|
||||
2025-08-05 19:04:20,069 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im12_card_1.jpg
|
||||
2025-08-05 19:04:20,082 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im12_card_2.jpg
|
||||
2025-08-05 19:04:20,083 - src.model.yolo_detector - INFO - Processed im12.png: 2 cards cropped
|
||||
2025-08-05 19:04:20,083 - src.model.yolo_detector - INFO - Processing 4/29: im13.png
|
||||
2025-08-05 19:04:20,116 - src.model.yolo_detector - INFO - Found 0 detections in im13.png
|
||||
2025-08-05 19:04:20,117 - src.model.yolo_detector - WARNING - No ID cards detected in im13.png
|
||||
2025-08-05 19:04:20,117 - src.model.yolo_detector - INFO - Processing 5/29: im14.png
|
||||
2025-08-05 19:04:20,156 - src.model.yolo_detector - INFO - Found 1 detections in im14.png
|
||||
2025-08-05 19:04:20,172 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im14_card_1.jpg
|
||||
2025-08-05 19:04:20,173 - src.model.yolo_detector - INFO - Processed im14.png: 1 cards cropped
|
||||
2025-08-05 19:04:20,174 - src.model.yolo_detector - INFO - Processing 6/29: im15.png
|
||||
2025-08-05 19:04:20,208 - src.model.yolo_detector - INFO - Found 1 detections in im15.png
|
||||
2025-08-05 19:04:20,222 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im15_card_1.jpg
|
||||
2025-08-05 19:04:20,222 - src.model.yolo_detector - INFO - Processed im15.png: 1 cards cropped
|
||||
2025-08-05 19:04:20,223 - src.model.yolo_detector - INFO - Processing 7/29: im1_.png
|
||||
2025-08-05 19:04:20,466 - src.model.yolo_detector - INFO - Found 0 detections in im1_.png
|
||||
2025-08-05 19:04:20,466 - src.model.yolo_detector - WARNING - No ID cards detected in im1_.png
|
||||
2025-08-05 19:04:20,466 - src.model.yolo_detector - INFO - Processing 8/29: im2.png
|
||||
2025-08-05 19:04:20,534 - src.model.yolo_detector - INFO - Found 2 detections in im2.png
|
||||
2025-08-05 19:04:20,564 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im2_card_1.jpg
|
||||
2025-08-05 19:04:20,594 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im2_card_2.jpg
|
||||
2025-08-05 19:04:20,594 - src.model.yolo_detector - INFO - Processed im2.png: 2 cards cropped
|
||||
2025-08-05 19:04:20,595 - src.model.yolo_detector - INFO - Processing 9/29: im3.png
|
||||
2025-08-05 19:04:20,648 - src.model.yolo_detector - INFO - Found 1 detections in im3.png
|
||||
2025-08-05 19:04:20,671 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im3_card_1.jpg
|
||||
2025-08-05 19:04:20,671 - src.model.yolo_detector - INFO - Processed im3.png: 1 cards cropped
|
||||
2025-08-05 19:04:20,672 - src.model.yolo_detector - INFO - Processing 10/29: im4.png
|
||||
2025-08-05 19:04:20,724 - src.model.yolo_detector - INFO - Found 1 detections in im4.png
|
||||
2025-08-05 19:04:20,753 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im4_card_1.jpg
|
||||
2025-08-05 19:04:20,754 - src.model.yolo_detector - INFO - Processed im4.png: 1 cards cropped
|
||||
2025-08-05 19:04:20,754 - src.model.yolo_detector - INFO - Processing 11/29: im5.png
|
||||
2025-08-05 19:04:20,798 - src.model.yolo_detector - INFO - Found 2 detections in im5.png
|
||||
2025-08-05 19:04:20,816 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im5_card_1.jpg
|
||||
2025-08-05 19:04:20,835 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im5_card_2.jpg
|
||||
2025-08-05 19:04:20,836 - src.model.yolo_detector - INFO - Processed im5.png: 2 cards cropped
|
||||
2025-08-05 19:04:20,837 - src.model.yolo_detector - INFO - Processing 12/29: im6.png
|
||||
2025-08-05 19:04:20,994 - src.model.yolo_detector - INFO - Found 2 detections in im6.png
|
||||
2025-08-05 19:04:21,052 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im6_card_1.jpg
|
||||
2025-08-05 19:04:21,118 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im6_card_2.jpg
|
||||
2025-08-05 19:04:21,119 - src.model.yolo_detector - INFO - Processed im6.png: 2 cards cropped
|
||||
2025-08-05 19:04:21,120 - src.model.yolo_detector - INFO - Processing 13/29: im7.png
|
||||
2025-08-05 19:04:21,159 - src.model.yolo_detector - INFO - Found 3 detections in im7.png
|
||||
2025-08-05 19:04:21,168 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im7_card_1.jpg
|
||||
2025-08-05 19:04:21,176 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im7_card_2.jpg
|
||||
2025-08-05 19:04:21,184 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im7_card_3.jpg
|
||||
2025-08-05 19:04:21,184 - src.model.yolo_detector - INFO - Processed im7.png: 3 cards cropped
|
||||
2025-08-05 19:04:21,185 - src.model.yolo_detector - INFO - Processing 14/29: im8.png
|
||||
2025-08-05 19:04:21,353 - src.model.yolo_detector - INFO - Found 2 detections in im8.png
|
||||
2025-08-05 19:04:21,387 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im8_card_1.jpg
|
||||
2025-08-05 19:04:21,423 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im8_card_2.jpg
|
||||
2025-08-05 19:04:21,424 - src.model.yolo_detector - INFO - Processed im8.png: 2 cards cropped
|
||||
2025-08-05 19:04:21,425 - src.model.yolo_detector - INFO - Processing 15/29: im9.png
|
||||
2025-08-05 19:04:21,522 - src.model.yolo_detector - INFO - Found 1 detections in im9.png
|
||||
2025-08-05 19:04:21,532 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\Archive\im9_card_1.jpg
|
||||
2025-08-05 19:04:21,532 - src.model.yolo_detector - INFO - Processed im9.png: 1 cards cropped
|
||||
2025-08-05 19:04:21,532 - src.model.yolo_detector - INFO - Processing 16/29: im10.png
|
||||
2025-08-05 19:04:21,585 - src.model.yolo_detector - INFO - Found 3 detections in im10.png
|
||||
2025-08-05 19:04:21,601 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im10_card_1.jpg
|
||||
2025-08-05 19:04:21,618 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im10_card_2.jpg
|
||||
2025-08-05 19:04:21,636 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im10_card_3.jpg
|
||||
2025-08-05 19:04:21,636 - src.model.yolo_detector - INFO - Processed im10.png: 3 cards cropped
|
||||
2025-08-05 19:04:21,638 - src.model.yolo_detector - INFO - Processing 17/29: im11.png
|
||||
2025-08-05 19:04:21,679 - src.model.yolo_detector - INFO - Found 2 detections in im11.png
|
||||
2025-08-05 19:04:21,696 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im11_card_1.jpg
|
||||
2025-08-05 19:04:21,712 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im11_card_2.jpg
|
||||
2025-08-05 19:04:21,713 - src.model.yolo_detector - INFO - Processed im11.png: 2 cards cropped
|
||||
2025-08-05 19:04:21,713 - src.model.yolo_detector - INFO - Processing 18/29: im12.png
|
||||
2025-08-05 19:04:21,755 - src.model.yolo_detector - INFO - Found 0 detections in im12.png
|
||||
2025-08-05 19:04:21,756 - src.model.yolo_detector - WARNING - No ID cards detected in im12.png
|
||||
2025-08-05 19:04:21,756 - src.model.yolo_detector - INFO - Processing 19/29: im13.png
|
||||
2025-08-05 19:04:21,793 - src.model.yolo_detector - INFO - Found 1 detections in im13.png
|
||||
2025-08-05 19:04:21,806 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im13_card_1.jpg
|
||||
2025-08-05 19:04:21,806 - src.model.yolo_detector - INFO - Processed im13.png: 1 cards cropped
|
||||
2025-08-05 19:04:21,806 - src.model.yolo_detector - INFO - Processing 20/29: im14.png
|
||||
2025-08-05 19:04:21,846 - src.model.yolo_detector - INFO - Found 2 detections in im14.png
|
||||
2025-08-05 19:04:21,862 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im14_card_1.jpg
|
||||
2025-08-05 19:04:21,877 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im14_card_2.jpg
|
||||
2025-08-05 19:04:21,877 - src.model.yolo_detector - INFO - Processed im14.png: 2 cards cropped
|
||||
2025-08-05 19:04:21,878 - src.model.yolo_detector - INFO - Processing 21/29: im15.png
|
||||
2025-08-05 19:04:21,914 - src.model.yolo_detector - INFO - Found 0 detections in im15.png
|
||||
2025-08-05 19:04:21,914 - src.model.yolo_detector - WARNING - No ID cards detected in im15.png
|
||||
2025-08-05 19:04:21,914 - src.model.yolo_detector - INFO - Processing 22/29: im1_.png
|
||||
2025-08-05 19:04:21,959 - src.model.yolo_detector - INFO - Found 3 detections in im1_.png
|
||||
2025-08-05 19:04:21,971 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im1__card_1.jpg
|
||||
2025-08-05 19:04:21,983 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im1__card_2.jpg
|
||||
2025-08-05 19:04:21,996 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im1__card_3.jpg
|
||||
2025-08-05 19:04:21,997 - src.model.yolo_detector - INFO - Processed im1_.png: 3 cards cropped
|
||||
2025-08-05 19:04:21,997 - src.model.yolo_detector - INFO - Processing 23/29: im2.png
|
||||
2025-08-05 19:04:22,101 - src.model.yolo_detector - INFO - Found 1 detections in im2.png
|
||||
2025-08-05 19:04:22,174 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im2_card_1.jpg
|
||||
2025-08-05 19:04:22,174 - src.model.yolo_detector - INFO - Processed im2.png: 1 cards cropped
|
||||
2025-08-05 19:04:22,176 - src.model.yolo_detector - INFO - Processing 24/29: im3.png
|
||||
2025-08-05 19:04:22,220 - src.model.yolo_detector - INFO - Found 2 detections in im3.png
|
||||
2025-08-05 19:04:22,235 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im3_card_1.jpg
|
||||
2025-08-05 19:04:22,251 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im3_card_2.jpg
|
||||
2025-08-05 19:04:22,252 - src.model.yolo_detector - INFO - Processed im3.png: 2 cards cropped
|
||||
2025-08-05 19:04:22,252 - src.model.yolo_detector - INFO - Processing 25/29: im5.png
|
||||
2025-08-05 19:04:22,307 - src.model.yolo_detector - INFO - Found 1 detections in im5.png
|
||||
2025-08-05 19:04:22,316 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im5_card_1.jpg
|
||||
2025-08-05 19:04:22,316 - src.model.yolo_detector - INFO - Processed im5.png: 1 cards cropped
|
||||
2025-08-05 19:04:22,317 - src.model.yolo_detector - INFO - Processing 26/29: im6.png
|
||||
2025-08-05 19:04:22,375 - src.model.yolo_detector - INFO - Found 2 detections in im6.png
|
||||
2025-08-05 19:04:22,387 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im6_card_1.jpg
|
||||
2025-08-05 19:04:22,397 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im6_card_2.jpg
|
||||
2025-08-05 19:04:22,398 - src.model.yolo_detector - INFO - Processed im6.png: 2 cards cropped
|
||||
2025-08-05 19:04:22,399 - src.model.yolo_detector - INFO - Processing 27/29: im7.png
|
||||
2025-08-05 19:04:22,441 - src.model.yolo_detector - INFO - Found 1 detections in im7.png
|
||||
2025-08-05 19:04:22,458 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im7_card_1.jpg
|
||||
2025-08-05 19:04:22,459 - src.model.yolo_detector - INFO - Processed im7.png: 1 cards cropped
|
||||
2025-08-05 19:04:22,460 - src.model.yolo_detector - INFO - Processing 28/29: im8.png
|
||||
2025-08-05 19:04:22,492 - src.model.yolo_detector - INFO - Found 2 detections in im8.png
|
||||
2025-08-05 19:04:22,502 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im8_card_1.jpg
|
||||
2025-08-05 19:04:22,509 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im8_card_2.jpg
|
||||
2025-08-05 19:04:22,510 - src.model.yolo_detector - INFO - Processed im8.png: 2 cards cropped
|
||||
2025-08-05 19:04:22,510 - src.model.yolo_detector - INFO - Processing 29/29: im9.png
|
||||
2025-08-05 19:04:22,540 - src.model.yolo_detector - INFO - Found 1 detections in im9.png
|
||||
2025-08-05 19:04:22,546 - src.model.yolo_detector - INFO - Saved cropped image to data\processed_id_cards\cropped\titre-sejour-fr\im9_card_1.jpg
|
||||
2025-08-05 19:04:22,546 - src.model.yolo_detector - INFO - Processed im9.png: 1 cards cropped
|
||||
2025-08-05 19:04:22,546 - src.model.yolo_detector - INFO - Batch processing completed:
|
||||
2025-08-05 19:04:22,548 - src.model.yolo_detector - INFO - - Total images: 29
|
||||
2025-08-05 19:04:22,548 - src.model.yolo_detector - INFO - - Processed: 25
|
||||
2025-08-05 19:04:22,548 - src.model.yolo_detector - INFO - - Total detections: 42
|
||||
2025-08-05 19:04:22,549 - src.model.yolo_detector - INFO - - Total cropped: 42
|
||||
2025-08-05 19:04:22,549 - src.model.id_card_processor - INFO - Processing cropped ID cards...
|
||||
2025-08-05 19:04:22,552 - src.model.id_card_processor - INFO - Processing 42 images from data\processed_id_cards\cropped and subdirectories
|
||||
2025-08-05 19:04:22,552 - src.model.id_card_processor - INFO - Processing 1/42: im10_card_1.jpg
|
||||
2025-08-05 19:04:22,564 - src.model.id_card_processor - INFO - Removing background from im10_card_1.jpg
|
||||
2025-08-05 19:04:22,877 - src.model.id_card_processor - INFO - Enhancing im10_card_1.jpg
|
||||
2025-08-05 19:04:23,016 - src.model.id_card_processor - INFO - Normalizing im10_card_1.jpg
|
||||
2025-08-05 19:04:23,023 - src.model.id_card_processor - INFO - Processed im10_card_1.jpg
|
||||
2025-08-05 19:04:23,023 - src.model.id_card_processor - INFO - Processing 2/42: im11_card_1.jpg
|
||||
2025-08-05 19:04:23,034 - src.model.id_card_processor - INFO - Removing background from im11_card_1.jpg
|
||||
2025-08-05 19:04:23,264 - src.model.id_card_processor - INFO - Enhancing im11_card_1.jpg
|
||||
2025-08-05 19:04:23,265 - src.model.id_card_processor - INFO - Normalizing im11_card_1.jpg
|
||||
2025-08-05 19:04:23,270 - src.model.id_card_processor - INFO - Processed im11_card_1.jpg
|
||||
2025-08-05 19:04:23,271 - src.model.id_card_processor - INFO - Processing 3/42: im11_card_2.jpg
|
||||
2025-08-05 19:04:23,282 - src.model.id_card_processor - INFO - Removing background from im11_card_2.jpg
|
||||
2025-08-05 19:04:23,312 - src.model.id_card_processor - INFO - Enhancing im11_card_2.jpg
|
||||
2025-08-05 19:04:23,313 - src.model.id_card_processor - INFO - Normalizing im11_card_2.jpg
|
||||
2025-08-05 19:04:23,316 - src.model.id_card_processor - INFO - Processed im11_card_2.jpg
|
||||
2025-08-05 19:04:23,316 - src.model.id_card_processor - INFO - Processing 4/42: im12_card_1.jpg
|
||||
2025-08-05 19:04:23,328 - src.model.id_card_processor - INFO - Removing background from im12_card_1.jpg
|
||||
2025-08-05 19:04:23,670 - src.model.id_card_processor - INFO - Enhancing im12_card_1.jpg
|
||||
2025-08-05 19:04:23,671 - src.model.id_card_processor - INFO - Normalizing im12_card_1.jpg
|
||||
2025-08-05 19:04:23,675 - src.model.id_card_processor - INFO - Processed im12_card_1.jpg
|
||||
2025-08-05 19:04:23,676 - src.model.id_card_processor - INFO - Processing 5/42: im12_card_2.jpg
|
||||
2025-08-05 19:04:23,686 - src.model.id_card_processor - INFO - Removing background from im12_card_2.jpg
|
||||
2025-08-05 19:04:29,279 - src.model.id_card_processor - INFO - Enhancing im12_card_2.jpg
|
||||
2025-08-05 19:04:29,284 - src.model.id_card_processor - INFO - Normalizing im12_card_2.jpg
|
||||
2025-08-05 19:04:29,289 - src.model.id_card_processor - INFO - Processed im12_card_2.jpg
|
||||
2025-08-05 19:04:29,290 - src.model.id_card_processor - INFO - Processing 6/42: im14_card_1.jpg
|
||||
2025-08-05 19:04:29,301 - src.model.id_card_processor - INFO - Removing background from im14_card_1.jpg
|
||||
2025-08-05 19:04:29,774 - src.model.id_card_processor - INFO - Enhancing im14_card_1.jpg
|
||||
2025-08-05 19:04:29,775 - src.model.id_card_processor - INFO - Normalizing im14_card_1.jpg
|
||||
2025-08-05 19:04:29,779 - src.model.id_card_processor - INFO - Processed im14_card_1.jpg
|
||||
2025-08-05 19:04:29,780 - src.model.id_card_processor - INFO - Processing 7/42: im15_card_1.jpg
|
||||
2025-08-05 19:04:29,791 - src.model.id_card_processor - INFO - Removing background from im15_card_1.jpg
|
||||
2025-08-05 19:04:30,009 - src.model.id_card_processor - INFO - Enhancing im15_card_1.jpg
|
||||
2025-08-05 19:04:30,010 - src.model.id_card_processor - INFO - Normalizing im15_card_1.jpg
|
||||
2025-08-05 19:04:30,015 - src.model.id_card_processor - INFO - Processed im15_card_1.jpg
|
||||
2025-08-05 19:04:30,015 - src.model.id_card_processor - INFO - Processing 8/42: im2_card_1.jpg
|
||||
2025-08-05 19:04:30,017 - src.model.id_card_processor - INFO - Removing background from im2_card_1.jpg
|
||||
2025-08-05 19:04:31,861 - src.model.id_card_processor - INFO - Enhancing im2_card_1.jpg
|
||||
2025-08-05 19:04:31,863 - src.model.id_card_processor - INFO - Normalizing im2_card_1.jpg
|
||||
2025-08-05 19:04:31,869 - src.model.id_card_processor - INFO - Processed im2_card_1.jpg
|
||||
2025-08-05 19:04:31,869 - src.model.id_card_processor - INFO - Processing 9/42: im2_card_2.jpg
|
||||
2025-08-05 19:04:31,884 - src.model.id_card_processor - INFO - Removing background from im2_card_2.jpg
|
||||
2025-08-05 19:04:38,985 - src.model.id_card_processor - INFO - Enhancing im2_card_2.jpg
|
||||
2025-08-05 19:04:38,996 - src.model.id_card_processor - INFO - Normalizing im2_card_2.jpg
|
||||
2025-08-05 19:04:39,007 - src.model.id_card_processor - INFO - Processed im2_card_2.jpg
|
||||
2025-08-05 19:04:39,008 - src.model.id_card_processor - INFO - Processing 10/42: im3_card_1.jpg
|
||||
2025-08-05 19:04:39,009 - src.model.id_card_processor - INFO - Removing background from im3_card_1.jpg
|
||||
2025-08-05 19:04:39,177 - src.model.id_card_processor - INFO - Enhancing im3_card_1.jpg
|
||||
2025-08-05 19:04:39,178 - src.model.id_card_processor - INFO - Normalizing im3_card_1.jpg
|
||||
2025-08-05 19:04:39,182 - src.model.id_card_processor - INFO - Processed im3_card_1.jpg
|
||||
2025-08-05 19:04:39,182 - src.model.id_card_processor - INFO - Processing 11/42: im4_card_1.jpg
|
||||
2025-08-05 19:04:39,184 - src.model.id_card_processor - INFO - Removing background from im4_card_1.jpg
|
||||
2025-08-05 19:04:39,374 - src.model.id_card_processor - INFO - Enhancing im4_card_1.jpg
|
||||
2025-08-05 19:04:39,375 - src.model.id_card_processor - INFO - Normalizing im4_card_1.jpg
|
||||
2025-08-05 19:04:39,379 - src.model.id_card_processor - INFO - Processed im4_card_1.jpg
|
||||
2025-08-05 19:04:39,379 - src.model.id_card_processor - INFO - Processing 12/42: im5_card_1.jpg
|
||||
2025-08-05 19:04:39,389 - src.model.id_card_processor - INFO - Removing background from im5_card_1.jpg
|
||||
2025-08-05 19:04:39,842 - src.model.id_card_processor - INFO - Enhancing im5_card_1.jpg
|
||||
2025-08-05 19:04:39,843 - src.model.id_card_processor - INFO - Normalizing im5_card_1.jpg
|
||||
2025-08-05 19:04:39,846 - src.model.id_card_processor - INFO - Processed im5_card_1.jpg
|
||||
2025-08-05 19:04:39,846 - src.model.id_card_processor - INFO - Processing 13/42: im5_card_2.jpg
|
||||
2025-08-05 19:04:39,859 - src.model.id_card_processor - INFO - Removing background from im5_card_2.jpg
|
||||
2025-08-05 19:04:42,430 - src.model.id_card_processor - INFO - Enhancing im5_card_2.jpg
|
||||
2025-08-05 19:04:42,434 - src.model.id_card_processor - INFO - Normalizing im5_card_2.jpg
|
||||
2025-08-05 19:04:42,438 - src.model.id_card_processor - INFO - Processed im5_card_2.jpg
|
||||
2025-08-05 19:04:42,439 - src.model.id_card_processor - INFO - Processing 14/42: im6_card_1.jpg
|
||||
2025-08-05 19:04:42,449 - src.model.id_card_processor - INFO - Removing background from im6_card_1.jpg
|
||||
2025-08-05 19:04:47,647 - src.model.id_card_processor - INFO - Enhancing im6_card_1.jpg
|
||||
2025-08-05 19:04:47,652 - src.model.id_card_processor - INFO - Normalizing im6_card_1.jpg
|
||||
2025-08-05 19:04:47,657 - src.model.id_card_processor - INFO - Processed im6_card_1.jpg
|
||||
2025-08-05 19:04:47,657 - src.model.id_card_processor - INFO - Processing 15/42: im6_card_2.jpg
|
||||
2025-08-05 19:04:47,680 - src.model.id_card_processor - INFO - Removing background from im6_card_2.jpg
|
234
id_card_processor_main.py
Normal file
234
id_card_processor_main.py
Normal file
@@ -0,0 +1,234 @@
|
||||
"""
|
||||
Main script for ID Card Processing with YOLO Detection
|
||||
"""
|
||||
import argparse
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any
|
||||
import logging
|
||||
|
||||
# Add src to path for imports
|
||||
sys.path.append(str(Path(__file__).parent / "src"))
|
||||
|
||||
from src.model.yolo_detector import YOLODetector
|
||||
from src.model.id_card_processor import IDCardProcessor
|
||||
from src.utils import setup_logging
|
||||
|
||||
def parse_arguments():
|
||||
"""Parse command line arguments"""
|
||||
parser = argparse.ArgumentParser(description="ID Card Processing with YOLO Detection")
|
||||
|
||||
parser.add_argument(
|
||||
"--input-dir",
|
||||
type=str,
|
||||
required=True,
|
||||
help="Input directory containing ID card images"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--output-dir",
|
||||
type=str,
|
||||
default="data/processed_id_cards",
|
||||
help="Output directory for processed images"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--model-path",
|
||||
type=str,
|
||||
help="Path to custom YOLO model (.pt file)"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--confidence",
|
||||
type=float,
|
||||
default=0.5,
|
||||
help="Confidence threshold for YOLO detection"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--detect-only",
|
||||
action="store_true",
|
||||
help="Only detect and crop ID cards, skip preprocessing"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--preprocess-only",
|
||||
action="store_true",
|
||||
help="Skip detection, directly preprocess images"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--bg-removal",
|
||||
type=str,
|
||||
default="grabcut",
|
||||
choices=["grabcut", "threshold", "contour", "none"],
|
||||
help="Background removal method"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--target-size",
|
||||
type=str,
|
||||
default="800x600",
|
||||
help="Target size for normalization (width x height)"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--save-annotated",
|
||||
action="store_true",
|
||||
help="Save annotated images with bounding boxes"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--log-level",
|
||||
type=str,
|
||||
default="INFO",
|
||||
choices=["DEBUG", "INFO", "WARNING", "ERROR"],
|
||||
help="Logging level"
|
||||
)
|
||||
|
||||
return parser.parse_args()
|
||||
|
||||
def parse_size(size_str: str) -> tuple:
|
||||
"""Parse size string like '800x600' to tuple (800, 600)"""
|
||||
try:
|
||||
width, height = map(int, size_str.split('x'))
|
||||
return (width, height)
|
||||
except ValueError:
|
||||
print(f"Invalid size format: {size_str}. Expected format: widthxheight")
|
||||
sys.exit(1)
|
||||
|
||||
def main():
|
||||
"""Main function"""
|
||||
args = parse_arguments()
|
||||
|
||||
# Setup logging
|
||||
logging_config = {"level": args.log_level}
|
||||
logger = setup_logging(logging_config.get("level", "INFO"))
|
||||
logger.info("Starting ID Card Processing")
|
||||
|
||||
# Parse paths
|
||||
input_dir = Path(args.input_dir)
|
||||
output_dir = Path(args.output_dir)
|
||||
|
||||
# Check if input directory exists
|
||||
if not input_dir.exists():
|
||||
logger.error(f"Input directory does not exist: {input_dir}")
|
||||
sys.exit(1)
|
||||
|
||||
# Create output directory
|
||||
output_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Parse target size
|
||||
target_size = parse_size(args.target_size)
|
||||
|
||||
# Initialize YOLO detector
|
||||
logger.info("Initializing YOLO detector...")
|
||||
yolo_detector = YOLODetector(
|
||||
model_path=args.model_path,
|
||||
confidence=args.confidence
|
||||
)
|
||||
|
||||
# Initialize ID card processor
|
||||
logger.info("Initializing ID card processor...")
|
||||
id_processor = IDCardProcessor(yolo_detector)
|
||||
|
||||
if args.detect_only:
|
||||
# Only detect and crop ID cards
|
||||
logger.info("Running YOLO detection only...")
|
||||
results = yolo_detector.batch_process(
|
||||
input_dir,
|
||||
output_dir / "cropped",
|
||||
save_annotated=args.save_annotated
|
||||
)
|
||||
|
||||
print("\n" + "="*50)
|
||||
print("YOLO DETECTION RESULTS")
|
||||
print("="*50)
|
||||
print(f"Total images: {results['total_images']}")
|
||||
print(f"Processed images: {results['processed_images']}")
|
||||
print(f"Total detections: {results['total_detections']}")
|
||||
print(f"Total cropped: {results['total_cropped']}")
|
||||
print(f"Output directory: {output_dir / 'cropped'}")
|
||||
print("="*50)
|
||||
|
||||
elif args.preprocess_only:
|
||||
# Skip detection, directly preprocess
|
||||
logger.info("Running preprocessing only...")
|
||||
results = id_processor.batch_process_id_cards(
|
||||
input_dir,
|
||||
output_dir / "processed",
|
||||
detect_first=False,
|
||||
remove_bg=args.bg_removal != "none",
|
||||
enhance=True,
|
||||
normalize=True,
|
||||
target_size=target_size
|
||||
)
|
||||
|
||||
print("\n" + "="*50)
|
||||
print("PREPROCESSING RESULTS")
|
||||
print("="*50)
|
||||
print(f"Total images: {results['total_images']}")
|
||||
print(f"Processed images: {results['processed_images']}")
|
||||
print(f"Output directory: {output_dir / 'processed'}")
|
||||
print("="*50)
|
||||
|
||||
else:
|
||||
# Full pipeline: detect + preprocess
|
||||
logger.info("Running full pipeline: detection + preprocessing...")
|
||||
|
||||
# Step 1: Detect and crop ID cards
|
||||
logger.info("Step 1: Detecting and cropping ID cards...")
|
||||
detection_results = yolo_detector.batch_process(
|
||||
input_dir,
|
||||
output_dir / "cropped",
|
||||
save_annotated=args.save_annotated
|
||||
)
|
||||
|
||||
# Step 2: Preprocess cropped images
|
||||
cropped_dir = output_dir / "cropped"
|
||||
if cropped_dir.exists():
|
||||
logger.info("Step 2: Preprocessing cropped ID cards...")
|
||||
preprocessing_results = id_processor.batch_process_id_cards(
|
||||
cropped_dir,
|
||||
output_dir / "processed",
|
||||
detect_first=False,
|
||||
remove_bg=args.bg_removal != "none",
|
||||
enhance=True,
|
||||
normalize=True,
|
||||
target_size=target_size
|
||||
)
|
||||
else:
|
||||
logger.warning("No cropped images found, preprocessing original images")
|
||||
preprocessing_results = id_processor.batch_process_id_cards(
|
||||
input_dir,
|
||||
output_dir / "processed",
|
||||
detect_first=False,
|
||||
remove_bg=args.bg_removal != "none",
|
||||
enhance=True,
|
||||
normalize=True,
|
||||
target_size=target_size
|
||||
)
|
||||
|
||||
# Print summary
|
||||
print("\n" + "="*50)
|
||||
print("FULL PIPELINE RESULTS")
|
||||
print("="*50)
|
||||
print("DETECTION PHASE:")
|
||||
print(f" - Total images: {detection_results['total_images']}")
|
||||
print(f" - Processed images: {detection_results['processed_images']}")
|
||||
print(f" - Total detections: {detection_results['total_detections']}")
|
||||
print(f" - Total cropped: {detection_results['total_cropped']}")
|
||||
print("\nPREPROCESSING PHASE:")
|
||||
print(f" - Total images: {preprocessing_results['total_images']}")
|
||||
print(f" - Processed images: {preprocessing_results['processed_images']}")
|
||||
print(f"\nOutput directories:")
|
||||
print(f" - Cropped images: {output_dir / 'cropped'}")
|
||||
print(f" - Processed images: {output_dir / 'processed'}")
|
||||
if args.save_annotated:
|
||||
print(f" - Annotated images: {output_dir / 'cropped'}")
|
||||
print("="*50)
|
||||
|
||||
logger.info("ID Card Processing completed successfully")
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
276
main.py
Normal file
276
main.py
Normal file
@@ -0,0 +1,276 @@
|
||||
"""
|
||||
Main script for data augmentation
|
||||
"""
|
||||
import argparse
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any
|
||||
|
||||
# Add src to path for imports
|
||||
sys.path.append(str(Path(__file__).parent / "src"))
|
||||
|
||||
from src.config_manager import ConfigManager
|
||||
from src.data_augmentation import DataAugmentation
|
||||
from src.image_processor import ImageProcessor
|
||||
from src.utils import setup_logging, get_image_files, print_progress
|
||||
|
||||
def parse_arguments():
|
||||
"""Parse command line arguments"""
|
||||
parser = argparse.ArgumentParser(description="Image Data Augmentation Tool")
|
||||
|
||||
parser.add_argument(
|
||||
"--config",
|
||||
type=str,
|
||||
default="config/config.yaml",
|
||||
help="Path to configuration file"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--preset",
|
||||
type=str,
|
||||
help="Apply augmentation preset (light, medium, heavy, ocr_optimized, document)"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--input-dir",
|
||||
type=str,
|
||||
help="Input directory containing images (overrides config)"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--output-dir",
|
||||
type=str,
|
||||
help="Output directory for augmented images (overrides config)"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--num-augmentations",
|
||||
type=int,
|
||||
help="Number of augmented versions per image (overrides config)"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--target-size",
|
||||
type=str,
|
||||
help="Target size for images (width x height) (overrides config)"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--preview",
|
||||
action="store_true",
|
||||
help="Preview augmentation on first image only"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--info",
|
||||
action="store_true",
|
||||
help="Show information about images in input directory"
|
||||
)
|
||||
|
||||
|
||||
|
||||
parser.add_argument(
|
||||
"--list-presets",
|
||||
action="store_true",
|
||||
help="List available presets and exit"
|
||||
)
|
||||
|
||||
parser.add_argument(
|
||||
"--log-level",
|
||||
type=str,
|
||||
default="INFO",
|
||||
choices=["DEBUG", "INFO", "WARNING", "ERROR"],
|
||||
help="Logging level"
|
||||
)
|
||||
|
||||
return parser.parse_args()
|
||||
|
||||
def parse_range(range_str: str) -> tuple:
|
||||
"""Parse range string like '0.8-1.2' to tuple (0.8, 1.2)"""
|
||||
try:
|
||||
min_val, max_val = map(float, range_str.split('-'))
|
||||
return (min_val, max_val)
|
||||
except ValueError:
|
||||
print(f"Invalid range format: {range_str}. Expected format: min-max")
|
||||
sys.exit(1)
|
||||
|
||||
def parse_size(size_str: str) -> tuple:
|
||||
"""Parse size string like '224x224' to tuple (224, 224)"""
|
||||
try:
|
||||
width, height = map(int, size_str.split('x'))
|
||||
return (width, height)
|
||||
except ValueError:
|
||||
print(f"Invalid size format: {size_str}. Expected format: widthxheight")
|
||||
sys.exit(1)
|
||||
|
||||
def show_image_info(input_dir: Path):
|
||||
"""Show information about images in input directory"""
|
||||
image_files = get_image_files(input_dir)
|
||||
|
||||
if not image_files:
|
||||
print(f"No images found in {input_dir}")
|
||||
return
|
||||
|
||||
print(f"\nFound {len(image_files)} images in {input_dir}")
|
||||
print("\nImage Information:")
|
||||
print("-" * 80)
|
||||
|
||||
processor = ImageProcessor()
|
||||
total_size = 0
|
||||
|
||||
for i, image_path in enumerate(image_files[:10]): # Show first 10 images
|
||||
info = processor.get_image_info(image_path)
|
||||
if info:
|
||||
print(f"{i+1:2d}. {image_path.name}")
|
||||
print(f" Size: {info['width']}x{info['height']} pixels")
|
||||
print(f" Channels: {info['channels']}")
|
||||
print(f" File size: {info['file_size_mb']} MB")
|
||||
print(f" Format: {info['format']}")
|
||||
total_size += info['file_size_mb']
|
||||
|
||||
if len(image_files) > 10:
|
||||
print(f"\n... and {len(image_files) - 10} more images")
|
||||
|
||||
print(f"\nTotal file size: {total_size:.2f} MB")
|
||||
print(f"Average file size: {total_size/len(image_files):.2f} MB")
|
||||
|
||||
def preview_augmentation(input_dir: Path, output_dir: Path, config: Dict[str, Any]):
|
||||
"""Preview augmentation on first image"""
|
||||
image_files = get_image_files(input_dir)
|
||||
|
||||
if not image_files:
|
||||
print(f"No images found in {input_dir}")
|
||||
return
|
||||
|
||||
print(f"\nPreviewing augmentation on: {image_files[0].name}")
|
||||
|
||||
# Create augmentation instance
|
||||
augmenter = DataAugmentation(config)
|
||||
|
||||
# Augment first image
|
||||
augmented_paths = augmenter.augment_image_file(
|
||||
image_files[0],
|
||||
output_dir,
|
||||
num_augmentations=3
|
||||
)
|
||||
|
||||
if augmented_paths:
|
||||
print(f"Created {len(augmented_paths)} augmented versions:")
|
||||
for i, path in enumerate(augmented_paths, 1):
|
||||
print(f" {i}. {path.name}")
|
||||
else:
|
||||
print("Failed to create augmented images")
|
||||
|
||||
def main():
|
||||
"""Main function"""
|
||||
args = parse_arguments()
|
||||
|
||||
# Initialize config manager
|
||||
config_manager = ConfigManager(args.config)
|
||||
|
||||
# List presets if requested
|
||||
if args.list_presets:
|
||||
presets = config_manager.list_presets()
|
||||
print("\nAvailable presets:")
|
||||
for preset in presets:
|
||||
print(f" - {preset}")
|
||||
return
|
||||
|
||||
# Apply preset if specified
|
||||
if args.preset:
|
||||
if not config_manager.apply_preset(args.preset):
|
||||
sys.exit(1)
|
||||
|
||||
|
||||
|
||||
# Override config with command line arguments
|
||||
if args.input_dir:
|
||||
config_manager.update_config({"paths": {"input_dir": args.input_dir}})
|
||||
|
||||
if args.output_dir:
|
||||
config_manager.update_config({"paths": {"output_dir": args.output_dir}})
|
||||
|
||||
if args.num_augmentations:
|
||||
config_manager.update_config({"processing": {"num_augmentations": args.num_augmentations}})
|
||||
|
||||
if args.target_size:
|
||||
target_size = parse_size(args.target_size)
|
||||
config_manager.update_config({"processing": {"target_size": list(target_size)}})
|
||||
|
||||
# Get configuration
|
||||
config = config_manager.get_config()
|
||||
paths_config = config_manager.get_paths_config()
|
||||
processing_config = config_manager.get_processing_config()
|
||||
augmentation_config = config_manager.get_augmentation_config()
|
||||
logging_config = config_manager.get_logging_config()
|
||||
|
||||
# Setup logging
|
||||
logger = setup_logging(logging_config.get("level", "INFO"))
|
||||
logger.info("Starting data augmentation process")
|
||||
|
||||
# Parse paths
|
||||
input_dir = Path(paths_config.get("input_dir", "data/dataset/training_data/images"))
|
||||
output_dir = Path(paths_config.get("output_dir", "data/augmented_data"))
|
||||
|
||||
# Check if input directory exists
|
||||
if not input_dir.exists():
|
||||
logger.error(f"Input directory does not exist: {input_dir}")
|
||||
sys.exit(1)
|
||||
|
||||
# Create output directory
|
||||
output_dir.mkdir(parents=True, exist_ok=True)
|
||||
|
||||
# Show image information if requested
|
||||
if args.info:
|
||||
show_image_info(input_dir)
|
||||
return
|
||||
|
||||
# Preview augmentation if requested
|
||||
if args.preview:
|
||||
preview_augmentation(input_dir, output_dir, augmentation_config)
|
||||
return
|
||||
|
||||
# Get image files
|
||||
image_files = get_image_files(input_dir)
|
||||
|
||||
if not image_files:
|
||||
logger.error(f"No images found in {input_dir}")
|
||||
sys.exit(1)
|
||||
|
||||
logger.info(f"Found {len(image_files)} images to process")
|
||||
logger.info(f"Output directory: {output_dir}")
|
||||
logger.info(f"Number of augmentations per image: {processing_config.get('num_augmentations', 3)}")
|
||||
logger.info(f"Target size: {processing_config.get('target_size', [224, 224])}")
|
||||
|
||||
# Create augmentation instance with new config
|
||||
augmenter = DataAugmentation(augmentation_config)
|
||||
|
||||
# Update target size
|
||||
target_size = tuple(processing_config.get("target_size", [224, 224]))
|
||||
augmenter.image_processor.target_size = target_size
|
||||
|
||||
# Perform batch augmentation
|
||||
logger.info("Starting batch augmentation...")
|
||||
results = augmenter.batch_augment(
|
||||
input_dir,
|
||||
output_dir,
|
||||
num_augmentations=processing_config.get("num_augmentations", 3)
|
||||
)
|
||||
|
||||
# Get and display summary
|
||||
summary = augmenter.get_augmentation_summary(results)
|
||||
|
||||
print("\n" + "="*50)
|
||||
print("AUGMENTATION SUMMARY")
|
||||
print("="*50)
|
||||
print(f"Original images: {summary['total_original_images']}")
|
||||
print(f"Augmented images: {summary['total_augmented_images']}")
|
||||
print(f"Augmentation ratio: {summary['augmentation_ratio']:.2f}")
|
||||
print(f"Successful augmentations: {summary['successful_augmentations']}")
|
||||
print(f"Output directory: {output_dir}")
|
||||
print("="*50)
|
||||
|
||||
logger.info("Data augmentation completed successfully")
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
23
src/__init__.py
Normal file
23
src/__init__.py
Normal file
@@ -0,0 +1,23 @@
|
||||
"""
|
||||
Data Augmentation Package
|
||||
"""
|
||||
|
||||
__version__ = "1.0.0"
|
||||
__author__ = "OCR Data Augmentation Tool"
|
||||
|
||||
from .utils import *
|
||||
from .image_processor import ImageProcessor
|
||||
from .data_augmentation import DataAugmentation
|
||||
from .config_manager import ConfigManager
|
||||
|
||||
__all__ = [
|
||||
"ImageProcessor",
|
||||
"DataAugmentation",
|
||||
"ConfigManager",
|
||||
"setup_logging",
|
||||
"get_image_files",
|
||||
"load_image",
|
||||
"save_image",
|
||||
"validate_image",
|
||||
"print_progress",
|
||||
]
|
BIN
src/__pycache__/__init__.cpython-313.pyc
Normal file
BIN
src/__pycache__/__init__.cpython-313.pyc
Normal file
Binary file not shown.
BIN
src/__pycache__/__init__.cpython-39.pyc
Normal file
BIN
src/__pycache__/__init__.cpython-39.pyc
Normal file
Binary file not shown.
BIN
src/__pycache__/config_manager.cpython-39.pyc
Normal file
BIN
src/__pycache__/config_manager.cpython-39.pyc
Normal file
Binary file not shown.
BIN
src/__pycache__/data_augmentation.cpython-39.pyc
Normal file
BIN
src/__pycache__/data_augmentation.cpython-39.pyc
Normal file
Binary file not shown.
BIN
src/__pycache__/image_processor.cpython-39.pyc
Normal file
BIN
src/__pycache__/image_processor.cpython-39.pyc
Normal file
Binary file not shown.
BIN
src/__pycache__/utils.cpython-313.pyc
Normal file
BIN
src/__pycache__/utils.cpython-313.pyc
Normal file
Binary file not shown.
BIN
src/__pycache__/utils.cpython-39.pyc
Normal file
BIN
src/__pycache__/utils.cpython-39.pyc
Normal file
Binary file not shown.
40
src/config.py
Normal file
40
src/config.py
Normal file
@@ -0,0 +1,40 @@
|
||||
"""
|
||||
Configuration file for data augmentation
|
||||
"""
|
||||
import os
|
||||
from pathlib import Path
|
||||
|
||||
# Paths
|
||||
BASE_DIR = Path(__file__).parent.parent
|
||||
DATA_DIR = BASE_DIR / "data"
|
||||
INPUT_IMAGES_DIR = DATA_DIR / "dataset" / "training_data" / "images"
|
||||
OUTPUT_DIR = DATA_DIR / "augmented_data"
|
||||
|
||||
# Data augmentation parameters
|
||||
AUGMENTATION_CONFIG = {
|
||||
"rotation_range": 15, # degrees
|
||||
"width_shift_range": 0.1, # fraction of total width
|
||||
"height_shift_range": 0.1, # fraction of total height
|
||||
"brightness_range": [0.8, 1.2], # brightness factor
|
||||
"zoom_range": [0.9, 1.1], # zoom factor
|
||||
"horizontal_flip": True,
|
||||
"vertical_flip": False,
|
||||
"fill_mode": "nearest",
|
||||
"cval": 0,
|
||||
"rescale": 1./255,
|
||||
}
|
||||
|
||||
# Processing parameters
|
||||
PROCESSING_CONFIG = {
|
||||
"target_size": (224, 224), # (width, height)
|
||||
"batch_size": 32,
|
||||
"num_augmentations": 3, # number of augmented versions per image
|
||||
"save_format": "jpg",
|
||||
"quality": 95,
|
||||
}
|
||||
|
||||
# Supported image formats
|
||||
SUPPORTED_FORMATS = ['.jpg', '.jpeg', '.png', '.bmp', '.tiff']
|
||||
|
||||
# Create output directory if it doesn't exist
|
||||
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
|
175
src/config_manager.py
Normal file
175
src/config_manager.py
Normal file
@@ -0,0 +1,175 @@
|
||||
"""
|
||||
Configuration manager for data augmentation
|
||||
"""
|
||||
import yaml
|
||||
import os
|
||||
from pathlib import Path
|
||||
from typing import Dict, Any, Optional, Union
|
||||
|
||||
class ConfigManager:
|
||||
"""Manages configuration loading and validation"""
|
||||
|
||||
def __init__(self, config_path: Optional[Union[str, Path]] = None):
|
||||
"""
|
||||
Initialize ConfigManager
|
||||
|
||||
Args:
|
||||
config_path: Path to main config file
|
||||
"""
|
||||
self.config_path = Path(config_path) if config_path else Path("config/config.yaml")
|
||||
self.config = {}
|
||||
|
||||
self._load_config()
|
||||
|
||||
def _load_config(self):
|
||||
"""Load main configuration file"""
|
||||
try:
|
||||
if self.config_path.exists():
|
||||
with open(self.config_path, 'r', encoding='utf-8') as f:
|
||||
self.config = yaml.safe_load(f)
|
||||
print(f"✅ Loaded configuration from {self.config_path}")
|
||||
else:
|
||||
print(f"⚠️ Config file not found: {self.config_path}")
|
||||
self.config = self._get_default_config()
|
||||
except Exception as e:
|
||||
print(f"❌ Error loading config: {e}")
|
||||
self.config = self._get_default_config()
|
||||
|
||||
def _get_default_config(self) -> Dict[str, Any]:
|
||||
"""Get default configuration"""
|
||||
return {
|
||||
"paths": {
|
||||
"input_dir": "data/dataset/training_data/images",
|
||||
"output_dir": "data/augmented_data",
|
||||
"log_file": "logs/data_augmentation.log"
|
||||
},
|
||||
"augmentation": {
|
||||
"rotation": {"enabled": True, "angles": [30, 60, 120, 150, 180, 210, 240, 300, 330], "probability": 1.0}
|
||||
},
|
||||
"processing": {
|
||||
"target_size": [224, 224],
|
||||
"batch_size": 32,
|
                "num_augmentations": 3,
                "save_format": "jpg",
                "quality": 95
            },
            "supported_formats": [".jpg", ".jpeg", ".png", ".bmp", ".tiff"],
            "logging": {
                "level": "INFO",
                "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
            },
            "performance": {
                "num_workers": 4,
                "prefetch_factor": 2,
                "pin_memory": True,
                "use_gpu": False
            }
        }

    def get_config(self) -> Dict[str, Any]:
        """Get current configuration"""
        return self.config

    def get_augmentation_config(self) -> Dict[str, Any]:
        """Get augmentation configuration"""
        return self.config.get("augmentation", {})

    def get_processing_config(self) -> Dict[str, Any]:
        """Get processing configuration"""
        return self.config.get("processing", {})

    def get_paths_config(self) -> Dict[str, Any]:
        """Get paths configuration"""
        return self.config.get("paths", {})

    def get_logging_config(self) -> Dict[str, Any]:
        """Get logging configuration"""
        return self.config.get("logging", {})

    def get_performance_config(self) -> Dict[str, Any]:
        """Get performance configuration"""
        return self.config.get("performance", {})

    def update_config(self, updates: Dict[str, Any]) -> bool:
        """
        Update configuration with new values

        Args:
            updates: Dictionary with updates to apply

        Returns:
            True if updated successfully
        """
        try:
            self.config = self._merge_configs(self.config, updates)
            return True
        except Exception as e:
            print(f"❌ Error updating config: {e}")
            return False

    def _merge_configs(self, base_config: Dict[str, Any], updates: Dict[str, Any]) -> Dict[str, Any]:
        """Merge updates with base configuration"""
        merged = base_config.copy()

        def deep_merge(base: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
            result = base.copy()
            for key, value in update.items():
                if key in result and isinstance(result[key], dict) and isinstance(value, dict):
                    result[key] = deep_merge(result[key], value)
                else:
                    result[key] = value
            return result

        return deep_merge(merged, updates)

    def save_config(self, output_path: Optional[Union[str, Path]] = None) -> bool:
        """
        Save current configuration to file

        Args:
            output_path: Path to save config file

        Returns:
            True if saved successfully
        """
        try:
            output_path = Path(output_path) if output_path else self.config_path
            output_path.parent.mkdir(parents=True, exist_ok=True)

            with open(output_path, 'w', encoding='utf-8') as f:
                yaml.dump(self.config, f, default_flow_style=False, indent=2, allow_unicode=True)

            print(f"✅ Configuration saved to {output_path}")
            return True
        except Exception as e:
            print(f"❌ Error saving config: {e}")
            return False

    def print_config_summary(self):
        """Print configuration summary"""
        print("\n" + "="*50)
        print("CONFIGURATION SUMMARY")
        print("="*50)

        # Paths
        paths = self.get_paths_config()
        print(f"Input directory: {paths.get('input_dir', 'Not set')}")
        print(f"Output directory: {paths.get('output_dir', 'Not set')}")

        # Processing
        processing = self.get_processing_config()
        print(f"Target size: {processing.get('target_size', 'Not set')}")
        print(f"Number of augmentations: {processing.get('num_augmentations', 'Not set')}")

        # Augmentation
        augmentation = self.get_augmentation_config()
        enabled_augmentations = []
        for name, config in augmentation.items():
            if isinstance(config, dict) and config.get('enabled', False):
                enabled_augmentations.append(name)

        print(f"Enabled augmentations: {', '.join(enabled_augmentations) if enabled_augmentations else 'None'}")

        print("="*50)
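
For readers skimming the diff, the merge semantics behind `update_config` can be seen in a standalone sketch (illustration only, not part of the commit): nested dictionaries are merged key by key, while non-dict values are simply overwritten.

```python
# Standalone illustration of the deep-merge behaviour used by update_config:
# nested dicts are merged recursively, other values are overwritten.
def deep_merge(base: dict, update: dict) -> dict:
    result = base.copy()
    for key, value in update.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result

defaults = {"performance": {"num_workers": 4, "use_gpu": False}}
print(deep_merge(defaults, {"performance": {"use_gpu": True}}))
# -> {'performance': {'num_workers': 4, 'use_gpu': True}}
```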
161
src/data_augmentation.py
Normal file
@@ -0,0 +1,161 @@
"""
Data augmentation class for image augmentation - ONLY ROTATION
"""
import cv2
import numpy as np
from pathlib import Path
from typing import List, Tuple, Optional, Dict, Any
import random
import math
from image_processor import ImageProcessor
from utils import load_image, save_image, create_augmented_filename, print_progress

class DataAugmentation:
    """Class for image data augmentation - ONLY ROTATION"""

    def __init__(self, config: Dict[str, Any] = None):
        """
        Initialize DataAugmentation

        Args:
            config: Configuration dictionary for augmentation parameters
        """
        self.config = config or {}
        self.image_processor = ImageProcessor()

    def rotate_image(self, image: np.ndarray, angle: float) -> np.ndarray:
        """
        Rotate image by given angle

        Args:
            image: Input image
            angle: Rotation angle in degrees

        Returns:
            Rotated image
        """
        height, width = image.shape[:2]
        center = (width // 2, height // 2)

        # Create rotation matrix
        rotation_matrix = cv2.getRotationMatrix2D(center, angle, 1.0)

        # Perform rotation
        rotated = cv2.warpAffine(image, rotation_matrix, (width, height),
                                 borderMode=cv2.BORDER_REPLICATE)

        return rotated

    def augment_single_image(self, image: np.ndarray, num_augmentations: int = None) -> List[np.ndarray]:
        """
        Apply rotation augmentation to a single image

        Args:
            image: Input image
            num_augmentations: Number of augmented versions to create

        Returns:
            List of augmented images
        """
        num_augmentations = num_augmentations or 3  # Default value
        augmented_images = []

        # Get rotation configuration
        rotation_config = self.config.get("rotation", {})
        angles = rotation_config.get("angles", [30, 60, 120, 150, 180, 210, 240, 300, 330])

        for i in range(num_augmentations):
            augmented = image.copy()

            # Apply rotation with random angle from the specified list
            if rotation_config.get("enabled", False):
                angle = random.choice(angles)
                augmented = self.rotate_image(augmented, angle)

            augmented_images.append(augmented)

        return augmented_images

    def augment_image_file(self, image_path: Path, output_dir: Path, num_augmentations: int = None) -> List[Path]:
        """
        Augment a single image file and save results

        Args:
            image_path: Path to input image
            output_dir: Output directory for augmented images
            num_augmentations: Number of augmented versions to create

        Returns:
            List of paths to saved augmented images
        """
        # Load image
        image = load_image(image_path, self.image_processor.target_size)
        if image is None:
            return []

        # Apply augmentations
        augmented_images = self.augment_single_image(image, num_augmentations)

        # Save augmented images
        saved_paths = []
        for i, aug_image in enumerate(augmented_images):
            # Create output filename
            output_filename = create_augmented_filename(image_path, i + 1)
            output_path = output_dir / output_filename.name

            # Save image
            if save_image(aug_image, output_path):
                saved_paths.append(output_path)

        return saved_paths

    def batch_augment(self, input_dir: Path, output_dir: Path, num_augmentations: int = None) -> Dict[str, List[Path]]:
        """
        Augment all images in a directory

        Args:
            input_dir: Input directory containing images
            output_dir: Output directory for augmented images
            num_augmentations: Number of augmented versions per image

        Returns:
            Dictionary mapping original images to their augmented versions
        """
        from utils import get_image_files

        image_files = get_image_files(input_dir)
        results = {}

        print(f"Found {len(image_files)} images to augment")

        for i, image_path in enumerate(image_files):
            print_progress(i + 1, len(image_files), "Augmenting images")

            # Augment single image
            augmented_paths = self.augment_image_file(image_path, output_dir, num_augmentations)

            if augmented_paths:
                results[str(image_path)] = augmented_paths

        print(f"\nAugmented {len(results)} images successfully")
        return results

    def get_augmentation_summary(self, results: Dict[str, List[Path]]) -> Dict[str, Any]:
        """
        Get summary of augmentation results

        Args:
            results: Results from batch_augment

        Returns:
            Summary dictionary
        """
        total_original = len(results)
        total_augmented = sum(len(paths) for paths in results.values())

        return {
            "total_original_images": total_original,
            "total_augmented_images": total_augmented,
            "augmentation_ratio": total_augmented / total_original if total_original > 0 else 0,
            "successful_augmentations": len([paths for paths in results.values() if paths])
        }
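
A possible way to drive the rotation-only augmenter from Python (illustration only; it assumes `src/` is on the import path, and `data/augmented` is a placeholder output directory):

```python
# Hypothetical usage sketch; run from a context where src/ is importable.
from pathlib import Path

from data_augmentation import DataAugmentation

augmenter = DataAugmentation(config={"rotation": {"enabled": True, "angles": [90, 180, 270]}})
results = augmenter.batch_augment(
    Path("data/IDcards"),    # input directory from the README
    Path("data/augmented"),  # placeholder output directory
    num_augmentations=3,
)
print(augmenter.get_augmentation_summary(results))
```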
174
src/image_processor.py
Normal file
@@ -0,0 +1,174 @@
"""
Image processing class for basic image operations
"""
import cv2
import numpy as np
from pathlib import Path
from typing import Tuple, Optional, List
from utils import load_image, save_image, validate_image, get_image_files, print_progress

class ImageProcessor:
    """Class for basic image processing operations"""

    def __init__(self, target_size: Tuple[int, int] = None):
        """
        Initialize ImageProcessor

        Args:
            target_size: Target size for image resizing (width, height)
        """
        self.target_size = target_size or (224, 224)  # Default size

    def load_and_preprocess(self, image_path: Path) -> Optional[np.ndarray]:
        """
        Load and preprocess image

        Args:
            image_path: Path to image file

        Returns:
            Preprocessed image as numpy array or None if failed
        """
        if not validate_image(image_path):
            print(f"Invalid image file: {image_path}")
            return None

        image = load_image(image_path, self.target_size)
        if image is None:
            return None

        # Normalize pixel values
        image = image.astype(np.float32) / 255.0

        return image

    def resize_image(self, image: np.ndarray, target_size: Tuple[int, int]) -> np.ndarray:
        """
        Resize image to target size

        Args:
            image: Input image as numpy array
            target_size: Target size (width, height)

        Returns:
            Resized image
        """
        return cv2.resize(image, target_size, interpolation=cv2.INTER_AREA)

    def normalize_image(self, image: np.ndarray) -> np.ndarray:
        """
        Normalize image pixel values to [0, 1]

        Args:
            image: Input image

        Returns:
            Normalized image
        """
        return image.astype(np.float32) / 255.0

    def denormalize_image(self, image: np.ndarray) -> np.ndarray:
        """
        Denormalize image pixel values to [0, 255]

        Args:
            image: Input image (normalized)

        Returns:
            Denormalized image
        """
        return (image * 255).astype(np.uint8)

    def get_image_info(self, image_path: Path) -> dict:
        """
        Get information about image

        Args:
            image_path: Path to image file

        Returns:
            Dictionary containing image information
        """
        try:
            image = cv2.imread(str(image_path))
            if image is None:
                return {}

            height, width, channels = image.shape
            file_size = image_path.stat().st_size / (1024 * 1024)  # MB

            return {
                "path": str(image_path),
                "width": width,
                "height": height,
                "channels": channels,
                "file_size_mb": round(file_size, 2),
                "format": image_path.suffix
            }
        except Exception as e:
            print(f"Error getting image info for {image_path}: {e}")
            return {}

    def batch_process_images(self, input_dir: Path, output_dir: Path) -> List[Path]:
        """
        Process all images in a directory

        Args:
            input_dir: Input directory containing images
            output_dir: Output directory for processed images

        Returns:
            List of processed image paths
        """
        image_files = get_image_files(input_dir)
        processed_files = []

        print(f"Found {len(image_files)} images to process")

        for i, image_path in enumerate(image_files):
            print_progress(i + 1, len(image_files), "Processing images")

            # Load and preprocess image
            image = self.load_and_preprocess(image_path)
            if image is None:
                continue

            # Create output path
            output_path = output_dir / image_path.name

            # Denormalize for saving
            image = self.denormalize_image(image)

            # Save processed image
            if save_image(image, output_path):
                processed_files.append(output_path)

        print(f"\nProcessed {len(processed_files)} images successfully")
        return processed_files

    def create_thumbnail(self, image: np.ndarray, size: Tuple[int, int] = (100, 100)) -> np.ndarray:
        """
        Create thumbnail of image

        Args:
            image: Input image
            size: Thumbnail size (width, height)

        Returns:
            Thumbnail image
        """
        return cv2.resize(image, size, interpolation=cv2.INTER_AREA)

    def convert_to_grayscale(self, image: np.ndarray) -> np.ndarray:
        """
        Convert image to grayscale

        Args:
            image: Input image (RGB)

        Returns:
            Grayscale image
        """
        if len(image.shape) == 3:
            return cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
        return image
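
For context, a small usage sketch of `ImageProcessor` (illustration only; `data/processed_basic` is a placeholder directory and `src/` is assumed to be importable):

```python
# Hypothetical usage sketch; assumes src/ is importable.
from pathlib import Path

from image_processor import ImageProcessor

processor = ImageProcessor(target_size=(800, 600))
processed_paths = processor.batch_process_images(
    Path("data/IDcards"),          # input directory from the README
    Path("data/processed_basic"),  # placeholder output directory
)
for path in processed_paths[:3]:
    print(processor.get_image_info(path))
```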
8
src/model/__init__.py
Normal file
@@ -0,0 +1,8 @@
"""
Model module for YOLO-based ID card detection and cropping
"""

from .yolo_detector import YOLODetector
from .id_card_processor import IDCardProcessor

__all__ = ['YOLODetector', 'IDCardProcessor']
BIN
src/model/__pycache__/__init__.cpython-39.pyc
Normal file
Binary file not shown.
BIN
src/model/__pycache__/id_card_processor.cpython-39.pyc
Normal file
Binary file not shown.
BIN
src/model/__pycache__/yolo_detector.cpython-39.pyc
Normal file
Binary file not shown.
343
src/model/id_card_processor.py
Normal file
@@ -0,0 +1,343 @@
"""
ID Card Processor for background removal and preprocessing
"""
import cv2
import numpy as np
from pathlib import Path
from typing import List, Optional, Dict, Any, Tuple
import logging
from .yolo_detector import YOLODetector

class IDCardProcessor:
    """
    ID Card Processor for background removal and preprocessing
    """

    def __init__(self, yolo_detector: Optional[YOLODetector] = None):
        """
        Initialize ID Card Processor

        Args:
            yolo_detector: YOLO detector instance
        """
        self.yolo_detector = yolo_detector or YOLODetector()
        self.logger = logging.getLogger(__name__)

    def remove_background(self, image: np.ndarray, method: str = 'grabcut') -> np.ndarray:
        """
        Remove background from image

        Args:
            image: Input image
            method: Background removal method ('grabcut', 'threshold', 'contour')

        Returns:
            Image with background removed
        """
        if method == 'grabcut':
            return self._grabcut_background_removal(image)
        elif method == 'threshold':
            return self._threshold_background_removal(image)
        elif method == 'contour':
            return self._contour_background_removal(image)
        else:
            self.logger.warning(f"Unknown method: {method}, using grabcut")
            return self._grabcut_background_removal(image)

    def _grabcut_background_removal(self, image: np.ndarray) -> np.ndarray:
        """
        Remove background using GrabCut algorithm
        """
        try:
            # Create mask
            mask = np.zeros(image.shape[:2], np.uint8)

            # Create temporary arrays
            bgd_model = np.zeros((1, 65), np.float64)
            fgd_model = np.zeros((1, 65), np.float64)

            # Define rectangle (assuming ID card is in center)
            height, width = image.shape[:2]
            rect = (width//8, height//8, width*3//4, height*3//4)

            # Apply GrabCut
            cv2.grabCut(image, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

            # Create mask
            mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')

            # Apply mask
            result = image * mask2[:, :, np.newaxis]

            return result

        except Exception as e:
            self.logger.error(f"Error in grabcut background removal: {e}")
            return image

    def _threshold_background_removal(self, image: np.ndarray) -> np.ndarray:
        """
        Remove background using thresholding
        """
        try:
            # Convert to grayscale
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

            # Apply Gaussian blur
            blurred = cv2.GaussianBlur(gray, (5, 5), 0)

            # Apply threshold
            _, thresh = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

            # Find contours
            contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

            # Find largest contour (assumed to be the ID card)
            if contours:
                largest_contour = max(contours, key=cv2.contourArea)

                # Create mask
                mask = np.zeros_like(gray)
                cv2.fillPoly(mask, [largest_contour], 255)

                # Apply mask
                result = cv2.bitwise_and(image, image, mask=mask)
                return result

            return image

        except Exception as e:
            self.logger.error(f"Error in threshold background removal: {e}")
            return image

    def _contour_background_removal(self, image: np.ndarray) -> np.ndarray:
        """
        Remove background using contour detection
        """
        try:
            # Convert to grayscale
            gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

            # Apply edge detection
            edges = cv2.Canny(gray, 50, 150)

            # Find contours
            contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

            # Find largest contour
            if contours:
                largest_contour = max(contours, key=cv2.contourArea)

                # Approximate contour to get rectangle
                epsilon = 0.02 * cv2.arcLength(largest_contour, True)
                approx = cv2.approxPolyDP(largest_contour, epsilon, True)

                # Create mask
                mask = np.zeros_like(gray)
                cv2.fillPoly(mask, [approx], 255)

                # Apply mask
                result = cv2.bitwise_and(image, image, mask=mask)
                return result

            return image

        except Exception as e:
            self.logger.error(f"Error in contour background removal: {e}")
            return image

    def enhance_image(self, image: np.ndarray) -> np.ndarray:
        """
        Enhance image quality for better OCR
        """
        try:
            # Convert to LAB color space
            lab = cv2.cvtColor(image, cv2.COLOR_BGR2LAB)

            # Apply CLAHE to L channel
            clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
            lab[:, :, 0] = clahe.apply(lab[:, :, 0])

            # Convert back to BGR
            enhanced = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

            # Apply slight Gaussian blur to reduce noise
            enhanced = cv2.GaussianBlur(enhanced, (3, 3), 0)

            return enhanced

        except Exception as e:
            self.logger.error(f"Error enhancing image: {e}")
            return image

    def normalize_image(self, image: np.ndarray, target_size: Tuple[int, int] = (800, 600)) -> np.ndarray:
        """
        Normalize image size and orientation
        """
        try:
            # Resize image
            resized = cv2.resize(image, target_size, interpolation=cv2.INTER_AREA)

            # Convert to grayscale if needed
            if len(resized.shape) == 3:
                gray = cv2.cvtColor(resized, cv2.COLOR_BGR2GRAY)
            else:
                gray = resized

            # Apply histogram equalization
            equalized = cv2.equalizeHist(gray)

            # Convert back to BGR for consistency
            if len(image.shape) == 3:
                result = cv2.cvtColor(equalized, cv2.COLOR_GRAY2BGR)
            else:
                result = equalized

            return result

        except Exception as e:
            self.logger.error(f"Error normalizing image: {e}")
            return image

    def process_id_card(self, image_path: Path, output_dir: Path,
                        remove_bg: bool = True, enhance: bool = True,
                        normalize: bool = True, target_size: Tuple[int, int] = (800, 600)) -> Dict[str, Any]:
        """
        Process a single ID card image

        Args:
            image_path: Path to input image
            output_dir: Output directory
            remove_bg: Whether to remove background
            enhance: Whether to enhance image
            normalize: Whether to normalize image
            target_size: Target size for normalization

        Returns:
            Processing results
        """
        result = {
            'input_path': str(image_path),
            'output_paths': [],
            'success': False
        }

        try:
            # Load image
            image = cv2.imread(str(image_path))
            if image is None:
                self.logger.error(f"Could not load image: {image_path}")
                return result

            # Create output filename
            stem = image_path.stem
            processed_path = output_dir / f"{stem}_processed.jpg"

            # Apply processing steps
            processed_image = image.copy()

            if remove_bg:
                self.logger.info(f"Removing background from {image_path.name}")
                processed_image = self.remove_background(processed_image)

            if enhance:
                self.logger.info(f"Enhancing {image_path.name}")
                processed_image = self.enhance_image(processed_image)

            if normalize:
                self.logger.info(f"Normalizing {image_path.name}")
                processed_image = self.normalize_image(processed_image, target_size)

            # Save processed image
            processed_path.parent.mkdir(parents=True, exist_ok=True)
            cv2.imwrite(str(processed_path), processed_image)
            result['output_paths'].append(str(processed_path))

            result['success'] = True
            self.logger.info(f"Processed {image_path.name}")

        except Exception as e:
            self.logger.error(f"Error processing {image_path}: {e}")

        return result

    def batch_process_id_cards(self, input_dir: Path, output_dir: Path,
                               detect_first: bool = True, **kwargs) -> Dict[str, Any]:
        """
        Process all ID card images in a directory

        Args:
            input_dir: Input directory
            output_dir: Output directory
            detect_first: Whether to detect ID cards first using YOLO
            **kwargs: Additional arguments for processing

        Returns:
            Batch processing results
        """
        # Create output directory
        output_dir.mkdir(parents=True, exist_ok=True)

        if detect_first:
            # First detect and crop ID cards
            self.logger.info("Detecting and cropping ID cards...")
            detection_results = self.yolo_detector.batch_process(input_dir, output_dir / "cropped")

            # Process cropped images
            cropped_dir = output_dir / "cropped"
            if cropped_dir.exists():
                self.logger.info("Processing cropped ID cards...")
                return self._process_cropped_images(cropped_dir, output_dir / "processed", **kwargs)
            else:
                self.logger.warning("No cropped images found, processing original images")
                return self._process_cropped_images(input_dir, output_dir / "processed", **kwargs)
        else:
            # Process original images directly
            return self._process_cropped_images(input_dir, output_dir / "processed", **kwargs)

    def _process_cropped_images(self, input_dir: Path, output_dir: Path, **kwargs) -> Dict[str, Any]:
        """
        Process cropped ID card images recursively
        """
        # Get all image files recursively from input directory and subdirectories
        image_extensions = {'.jpg', '.jpeg', '.png', '.bmp', '.tiff'}
        image_files = []

        # Recursively find all image files
        for file_path in input_dir.rglob('*'):
            if file_path.is_file() and file_path.suffix.lower() in image_extensions:
                image_files.append(file_path)

        if not image_files:
            self.logger.error(f"No images found in {input_dir} and subdirectories")
            return {'success': False, 'error': 'No images found'}

        self.logger.info(f"Processing {len(image_files)} images from {input_dir} and subdirectories")

        results = {
            'total_images': len(image_files),
            'processed_images': 0,
            'results': []
        }

        # Process each image
        for i, image_path in enumerate(image_files):
            self.logger.info(f"Processing {i+1}/{len(image_files)}: {image_path.name}")

            # Create subdirectory structure in output to match input structure
            relative_path = image_path.relative_to(input_dir)
            output_subdir = output_dir / relative_path.parent
            output_subdir.mkdir(parents=True, exist_ok=True)

            result = self.process_id_card(image_path, output_subdir, **kwargs)
            results['results'].append(result)

            if result['success']:
                results['processed_images'] += 1

        # Summary
        self.logger.info(f"ID card processing completed:")
        self.logger.info(f" - Total images: {results['total_images']}")
        self.logger.info(f" - Processed: {results['processed_images']}")

        return results
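
A hedged end-to-end sketch of calling `batch_process_id_cards` directly from Python; the documented entry point is the CLI in `id_card_processor_main.py`, so this is only an illustration and assumes `src/` is importable:

```python
# Hypothetical usage sketch; assumes src/ is importable and the data/ layout
# from the README. The keyword arguments below are forwarded to process_id_card().
from pathlib import Path

from model import IDCardProcessor

processor = IDCardProcessor()  # builds a default YOLODetector internally
summary = processor.batch_process_id_cards(
    Path("data/IDcards"),
    Path("data/processed_id_cards"),
    detect_first=True,
    remove_bg=True,
    enhance=True,
    normalize=True,
    target_size=(800, 600),
)
print(f"Processed {summary.get('processed_images', 0)} of {summary.get('total_images', 0)} images")
```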
266
src/model/yolo_detector.py
Normal file
@@ -0,0 +1,266 @@
"""
YOLO Detector for ID Card Detection and Cropping
"""
import cv2
import numpy as np
from pathlib import Path
from typing import List, Tuple, Optional, Dict, Any
import logging
from ultralytics import YOLO
import torch

class YOLODetector:
    """
    YOLO-based detector for ID card detection and cropping
    """

    def __init__(self, model_path: Optional[str] = None, confidence: float = 0.5):
        """
        Initialize YOLO detector

        Args:
            model_path: Path to YOLO model file (.pt)
            confidence: Confidence threshold for detection
        """
        self.confidence = confidence
        self.logger = logging.getLogger(__name__)

        # Initialize model
        if model_path and Path(model_path).exists():
            self.model = YOLO(model_path)
            self.logger.info(f"Loaded custom YOLO model from {model_path}")
        else:
            # Use pre-trained YOLO model for general object detection
            self.model = YOLO('yolov8n.pt')
            self.logger.info("Using pre-trained YOLOv8n model")

        # Set device
        self.device = 'cuda' if torch.cuda.is_available() else 'cpu'
        self.logger.info(f"Using device: {self.device}")

    def detect_id_cards(self, image_path: Path) -> List[Dict[str, Any]]:
        """
        Detect ID cards in an image

        Args:
            image_path: Path to image file

        Returns:
            List of detection results with bounding boxes
        """
        try:
            # Load image
            image = cv2.imread(str(image_path))
            if image is None:
                self.logger.error(f"Could not load image: {image_path}")
                return []

            # Run detection
            results = self.model(image, conf=self.confidence)

            detections = []
            for result in results:
                boxes = result.boxes
                if boxes is not None:
                    for box in boxes:
                        # Get coordinates
                        x1, y1, x2, y2 = box.xyxy[0].cpu().numpy()
                        confidence = float(box.conf[0])
                        class_id = int(box.cls[0])
                        class_name = self.model.names[class_id]

                        detection = {
                            'bbox': [int(x1), int(y1), int(x2), int(y2)],
                            'confidence': confidence,
                            'class_id': class_id,
                            'class_name': class_name,
                            'area': (x2 - x1) * (y2 - y1)
                        }
                        detections.append(detection)

            # Sort by confidence and area (prefer larger, more confident detections)
            detections.sort(key=lambda x: (x['confidence'], x['area']), reverse=True)

            self.logger.info(f"Found {len(detections)} detections in {image_path.name}")
            return detections

        except Exception as e:
            self.logger.error(f"Error detecting ID cards in {image_path}: {e}")
            return []

    def crop_id_card(self, image_path: Path, bbox: List[int],
                     output_path: Optional[Path] = None,
                     padding: int = 10) -> Optional[np.ndarray]:
        """
        Crop ID card from image using bounding box

        Args:
            image_path: Path to input image
            bbox: Bounding box [x1, y1, x2, y2]
            output_path: Path to save cropped image
            padding: Padding around the bounding box

        Returns:
            Cropped image as numpy array
        """
        try:
            # Load image
            image = cv2.imread(str(image_path))
            if image is None:
                self.logger.error(f"Could not load image: {image_path}")
                return None

            height, width = image.shape[:2]
            x1, y1, x2, y2 = bbox

            # Add padding
            x1 = max(0, x1 - padding)
            y1 = max(0, y1 - padding)
            x2 = min(width, x2 + padding)
            y2 = min(height, y2 + padding)

            # Crop image
            cropped = image[y1:y2, x1:x2]

            # Save if output path provided
            if output_path:
                output_path.parent.mkdir(parents=True, exist_ok=True)
                cv2.imwrite(str(output_path), cropped)
                self.logger.info(f"Saved cropped image to {output_path}")

            return cropped

        except Exception as e:
            self.logger.error(f"Error cropping ID card from {image_path}: {e}")
            return None

    def process_single_image(self, image_path: Path, output_dir: Path,
                             save_original: bool = False) -> Dict[str, Any]:
        """
        Process a single image: detect and crop ID cards

        Args:
            image_path: Path to input image
            output_dir: Output directory for cropped images
            save_original: Whether to save original image with bounding boxes

        Returns:
            Processing results
        """
        result = {
            'input_path': str(image_path),
            'detections': [],
            'cropped_paths': [],
            'success': False
        }

        try:
            # Detect ID cards
            detections = self.detect_id_cards(image_path)

            if not detections:
                self.logger.warning(f"No ID cards detected in {image_path.name}")
                return result

            # Process each detection
            for i, detection in enumerate(detections):
                bbox = detection['bbox']

                # Create output filename
                stem = image_path.stem
                suffix = f"_card_{i+1}.jpg"
                output_path = output_dir / f"{stem}{suffix}"

                # Crop ID card
                cropped = self.crop_id_card(image_path, bbox, output_path)

                if cropped is not None:
                    result['detections'].append(detection)
                    result['cropped_paths'].append(str(output_path))

            # Save original with bounding boxes if requested
            if save_original and detections:
                image = cv2.imread(str(image_path))
                for detection in detections:
                    bbox = detection['bbox']
                    cv2.rectangle(image, (bbox[0], bbox[1]), (bbox[2], bbox[3]), (0, 255, 0), 2)
                    cv2.putText(image, f"{detection['confidence']:.2f}",
                                (bbox[0], bbox[1] - 10), cv2.FONT_HERSHEY_SIMPLEX,
                                0.5, (0, 255, 0), 2)

                annotated_path = output_dir / f"{image_path.stem}_annotated.jpg"
                cv2.imwrite(str(annotated_path), image)
                result['annotated_path'] = str(annotated_path)

            result['success'] = True
            self.logger.info(f"Processed {image_path.name}: {len(result['cropped_paths'])} cards cropped")

        except Exception as e:
            self.logger.error(f"Error processing {image_path}: {e}")

        return result

    def batch_process(self, input_dir: Path, output_dir: Path,
                      save_annotated: bool = False) -> Dict[str, Any]:
        """
        Process all images in a directory and subdirectories

        Args:
            input_dir: Input directory containing images
            output_dir: Output directory for cropped images
            save_annotated: Whether to save annotated images

        Returns:
            Batch processing results
        """
        # Create output directory
        output_dir.mkdir(parents=True, exist_ok=True)

        # Get all image files recursively from input directory and subdirectories
        image_extensions = {'.jpg', '.jpeg', '.png', '.bmp', '.tiff'}
        image_files = []

        # Recursively find all image files
        for file_path in input_dir.rglob('*'):
            if file_path.is_file() and file_path.suffix.lower() in image_extensions:
                image_files.append(file_path)

        if not image_files:
            self.logger.error(f"No images found in {input_dir} and subdirectories")
            return {'success': False, 'error': 'No images found'}

        self.logger.info(f"Processing {len(image_files)} images from {input_dir} and subdirectories")

        results = {
            'total_images': len(image_files),
            'processed_images': 0,
            'total_detections': 0,
            'total_cropped': 0,
            'results': []
        }

        # Process each image
        for i, image_path in enumerate(image_files):
            self.logger.info(f"Processing {i+1}/{len(image_files)}: {image_path.name}")

            # Create subdirectory structure in output to match input structure
            relative_path = image_path.relative_to(input_dir)
            output_subdir = output_dir / relative_path.parent
            output_subdir.mkdir(parents=True, exist_ok=True)

            result = self.process_single_image(image_path, output_subdir, save_annotated)
            results['results'].append(result)

            if result['success']:
                results['processed_images'] += 1
                results['total_detections'] += len(result['detections'])
                results['total_cropped'] += len(result['cropped_paths'])

        # Summary
        self.logger.info(f"Batch processing completed:")
        self.logger.info(f" - Total images: {results['total_images']}")
        self.logger.info(f" - Processed: {results['processed_images']}")
        self.logger.info(f" - Total detections: {results['total_detections']}")
        self.logger.info(f" - Total cropped: {results['total_cropped']}")

        return results
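
A minimal detection-only sketch (illustration only): with no custom `model_path`, the class falls back to the pre-trained `yolov8n.pt` weights, so detections are generic COCO objects rather than a dedicated ID-card class.

```python
# Hypothetical usage sketch; assumes src/ is importable.
from pathlib import Path

from model import YOLODetector

detector = YOLODetector(model_path=None, confidence=0.5)  # falls back to yolov8n.pt
report = detector.batch_process(
    Path("data/IDcards"),
    Path("data/processed_id_cards/cropped"),
    save_annotated=True,
)
print(f"{report.get('total_cropped', 0)} crops from {report.get('total_images', 0)} images")
```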
98
src/utils.py
Normal file
@@ -0,0 +1,98 @@
"""
Utility functions for data augmentation
"""
import os
import logging
from pathlib import Path
from typing import List, Tuple, Optional
import cv2
import numpy as np
from PIL import Image

def setup_logging(log_level: str = "INFO") -> logging.Logger:
    """Setup logging configuration"""
    logging.basicConfig(
        level=getattr(logging, log_level.upper()),
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        handlers=[
            logging.FileHandler('data_augmentation.log'),
            logging.StreamHandler()
        ]
    )
    return logging.getLogger(__name__)

def get_image_files(directory: Path) -> List[Path]:
    """Get all image files from directory"""
    SUPPORTED_FORMATS = ['.jpg', '.jpeg', '.png', '.bmp', '.tiff']

    image_files = []
    if directory.exists():
        for ext in SUPPORTED_FORMATS:
            image_files.extend(directory.glob(f"*{ext}"))
            image_files.extend(directory.glob(f"*{ext.upper()}"))
    return sorted(image_files)

def validate_image(image_path: Path) -> bool:
    """Validate if file is a valid image"""
    try:
        with Image.open(image_path) as img:
            img.verify()
        return True
    except Exception:
        return False

def load_image(image_path: Path, target_size: Tuple[int, int] = None) -> Optional[np.ndarray]:
    """Load and resize image"""
    try:
        # Load image using OpenCV
        image = cv2.imread(str(image_path))
        if image is None:
            return None

        # Convert BGR to RGB
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

        # Resize if target_size is provided
        if target_size:
            image = cv2.resize(image, target_size, interpolation=cv2.INTER_AREA)

        return image
    except Exception as e:
        print(f"Error loading image {image_path}: {e}")
        return None

def save_image(image: np.ndarray, output_path: Path, quality: int = 95) -> bool:
    """Save image to file"""
    try:
        # Convert RGB to BGR for OpenCV
        image_bgr = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

        # Create output directory if it doesn't exist
        output_path.parent.mkdir(parents=True, exist_ok=True)

        # Save image
        cv2.imwrite(str(output_path), image_bgr, [cv2.IMWRITE_JPEG_QUALITY, quality])
        return True
    except Exception as e:
        print(f"Error saving image {output_path}: {e}")
        return False

def create_augmented_filename(original_path: Path, index: int, suffix: str = "aug") -> Path:
    """Create filename for augmented image"""
    stem = original_path.stem
    suffix = f"_{suffix}_{index:02d}"
    return original_path.parent / f"{stem}{suffix}{original_path.suffix}"

def get_file_size_mb(file_path: Path) -> float:
    """Get file size in MB"""
    return file_path.stat().st_size / (1024 * 1024)

def print_progress(current: int, total: int, prefix: str = "Progress"):
    """Print progress bar"""
    bar_length = 50
    filled_length = int(round(bar_length * current / float(total)))
    percents = round(100.0 * current / float(total), 1)
    bar = '=' * filled_length + '-' * (bar_length - filled_length)
    print(f'\r{prefix}: [{bar}] {percents}% ({current}/{total})', end='')
    if current == total:
        print()
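
Finally, an illustrative sketch of how the helpers in `utils.py` compose (assumes `src/` is importable; note that `print_progress` divides by `total`, so it should only be called when the file list is non-empty):

```python
# Hypothetical usage sketch; assumes src/ is importable.
from pathlib import Path

from utils import get_image_files, get_file_size_mb, print_progress, setup_logging

logger = setup_logging("INFO")
files = get_image_files(Path("data/IDcards"))
for i, path in enumerate(files, start=1):
    print_progress(i, len(files), "Scanning")  # loop body only runs when files is non-empty
logger.info("Total size: %.2f MB", sum(get_file_size_mb(p) for p in files))
```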