Introduction: Why Is YOLO So Popular?
"I want to build an object detection project with YOLO, but I don't know where to start..."
This is a real struggle for many AI developers. YOLO (You Only Look Once) is one of the most widely used algorithms in object detection. From YOLOv1 to the latest YOLOv10, the YOLO series has achieved a strong balance between speed and accuracy.
YOLO Application Scenarios:
- Autonomous Driving: Real-time detection of vehicles, pedestrians, and traffic signs
- Industrial Quality Inspection: Rapid detection of product defects
- Medical Imaging: Assisting doctors in identifying lesions
- Retail Analytics: Product recognition and inventory management
- Security Surveillance: Real-time monitoring and anomaly detection
YOLO's Advantages:
- Fast: Can process video streams in real time
- Accurate: Achieves a good balance between speed and accuracy
- Easy to use: Comprehensive tools and documentation
- Active community: Abundant tutorials and examples
But the first hurdle many developers face when using YOLO is: How do you create a high-quality YOLO dataset?
Today, we'll walk you through creating a complete YOLO dataset from scratch, all the way to successful model training. Whether you're a beginner or an experienced developer, you'll find practical methods and tips in this article.
What Is a YOLO Dataset?
YOLO Data Format
YOLO uses a concise text format to store annotation information:
File Structure:
dataset/
├── images/
│ ├── train/
│ │ ├── image001.jpg
│ │ ├── image002.jpg
│ │ └── ...
│ └── val/
│ ├── image101.jpg
│ └── ...
└── labels/
├── train/
│ ├── image001.txt
│ ├── image002.txt
│ └── ...
└── val/
├── image101.txt
└── ...
Annotation File Format (image001.txt):
class_id center_x center_y width height
0 0.5 0.5 0.3 0.4
1 0.2 0.3 0.1 0.2
Format Description:
- class_id: Category ID (starting from 0)
- center_x, center_y: Normalized coordinates of the bounding box center (0-1)
- width, height: Normalized width and height of the bounding box (0-1)
Key point: YOLO uses normalized coordinates — all coordinate values are between 0 and 1.
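The normalized format is easy to compute from a pixel-space box; a minimal sketch (the example values reproduce the `0 0.5 0.5 0.3 0.4` line shown above):

```python
def to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) to YOLO format."""
    center_x = (x_min + x_max) / 2 / img_w
    center_y = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return center_x, center_y, width, height

# A 300x400-pixel box centered at (500, 500) in a 1000x1000 image:
print(to_yolo(350, 300, 650, 700, 1000, 1000))  # (0.5, 0.5, 0.3, 0.4)
```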
YOLO Version Differences
Different YOLO versions have slightly different dataset format requirements:
| Version | Format Requirements | Special Notes |
|---|---|---|
| YOLOv5 | Standard format | Supports custom class counts |
| YOLOv8 | Standard format | Ultralytics format recommended |
| YOLOv9 | Standard format | Compatible with YOLOv5 format |
| YOLOv10 | Standard format | Latest version, best performance |
Good news: the annotation format itself is identical across these versions — a dataset prepared this way can be reused with little or no modification!
Step 1: Data Collection and Preparation
1.1 Define Dataset Requirements
Before you begin, clarifying your requirements is the first step to success. A clear requirements plan can save you significant time and cost.
Requirements Analysis Checklist
1. Target Category Definition
Define Detection Targets:
- List all object categories to detect
- Define boundaries for each category (what counts, what doesn't)
- Consider category hierarchy (e.g., vehicle -> car, truck, bus)
Real Case:
A traffic monitoring project initially defined only one "vehicle" category. After training, they found the model couldn't distinguish cars from trucks. After subdividing into "car," "truck," "bus," and "motorcycle," model accuracy improved by 15%.
Category Count Recommendations:
- Simple projects: 1-5 categories (suitable for beginners)
- Medium projects: 5-20 categories (common applications)
- Complex projects: 20+ categories (requires more data and annotation time)
2. Data Scale Planning
Data Volume Estimates:
| Project Type | Min Images Per Class | Recommended Images | Total Images (5 classes) |
|---|---|---|---|
| Quick Prototype | 100-200 | 500 | 2,500 |
| Production Application | 1,000 | 3,000 | 15,000 |
| High-Precision Application | 5,000 | 10,000 | 50,000 |
Factors Affecting Data Volume:
- Number of categories: More categories require more data
- Scene complexity: Complex scenes need more data
- Precision requirements: High precision demands more high-quality data
- Class balance: Ensure relatively balanced data across categories (ratio no more than 10:1)
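To check the 10:1 balance rule on an existing dataset, you can count class occurrences across the label files; a small sketch (the `labels/train` path is a placeholder):

```python
from collections import Counter
from pathlib import Path

def class_distribution(labels_dir):
    """Count how many annotated boxes each class_id has across YOLO label files."""
    counts = Counter()
    d = Path(labels_dir)
    if not d.is_dir():
        return counts
    for txt in d.glob("*.txt"):
        for line in txt.read_text().splitlines():
            if line.strip():
                counts[int(line.split()[0])] += 1
    return counts

counts = class_distribution("labels/train")  # placeholder path
if counts:
    ratio = max(counts.values()) / min(counts.values())
    print(counts, f"-> imbalance ratio {ratio:.1f}:1")  # flag anything above 10:1
```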
Real Case:
An industrial quality inspection project needed to detect 10 defect types. Normal products had 10,000 images, but defect samples only had 500. Through active defect sample collection and data augmentation, each defect category eventually reached 2,000 samples, and model accuracy improved from 75% to 92%.
3. Scene Diversity Planning
Scene Coverage Dimensions:
Time Dimension:
- Daytime, nighttime, dusk, dawn
- Different seasons (spring, summer, fall, winter)
- Different time periods (morning, noon, evening)
Weather Dimension:
- Sunny, rainy, snowy, foggy
- Different lighting conditions (bright light, shadows, backlight)
Environment Dimension:
- Indoor, outdoor
- Urban, rural, highway
- Different background complexity levels
Target State Dimension:
- Stationary, moving
- Complete, partially occluded
- Different angles (front, side, back)
Scene Diversity Checklist:
- Cover at least 3-5 major scenarios
- Include edge cases (extreme situations)
- Avoid overly uniform scenes (prone to overfitting)
- Ensure consistent scene distribution between training and test sets
4. Image Quality Requirements
Resolution Requirements:
| Application Scenario | Minimum Resolution | Recommended Resolution | Notes |
|---|---|---|---|
| Small Object Detection | 1280x1280 | 1920x1920+ | Higher resolution needed for small targets |
| Standard Detection | 640x640 | 1280x1280 | YOLO default input size |
| Fast Detection | 416x416 | 640x640 | Speed priority, acceptable precision |
Image Quality Checks:
- Clarity: Target objects clearly visible, no blur
- Contrast: Obvious contrast between target and background
- Color: True colors, no severe distortion
- Exposure: Normal exposure, not overexposed or underexposed
- Format: Unified format (JPG or PNG), avoid format inconsistency
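One of these checks, exposure, is easy to automate by measuring mean grayscale brightness; a rough sketch (the 40/215 thresholds are illustrative choices, not standard values):

```python
import numpy as np
from PIL import Image

def exposure_flag(image_path, low=40, high=215):
    """Classify exposure by mean grayscale brightness on a 0-255 scale."""
    mean = np.asarray(Image.open(image_path).convert("L"), dtype=np.float32).mean()
    if mean < low:
        return "underexposed"
    if mean > high:
        return "overexposed"
    return "ok"
```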
5. Budget and Timeline Planning
Time Estimates (for 5 classes, 1000 images each):
| Phase | Time Estimate | Notes |
|---|---|---|
| Data Collection | 1-2 weeks | Varies by data source |
| Data Annotation | 2-4 weeks | Can be shortened to 1 week with AI assistance |
| Quality Check | 3-5 days | Multiple review rounds |
| Format Conversion | 1 day | Automated processing |
| Total | 4-7 weeks | Can be shortened to 2-3 weeks with AI assistance |
Cost Estimates (for 5 classes, 1000 images each):
| Approach | Annotation Cost | Tool Cost | Total Cost |
|---|---|---|---|
| Pure Manual Annotation | $8,000-12,000 | $0 | $8,000-12,000 |
| AI-Assisted Annotation | $1,600-2,400 | $0 (free tools) | $1,600-2,400 |
| Savings | 80% | - | 80% |
Requirements Document Template:
# YOLO Dataset Requirements Document
## Project Information
- Project Name: [Project Name]
- Application Scenario: [Scenario Description]
- Target Accuracy: [Target mAP Value]
## Category Definitions
1. [Category 1]: [Detailed Definition]
2. [Category 2]: [Detailed Definition]
...
## Data Scale
- Number of Categories: [N]
- Images Per Category: [M]
- Total Images: [N x M]
## Scene Requirements
- Time: [Daytime/Nighttime/All Day]
- Weather: [Sunny/Rainy/All Weather]
- Environment: [Indoor/Outdoor/Mixed]
## Quality Requirements
- Resolution: [Minimum Resolution]
- Annotation Precision: [IoU Requirement]
- Category Accuracy: [Accuracy Requirement]
## Timeline
- Start Date: [Date]
- Completion Date: [Date]
- Milestones: [Key Checkpoints]
## Budget
- Annotation Cost: [Budget]
- Tool Cost: [Budget]
- Total Budget: [Total Budget]
1.2 Collecting Image Data: A Complete Guide to Data Sources
Data Source 1: Public Datasets (Ideal for Quick Starts)
Public datasets are the go-to choice for quickly starting a project, especially suitable for learning and prototyping.
Major Public Dataset Comparison:
| Dataset | Classes | Images | Annotations | Features | Use Cases |
|---|---|---|---|---|---|
| COCO | 80 | 330K | 2.5M | High quality, precise annotations | General object detection |
| Open Images | 600 | 9M | 36M | Many classes, large volume | Large-scale training |
| ImageNet | 1000 | 14M | - | Classification dataset | Pre-trained models |
| Pascal VOC | 20 | 11K | 27K | Classic dataset | Learning and research |
| Cityscapes | 30 | 25K | - | Urban street scenes | Autonomous driving |
COCO Dataset Details:
Download Methods:
# Method 1: Official download
# Visit https://cocodataset.org/#download
# Download train2017.zip, val2017.zip, annotations_trainval2017.zip
# Method 2: Using API
from pycocotools.coco import COCO
import requests
# Download images and annotations
Category List (partial):
- People: person
- Vehicles: car, truck, bus, motorcycle, bicycle
- Animals: cat, dog, horse, cow, elephant
- Furniture: chair, couch, bed, table
- Electronics: laptop, mouse, keyboard, cell phone
Converting to YOLO Format:
Using a Python Script:
from pycocotools.coco import COCO
import os
import shutil

def coco_to_yolo(coco_annotation_file, output_dir, image_dir=None):
    """
    Convert COCO format to YOLO format
    """
    coco = COCO(coco_annotation_file)
    # Create output directories
    os.makedirs(f'{output_dir}/images', exist_ok=True)
    os.makedirs(f'{output_dir}/labels', exist_ok=True)
    # COCO category IDs are not contiguous (they run 1-90 with gaps for the 80 classes),
    # so map them to contiguous YOLO IDs starting from 0
    cat_ids = sorted(coco.getCatIds())
    cat_id_to_yolo = {cid: i for i, cid in enumerate(cat_ids)}
    # Get all image IDs
    img_ids = coco.getImgIds()
    for img_id in img_ids:
        # Get image info
        img_info = coco.loadImgs(img_id)[0]
        img_width = img_info['width']
        img_height = img_info['height']
        # Get all annotations for this image
        ann_ids = coco.getAnnIds(imgIds=img_id)
        anns = coco.loadAnns(ann_ids)
        # Create YOLO format annotation file
        label_file = f"{output_dir}/labels/{os.path.splitext(img_info['file_name'])[0]}.txt"
        with open(label_file, 'w') as f:
            for ann in anns:
                # Remap category ID to a contiguous, 0-based YOLO class ID
                class_id = cat_id_to_yolo[ann['category_id']]
                # Get bounding box (COCO format: x, y, width, height)
                x, y, w, h = ann['bbox']
                # Convert to YOLO format (normalized center coordinates and dimensions)
                center_x = (x + w / 2) / img_width
                center_y = (y + h / 2) / img_height
                norm_w = w / img_width
                norm_h = h / img_height
                # Write to file
                f.write(f"{class_id} {center_x} {center_y} {norm_w} {norm_h}\n")
        # Copy the image alongside its label if a source directory is given
        if image_dir:
            shutil.copy(os.path.join(image_dir, img_info['file_name']),
                        f"{output_dir}/images/{img_info['file_name']}")

# Usage
coco_to_yolo('annotations/instances_train2017.json', 'yolo_dataset', image_dir='train2017')
Advantages:
- Large volume, high quality
- Precise annotations, professionally reviewed
- Free to use, no copyright issues
- Community support, abundant tutorials
- Ideal for quick starts and prototyping
Disadvantages:
- May not match your specific application scenario
- Categories may not be granular enough
- Scenes may not be diverse enough
- Requires filtering and format conversion
Usage Recommendations:
- Suitable for quickly validating ideas
- Suitable as pre-training data
- Suitable for learning YOLO
- Not suitable for production (unless it perfectly matches your scenario)
Data Source 2: Self-Captured (Recommended for Specific Scenarios)
Self-captured data is the most reliable source, giving you full control over data quality and scene coverage.
Shooting Plan Development:
1. Scene Coverage Plan
Time Coverage:
- Daytime: Morning (8am-12pm), Afternoon (12pm-6pm)
- Nighttime: Evening (6pm-8pm), Late night (8pm-12am)
- Special times: Dusk, dawn, harsh midday light
Shooting Tips:
- Capture at least 100-200 images per time period
- Ensure scene diversity across different time periods
- Record shooting time and lighting conditions
Weather Coverage:
- Sunny: Normal lighting, clear visibility
- Rainy: Wet surfaces, reflective effects
- Overcast: Soft lighting, no harsh shadows
- Foggy: Low visibility, blurred targets
Shooting Tips:
- Capture at least 200-300 images per weather condition
- Note how weather affects target appearance
- Consider extreme weather situations
Angle Coverage:
- Front: 0 degrees, target fully visible
- Side: 45 degrees, 90 degrees, partial occlusion
- Top-down: From above, suitable for surveillance scenarios
- Bottom-up: From below, suitable for special viewpoints
Distance Coverage:
- Close-up: Target occupies 50%+ of image, clear details
- Medium range: Target occupies 20-50% of image, common scenario
- Long range: Target occupies 5-20% of image, small object detection
2. Target Diversity Planning
Size Diversity:
- Large objects: Occupying 30-80% of image, easy to detect
- Medium objects: Occupying 10-30% of image, standard detection
- Small objects: Occupying 1-10% of image, requires high resolution
State Diversity:
- Stationary: Target at rest, clearly visible
- Moving: Target in motion, possible blur
- Partially occluded: 20-50% occluded by other objects
- Heavily occluded: 50%+ occluded (optional, for robustness training)
Lighting Diversity:
- Bright: Sufficient lighting, clear contrast
- Shadow: Partially in shadow, reduced contrast
- Backlit: Target backlit, clear silhouette but blurred details
- Harsh light: Overexposed, lost details
3. Equipment Selection and Settings
Smartphone Capture (Recommended for beginners):
Advantages:
- Portable, capture anytime
- Auto-focus, simple operation
- Modern phones have sufficient quality (12MP+)
- Low cost, no extra equipment needed
Settings:
- Resolution: Set to maximum (typically 4K or higher)
- Format: Use JPG, balancing quality and file size
- Focus: Ensure target is in sharp focus
- Stability: Use a tripod or stabilizer to avoid shake
Camera Capture (Recommended for professional projects):
Advantages:
- Higher image quality, richer details
- More controllable parameters (ISO, aperture, shutter)
- Suitable for professional projects
Settings:
- ISO: Keep as low as possible (100-400) to reduce noise
- Aperture: f/5.6-f/8, balancing depth of field and quality
- Shutter: 1/250s+, avoiding motion blur
- White balance: Adjust per scene, maintaining color accuracy
Drone Capture (Suitable for large scenes):
Advantages:
- Top-down perspective, ideal for surveillance scenarios
- Covers large areas efficiently
- Unique viewpoints, adding data diversity
Considerations:
- Comply with flight regulations
- Monitor weather conditions (wind, rain)
- Ensure sufficient battery
4. Shooting Workflow
Preparation Phase (1-2 days):
1. Create a shooting plan
   - List all scenes to cover
   - Plan shooting routes and schedules
   - Prepare equipment (camera, memory cards, batteries)
2. Equipment check
   - Check camera/phone battery level
   - Check storage space (recommend at least 100GB)
   - Check lens cleanliness
Shooting Phase (varies by project scale):
1. Shoot according to plan
   - Strictly follow the scene coverage plan
   - Capture at least 50-100 images per scene
   - Record shooting info (time, location, scene)
2. Real-time checks
   - Periodically check photo quality
   - Delete blurry or out-of-focus photos
   - Ensure targets are clearly visible
3. Data backup
   - Back up immediately after each day's shooting
   - Use multiple storage devices
   - Prevent data loss
Organization Phase (after shooting):
1. Photo screening
   - Delete blurry or out-of-focus photos
   - Delete duplicate photos
   - Keep high-quality photos
2. Photo naming
   - Use meaningful naming conventions
   - Example: scene_time_weather_001.jpg, which facilitates later management and annotation
3. Data statistics
   - Count photos per scene type
   - Check if scene coverage is complete
   - Supplement missing scenes
Real Cases:
Case 1: Autonomous Driving Road Scenes
An autonomous driving company needed to collect road scene data. The team created a detailed shooting plan:
- Time: 1 month each for daytime, nighttime, and dusk
- Weather: 2 weeks each for sunny, rainy, and overcast
- Locations: 5 different cities, covering urban roads, highways, and rural roads
- Equipment: 8 vehicle-mounted cameras, shooting simultaneously
- Result: 50,000 high-quality images collected in 3 months, covering all scenarios
Case 2: Industrial Quality Inspection Product Photography
A factory needed to detect product defects. The team used industrial cameras:
- Fixed shooting positions for consistency
- Standard light sources to reduce lighting variation
- Multiple angles per product (front, side, top)
- Result: 20,000 product images collected in 1 month, including 5,000 defect samples
Shooting Checklist:
Equipment Preparation:
- Camera/phone fully charged
- Sufficient storage space (recommend 100GB+)
- Clean lens, no smudges
- Backup batteries and memory cards
Shooting Quality:
- Targets clear, no blur
- Accurate focus, no defocus
- Normal exposure, not over/underexposed
- Good composition, targets complete
Scene Coverage:
- Complete time coverage (day/night)
- Complete weather coverage (sunny/rainy)
- Complete angle coverage (front/side)
- Complete distance coverage (close/far)
Data Management:
- Standardized photo naming
- Timely data backup
- Complete shooting info records
Data Source 3: Video Frame Extraction (Efficient Method)
Advantages:
- Extract frames from video, highly efficient
- Covers continuous actions
- Natural scenes
Using TjMakeBot for Extraction:
- Upload video file
- Set extraction frame rate (e.g., 1fps)
- Automatically extract key frames
- Directly annotate extracted frames
Tips:
- Select key frames: Avoid duplicate frames
- Set appropriate frame rate: 1-5fps is usually sufficient
- Process multiple videos: Cover different scenes
Data Source 4: Other Sources (Use with Caution)
Considerations:
- Comply with data usage license agreements
- Respect intellectual property and copyright
- Obtain necessary authorization or permissions
- Do not use copyright-protected content
Data Requirements Checklist:
Clarity:
- Images are clear, target objects visible
- Avoid blurry or out-of-focus images
- Resolution at least 640x640
Target Size:
- Target objects appropriately sized (recommend 5%-50% of image)
- Avoid targets too small (< 1%) or too large (> 80%)
- Small targets require higher resolution
Scene Diversity:
- Cover different scenes
- Avoid overfitting
- Include edge cases
Target Completeness:
- Annotation targets are complete
- Avoid severe occlusion (> 50%)
- Partial occlusion (< 50%) can be annotated
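The target-size guidance above can be checked automatically: since YOLO coordinates are normalized, a box's width x height is already the fraction of image area it covers. A sketch using the 1%/80% bounds from the checklist:

```python
def flag_extreme_boxes(label_path, min_frac=0.01, max_frac=0.8):
    """Return (line_number, area_fraction) for YOLO boxes outside the size bounds."""
    flagged = []
    with open(label_path) as f:
        for i, line in enumerate(f, 1):
            parts = line.split()
            if len(parts) == 5:
                w, h = float(parts[3]), float(parts[4])
                frac = w * h  # normalized, so w*h is already the image-area fraction
                if frac < min_frac or frac > max_frac:
                    flagged.append((i, frac))
    return flagged
```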
1.3 Data Preprocessing
Data preprocessing is a critical step for ensuring data quality, directly impacting model training effectiveness.
Preprocessing Workflow:
Step 1: Data Cleaning
Remove Low-Quality Images:
Checks:
- Blurry images: Targets unclear, unidentifiable
- Out-of-focus images: Focus not on the target
- Over/underexposed: Severely abnormal exposure
- Duplicate images: Identical or highly similar
- Irrelevant images: Don't contain target objects
Automated Cleaning Script:
import cv2
import os
import shutil
from PIL import Image
import imagehash

def calculate_blur_score(image_path):
    """Calculate image blur score (variance of the Laplacian; lower means blurrier)"""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    return cv2.Laplacian(img, cv2.CV_64F).var()

def find_duplicates(image_dir, threshold=5):
    """Find near-duplicate images via perceptual hashing"""
    image_hashes = {}
    duplicates = []
    for filename in os.listdir(image_dir):
        if filename.endswith(('.jpg', '.png')):
            filepath = os.path.join(image_dir, filename)
            img_hash = imagehash.average_hash(Image.open(filepath))
            # Check for similar images
            for existing_file, existing_hash in image_hashes.items():
                if img_hash - existing_hash < threshold:
                    duplicates.append((existing_file, filename))
                    break
            image_hashes[filename] = img_hash
    return duplicates

def clean_dataset(image_dir, blur_threshold=100):
    """Copy non-blurry images into a 'cleaned' subdirectory"""
    cleaned_dir = os.path.join(image_dir, 'cleaned')
    os.makedirs(cleaned_dir, exist_ok=True)
    removed_count = 0
    for filename in os.listdir(image_dir):
        if filename.endswith(('.jpg', '.png')):
            filepath = os.path.join(image_dir, filename)
            # Check blur score
            blur_score = calculate_blur_score(filepath)
            if blur_score < blur_threshold:
                print(f"Removing blurry image: {filename} (blur score: {blur_score:.2f})")
                removed_count += 1
                continue
            # Copy to cleaned directory
            shutil.copy(filepath, os.path.join(cleaned_dir, filename))
    print(f"Cleaning complete, removed {removed_count} low-quality images")

# Usage
clean_dataset('./raw_images')
Manual Check:
- Quickly browse all images
- Flag obviously problematic images
- Batch delete
Step 2: Unify Format
Format Selection:
| Format | Advantages | Disadvantages | Recommended Scenario |
|---|---|---|---|
| JPG | Small files, fast loading | Lossy compression | Most scenarios (recommended) |
| PNG | Lossless compression, high quality | Large files | Scenarios requiring high quality |
Conversion Script:
from PIL import Image
import os
def convert_format(input_dir, output_dir, target_format='JPG', quality=95):
"""Unify image format"""
os.makedirs(output_dir, exist_ok=True)
for filename in os.listdir(input_dir):
if filename.endswith(('.jpg', '.png', '.bmp', '.tiff')):
input_path = os.path.join(input_dir, filename)
output_filename = os.path.splitext(filename)[0] + f'.{target_format.lower()}'
output_path = os.path.join(output_dir, output_filename)
# Open and convert
img = Image.open(input_path)
# Convert to RGB (if RGBA)
if img.mode == 'RGBA':
rgb_img = Image.new('RGB', img.size, (255, 255, 255))
rgb_img.paste(img, mask=img.split()[3])
img = rgb_img
# Save
if target_format == 'JPG':
img.save(output_path, 'JPEG', quality=quality)
else:
img.save(output_path, target_format)
print(f"Converted: {filename} -> {output_filename}")
# Usage
convert_format('./raw_images', './formatted_images', 'JPG', quality=95)
Step 3: Unify Dimensions
Size Selection Principles:
YOLO Input Sizes:
- 640x640: Standard size, balancing speed and precision (recommended)
- 416x416: Fast detection, suitable for real-time applications
- 1280x1280: High-precision detection, suitable for small objects
Resizing Methods:
Method 1: Aspect-Ratio-Preserving Resize (Recommended)
from PIL import Image
import os
def resize_with_aspect_ratio(image_path, target_size=640, padding_color=(114, 114, 114)):
"""
Resize while preserving aspect ratio, padding with gray
"""
img = Image.open(image_path)
original_width, original_height = img.size
# Calculate scale
scale = min(target_size / original_width, target_size / original_height)
new_width = int(original_width * scale)
new_height = int(original_height * scale)
# Resize image
img_resized = img.resize((new_width, new_height), Image.Resampling.LANCZOS)
# Create target-size canvas
img_padded = Image.new('RGB', (target_size, target_size), padding_color)
# Calculate centering position
x_offset = (target_size - new_width) // 2
y_offset = (target_size - new_height) // 2
# Paste resized image
img_padded.paste(img_resized, (x_offset, y_offset))
return img_padded
# Batch processing
def batch_resize(input_dir, output_dir, target_size=640):
"""Batch resize"""
os.makedirs(output_dir, exist_ok=True)
for filename in os.listdir(input_dir):
if filename.endswith(('.jpg', '.png')):
input_path = os.path.join(input_dir, filename)
output_path = os.path.join(output_dir, filename)
img_resized = resize_with_aspect_ratio(input_path, target_size)
img_resized.save(output_path)
print(f"Resized: {filename}")
# Usage
batch_resize('./formatted_images', './resized_images', target_size=640)
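Note that letterbox resizing moves and rescales targets, so any existing YOLO labels must be remapped with the same scale and offsets; a sketch mirroring the logic of `resize_with_aspect_ratio` above:

```python
def adjust_label_for_letterbox(line, orig_w, orig_h, target_size=640):
    """Remap one YOLO label line after an aspect-preserving resize with centered padding."""
    class_id, cx, cy, w, h = line.split()
    cx, cy, w, h = map(float, (cx, cy, w, h))
    scale = min(target_size / orig_w, target_size / orig_h)
    x_off = (target_size - orig_w * scale) / 2
    y_off = (target_size - orig_h * scale) / 2
    # Scale into resized-pixel space, add the padding offset, renormalize
    cx = (cx * orig_w * scale + x_off) / target_size
    cy = (cy * orig_h * scale + y_off) / target_size
    w = w * orig_w * scale / target_size
    h = h * orig_h * scale / target_size
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# A 1280x640 image letterboxed to 640x640: the box shrinks and shifts down
print(adjust_label_for_letterbox("0 0.5 0.5 0.2 0.2", 1280, 640))
```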
Method 2: Direct Stretching (Not Recommended)
- Distorts target shape
- May cause the model to learn incorrect features
- Only use when target shape doesn't matter
Step 4: Data Augmentation (Optional)
When to Use Data Augmentation:
- When data volume is insufficient
- When you need to improve model generalization
- When classes are imbalanced
Common Augmentation Methods:
1. Geometric Transforms:
- Rotation: +/-15 degrees, simulating different angles
- Flip: Horizontal flip, vertical flip
- Scale: 0.8-1.2x, simulating different distances
- Translation: +/-10%, simulating position changes
2. Color Transforms:
- Brightness adjustment: +/-20%, simulating different lighting
- Contrast adjustment: +/-20%, enhancing/reducing contrast
- Saturation adjustment: +/-30%, simulating different environments
- Hue adjustment: +/-10 degrees, simulating different light sources
3. Noise Addition:
- Gaussian noise: Simulating sensor noise
- Salt-and-pepper noise: Simulating transmission errors
Augmentation Script:
from PIL import Image, ImageEnhance
import random
import os
def augment_image(image_path, output_dir, num_augmentations=3):
"""Augment a single image"""
img = Image.open(image_path)
base_name = os.path.splitext(os.path.basename(image_path))[0]
for i in range(num_augmentations):
# Random rotation
angle = random.uniform(-15, 15)
img_rotated = img.rotate(angle, expand=False)
# Random flip
if random.random() > 0.5:
img_rotated = img_rotated.transpose(Image.FLIP_LEFT_RIGHT)
# Random brightness adjustment
enhancer = ImageEnhance.Brightness(img_rotated)
img_rotated = enhancer.enhance(random.uniform(0.8, 1.2))
# Random contrast adjustment
enhancer = ImageEnhance.Contrast(img_rotated)
img_rotated = enhancer.enhance(random.uniform(0.8, 1.2))
# Save
output_path = os.path.join(output_dir, f"{base_name}_aug_{i}.jpg")
img_rotated.save(output_path)
print(f"Augmented: {base_name}_aug_{i}.jpg")
def batch_augment(input_dir, output_dir, num_augmentations=3):
"""Batch augmentation"""
os.makedirs(output_dir, exist_ok=True)
for filename in os.listdir(input_dir):
if filename.endswith(('.jpg', '.png')):
input_path = os.path.join(input_dir, filename)
augment_image(input_path, output_dir, num_augmentations)
# Usage
batch_augment('./resized_images', './augmented_images', num_augmentations=3)
Note: Data augmentation should be performed before annotation, or use a tool that supports automatic annotation coordinate adjustment.
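For geometric augmentations, the coordinate adjustment is often simple; for a horizontal flip, only `center_x` mirrors, as this sketch shows:

```python
def flip_label_horizontal(line):
    """Mirror one YOLO label line for a horizontally flipped image: only center_x changes."""
    class_id, cx, cy, w, h = line.split()
    return f"{class_id} {1 - float(cx):.6f} {cy} {w} {h}"

print(flip_label_horizontal("0 0.2 0.3 0.1 0.2"))  # 0 0.800000 0.3 0.1 0.2
```

Rotations and translations are more involved (corners must be transformed and re-boxed), which is why a tool with automatic coordinate adjustment is preferable there.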
Preprocessing Checklist
Data Cleaning:
- Remove blurry images
- Remove out-of-focus images
- Remove duplicate images
- Remove irrelevant images
Format Unification:
- Unified to JPG or PNG format
- Converted to RGB mode
- File integrity verified
Size Unification:
- Resized to target dimensions (e.g., 640x640)
- Aspect ratio preserved (recommended)
- Image quality verified
Data Augmentation (optional):
- Augmentation methods determined
- Augmentation applied
- Augmentation results verified
Data Statistics:
- Final image count tallied
- Class distribution checked
- Data quality verified
Step 2: Data Annotation
2.1 Choosing an Annotation Tool
Tool Selection Advice:
Different tools have different characteristics:
- Free tools: Suitable for budget-limited users, features may be relatively simple
- Paid tools: Typically more comprehensive features, suitable for enterprise users with budget
- Selection principle: Choose based on project needs, budget, and technical capability
TjMakeBot Features:
- Free (basic features)
- AI chat-based annotation, significantly improving efficiency
- Supports batch processing
- Online and ready to use, no installation needed
- Supports video-to-frame conversion
2.2 Creating Category Labels
Create your categories in TjMakeBot:
Category List Example:
0: car
1: person
2: bicycle
3: motorcycle
4: bus
Naming Conventions:
- Use lowercase English
- Avoid spaces and special characters
- Category names should be clear and unambiguous
2.3 Starting Annotation: Two Methods Explained
Method 1: AI Chat-Based Annotation (Highly Recommended)
Suitable Scenarios:
- Batch annotation (> 100 images)
- Standard scenes (common objects)
- Rapid prototyping
- Budget-limited projects
Complete Workflow:
Step 1: Upload Images (1 minute)
- Batch upload all images
- Recommend testing with 10-20 images first
Step 2: Open AI Assistant (5 seconds)
- Click the "AI Assistant" button
- Chat panel opens
Step 3: Enter Instructions (10 seconds)
Basic instruction:
"Please annotate all cars and pedestrians"
Advanced instructions:
"Annotate all vehicles, but exclude motorcycles"
"Annotate all targets in the center area of the image"
"Annotate all cars larger than 100 pixels"
Step 4: AI Auto-Annotation (automatic)
- AI understands the instruction
- Automatically identifies targets
- Generates annotation results
Step 5: Review and Fine-Tune (5-10 minutes per 100 images)
- Quickly browse annotation results
- Correct obvious errors
- Supplement missed annotations
Step 6: Apply to All (1 second)
- Confirm satisfactory results
- One-click apply to all images
Advantages:
- Fast: 1000 images completed in 2-3 hours
- High accuracy: AI accuracy typically >90%
- Low cost: Free tool, virtually zero cost
- High efficiency: Batch processing, dramatically improved efficiency
Real Case:
A student project needed to annotate 2000 images. Using AI chat-based annotation, annotation was completed in 2 days with 95% accuracy. Traditional methods would have taken 2 weeks.
Method 2: Manual Annotation (Suitable for Complex Scenes)
Suitable Scenarios:
- Complex scenes (AI has difficulty recognizing)
- Special objects (categories AI hasn't been trained on)
- High precision requirements (pixel-level precision needed)
- Small-scale projects (< 100 images)
Complete Workflow:
Step 1: Select Image (5 seconds)
- Click an image to open the annotation interface
Step 2: Select Category (3 seconds)
- Choose from the category list
- Or create a new category
Step 3: Draw Bounding Box (10-30 seconds)
- Mouse drag to draw a rectangle
- Drag from top-left to bottom-right
- Or use keyboard shortcuts
Step 4: Adjust Position and Size (10-20 seconds)
- Drag the bounding box to move position
- Drag corner points to adjust size
- Use arrow keys for fine-tuning
Step 5: Save Annotation (2 seconds)
- Auto-save
- Or manual save
Manual Annotation Tips:
Tip 1: Use Keyboard Shortcuts
- W: Switch tools
- Delete: Delete selected annotation
- Arrow keys: Fine-tune position
- Ctrl+Z: Undo
Tip 2: Precise Adjustment
- Use zoom to enlarge the image
- Use crosshairs for precise positioning
- Multiple fine adjustments for optimal placement
Tip 3: Batch Operations
- Copy annotations to the next image
- Batch delete incorrect annotations
- Batch modify categories
Advantages:
- High precision: Pixel-level accuracy
- Flexible: Can handle any scenario
- Controllable: Full control over the annotation process
Disadvantages:
- Slow: 2-5 minutes per image
- Expensive: Requires significant manpower
- Fatiguing: Long annotation sessions lead to errors
Recommendation: Combine AI-assisted and manual annotation — AI handles standard scenes, manual handles complex scenes.
2.4 Annotation Quality Check: Ensuring Data Quality
Why Is Quality Checking So Important?
A real case:
A project annotated 5000 images, but after training, the model only achieved 70% accuracy. Upon inspection, 15% of the annotation data contained errors. After re-annotation, model accuracy improved to 92%.
Quality Check Checklist:
1. Completeness Check (Most Important)
- All target objects are annotated
- No missed objects
- Partially occluded objects are also annotated
Check Methods:
- Browse image by image, looking for omissions
- Use AI-assisted checking (AI can identify omissions)
- Sampling check (check 1 in every 10)
2. Accuracy Check
- Bounding boxes precisely cover targets
- Bounding boxes don't include excessive background (< 10%)
- Bounding boxes don't miss parts of the target
Check Methods:
- Check if bounding boxes are tight to target edges
- Check for obvious deviations
- Use IoU metrics for evaluation
3. Category Accuracy
- Category labels are correct
- No category confusion
- Edge cases handled correctly
Check Methods:
- Check each annotation box's category
- Pay special attention to easily confused categories
- Standardize edge case handling
4. Consistency Check
- No duplicate annotations
- Annotation standards are uniform
- Different annotators maintain consistent standards
Check Methods:
- Check for overlapping annotation boxes
- Compare annotations from different annotators
- Analyze annotation differences
Quality Metric Standards:
| Metric | Minimum Standard | Recommended Standard | Excellent Standard |
|---|---|---|---|
| Annotation Completeness | > 90% | > 95% | > 98% |
| Bounding Box Accuracy | > 85% | > 90% | > 95% |
| Category Accuracy | > 95% | > 98% | > 99% |
| Annotation Consistency | > 85% | > 90% | > 95% |
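The IoU metric mentioned for bounding box accuracy can be computed directly on YOLO-format boxes; a minimal sketch:

```python
def yolo_iou(box_a, box_b):
    """IoU of two YOLO boxes given as (center_x, center_y, width, height)."""
    def to_corners(b):
        cx, cy, w, h = b
        return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2
    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)
    # Intersection rectangle (zero if the boxes don't overlap)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```

Comparing a reviewer's box against the original annotation with this function gives a per-box agreement score to aggregate into the table's percentages.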
Quality Check Tools:
TjMakeBot Built-in Quality Check:
- Automatically detects missed annotations
- Automatically detects duplicate annotations
- Automatically detects bounding box deviations
- Generates quality reports
Usage Steps:
- After completing annotation, click "Quality Check"
- System automatically analyzes annotation quality
- Generates quality report
- Fix issues based on the report
Quality Improvement Workflow:
First Round (after annotation completion):
- Quickly browse all images
- Identify obvious errors
- Correct erroneous annotations
Second Round (after corrections):
- Sampling check (20-30%)
- Detailed bounding box inspection
- Check category accuracy
Third Round (final confirmation):
- Expert review
- Performance testing
- Final confirmation
Quality Check Time Allocation:
- Annotation time: 70%
- Quality checking: 20%
- Correction time: 10%
Remember: The time invested in quality checking is worthwhile — it prevents costly rework later.
Step 3: Data Format Conversion
Data format conversion turns your annotation results into the exact file layout and label format YOLO expects for training.
3.1 Exporting YOLO Format
Using TjMakeBot Export
Steps:
1. Select Annotation Data
- Open the annotation project in TjMakeBot
- Select all annotated images
- Or select images of specific categories
2. Export Settings
- Click the "Export" button
- Select "YOLO Format"
- Choose export options: include images, include annotation files, maintain directory structure
3. Download Files
- Wait for export to complete
- Download the ZIP file
- Extract to a local directory
Export Result Structure:
dataset/
├── images/
│ ├── image001.jpg
│ ├── image002.jpg
│ └── ...
└── labels/
├── image001.txt
├── image002.txt
└── ...
Manual Conversion (From Other Formats)
Converting from VOC Format:
import xml.etree.ElementTree as ET
import os
def voc_to_yolo(voc_xml_path, yolo_txt_path, img_width, img_height, class_mapping):
"""
Convert VOC format to YOLO format
"""
tree = ET.parse(voc_xml_path)
root = tree.getroot()
with open(yolo_txt_path, 'w') as f:
for obj in root.findall('object'):
# Get category
class_name = obj.find('name').text
class_id = class_mapping[class_name]
# Get bounding box (VOC format: xmin, ymin, xmax, ymax)
bbox = obj.find('bndbox')
xmin = float(bbox.find('xmin').text)
ymin = float(bbox.find('ymin').text)
xmax = float(bbox.find('xmax').text)
ymax = float(bbox.find('ymax').text)
# Convert to YOLO format
center_x = ((xmin + xmax) / 2) / img_width
center_y = ((ymin + ymax) / 2) / img_height
width = (xmax - xmin) / img_width
height = (ymax - ymin) / img_height
# Write to file
f.write(f"{class_id} {center_x} {center_y} {width} {height}\n")
# Usage
class_mapping = {'car': 0, 'person': 1, 'bicycle': 2}
voc_to_yolo('annotations/image001.xml', 'labels/image001.txt', 1920, 1080, class_mapping)
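The center/size normalization in the converter is easy to invert incorrectly. As a quick sanity check, this hypothetical helper pair (not part of the script above) converts corner boxes to YOLO format and back, so you can round-trip a few boxes and confirm the arithmetic:

```python
def corners_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """VOC-style pixel corners -> YOLO normalized (center_x, center_y, width, height)."""
    cx = (xmin + xmax) / 2 / img_w
    cy = (ymin + ymax) / 2 / img_h
    return cx, cy, (xmax - xmin) / img_w, (ymax - ymin) / img_h

def yolo_to_corners(cx, cy, w, h, img_w, img_h):
    """YOLO normalized box -> absolute pixel corners."""
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h,
            (cx + w / 2) * img_w, (cy + h / 2) * img_h)
```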
Converting from COCO Format:
import json
import os
def coco_to_yolo(coco_json_path, output_dir, class_mapping):
"""
Convert COCO format to YOLO format
"""
with open(coco_json_path, 'r') as f:
coco_data = json.load(f)
# Create output directory
os.makedirs(f'{output_dir}/labels', exist_ok=True)
# Build image ID to filename mapping
img_id_to_info = {img['id']: img for img in coco_data['images']}
# Group annotations by image ID
annotations_by_img = {}
for ann in coco_data['annotations']:
img_id = ann['image_id']
if img_id not in annotations_by_img:
annotations_by_img[img_id] = []
annotations_by_img[img_id].append(ann)
# Convert annotations for each image
for img_id, anns in annotations_by_img.items():
img_info = img_id_to_info[img_id]
img_width = img_info['width']
img_height = img_info['height']
# Create YOLO format file
label_file = f"{output_dir}/labels/{img_info['file_name'].replace('.jpg', '.txt')}"
with open(label_file, 'w') as f:
for ann in anns:
category_id = ann['category_id']
class_name = next(cat['name'] for cat in coco_data['categories'] if cat['id'] == category_id)
class_id = class_mapping.get(class_name, -1)
if class_id == -1:
continue # Skip unmapped categories
# COCO format: x, y, width, height (absolute coordinates)
bbox = ann['bbox']
x, y, w, h = bbox
# Convert to YOLO format (normalized)
center_x = (x + w / 2) / img_width
center_y = (y + h / 2) / img_height
norm_w = w / img_width
norm_h = h / img_height
f.write(f"{class_id} {center_x} {center_y} {norm_w} {norm_h}\n")
# Usage
class_mapping = {'car': 0, 'person': 1, 'bicycle': 2}
coco_to_yolo('annotations/instances_train2017.json', './yolo_dataset', class_mapping)
3.2 Validating Annotation Files
Validating annotation files is a critical step for ensuring data quality and avoiding errors during training.
Validation Script
Complete Validation Script:
import os
from PIL import Image
def validate_yolo_dataset(dataset_dir):
"""
Validate a YOLO dataset
"""
images_dir = os.path.join(dataset_dir, 'images')
labels_dir = os.path.join(dataset_dir, 'labels')
errors = []
warnings = []
# Get all image files
image_files = [f for f in os.listdir(images_dir) if f.endswith(('.jpg', '.png'))]
for img_file in image_files:
img_path = os.path.join(images_dir, img_file)
label_file = os.path.splitext(img_file)[0] + '.txt'
label_path = os.path.join(labels_dir, label_file)
# Check 1: Does annotation file exist?
if not os.path.exists(label_path):
errors.append(f"Missing annotation file: {label_file}")
continue
# Check 2: Can the image be opened?
try:
img = Image.open(img_path)
img_width, img_height = img.size
except Exception as e:
errors.append(f"Cannot open image: {img_file} - {str(e)}")
continue
# Check 3: Read annotation file
try:
with open(label_path, 'r') as f:
lines = f.readlines()
except Exception as e:
errors.append(f"Cannot read annotation file: {label_file} - {str(e)}")
continue
# Check 4: Validate each line's format
for line_num, line in enumerate(lines, 1):
line = line.strip()
if not line:
continue
parts = line.split()
# Format check: should have 5 numbers
if len(parts) != 5:
errors.append(f"{label_file}:{line_num} - Format error, expected 5 numbers, got {len(parts)}")
continue
try:
class_id = int(parts[0])
center_x = float(parts[1])
center_y = float(parts[2])
width = float(parts[3])
height = float(parts[4])
except ValueError as e:
errors.append(f"{label_file}:{line_num} - Cannot parse numbers: {str(e)}")
continue
# Check 5: Is class ID valid?
if class_id < 0:
errors.append(f"{label_file}:{line_num} - Class ID cannot be negative: {class_id}")
# Check 6: Are coordinates in 0-1 range?
if not (0 <= center_x <= 1):
errors.append(f"{label_file}:{line_num} - center_x out of range: {center_x}")
if not (0 <= center_y <= 1):
errors.append(f"{label_file}:{line_num} - center_y out of range: {center_y}")
if not (0 < width <= 1):
errors.append(f"{label_file}:{line_num} - width out of range: {width}")
if not (0 < height <= 1):
errors.append(f"{label_file}:{line_num} - height out of range: {height}")
# Check 7: Does bounding box exceed image bounds?
x_min = center_x - width / 2
x_max = center_x + width / 2
y_min = center_y - height / 2
y_max = center_y + height / 2
if x_min < 0 or x_max > 1 or y_min < 0 or y_max > 1:
warnings.append(f"{label_file}:{line_num} - Bounding box exceeds image bounds")
# Check 8: Is bounding box too small?
if width < 0.01 or height < 0.01:
warnings.append(f"{label_file}:{line_num} - Bounding box too small (possible annotation error)")
# Check 9: Is bounding box too large?
if width > 0.95 or height > 0.95:
warnings.append(f"{label_file}:{line_num} - Bounding box too large (possible annotation error)")
# Output results
print("=" * 50)
print("Validation Results")
print("=" * 50)
if errors:
print(f"\nFound {len(errors)} errors:")
for error in errors[:10]: # Show first 10 only
print(f" - {error}")
if len(errors) > 10:
print(f" ... and {len(errors) - 10} more errors")
else:
print("\nNo errors found")
if warnings:
print(f"\nFound {len(warnings)} warnings:")
for warning in warnings[:10]: # Show first 10 only
print(f" - {warning}")
if len(warnings) > 10:
print(f" ... and {len(warnings) - 10} more warnings")
else:
print("\nNo warnings found")
return len(errors) == 0
# Usage
is_valid = validate_yolo_dataset('./dataset')
if is_valid:
print("\nDataset validation passed, ready to start training")
else:
print("\nDataset validation failed, please fix errors before training")
Validation Checklist
File Integrity:
- Every image has a corresponding annotation file
- Every annotation file has a corresponding image
- Filenames match (except for extensions)
Format Correctness:
- Each annotation file line has 5 numbers
- All numbers are valid floats
- Class IDs are integers
Coordinate Validity:
- All coordinate values are in the 0-1 range
- Bounding boxes don't exceed image bounds
- Bounding box sizes are reasonable (not too small or too large)
Data Consistency:
- Class IDs are consecutive (0, 1, 2, ...)
- No duplicate annotations
- Annotations match image content
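The last item on the checklist ("annotations match image content") is best verified by drawing the boxes back onto the images and eyeballing a sample. This sketch uses Pillow (already a dependency of the scripts above); the function name and arguments are illustrative:

```python
import os
from PIL import Image, ImageDraw

def draw_yolo_boxes(img_path, label_path, out_path):
    """Render YOLO-format boxes onto an image so annotations can be eyeballed."""
    img = Image.open(img_path).convert('RGB')
    w, h = img.size
    draw = ImageDraw.Draw(img)
    with open(label_path) as f:
        for line in f:
            if not line.strip():
                continue
            cls, cx, cy, bw, bh = line.split()
            cx, cy, bw, bh = map(float, (cx, cy, bw, bh))
            # Convert normalized center/size back to pixel corners
            x1, y1 = (cx - bw / 2) * w, (cy - bh / 2) * h
            x2, y2 = (cx + bw / 2) * w, (cy + bh / 2) * h
            draw.rectangle([x1, y1, x2, y2], outline='red', width=2)
            draw.text((x1, max(0, y1 - 12)), cls, fill='red')
    img.save(out_path)
```

Run it over the files picked by your sampling check and skim the rendered images.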
3.3 Creating Dataset Configuration Files
The dataset configuration file is required for YOLO training, defining dataset paths, categories, and other information.
YOLOv8 Configuration File
Standard Format (dataset.yaml):
# Dataset path (relative to this file or absolute path)
path: /path/to/dataset # Dataset root directory
# Training and validation set paths (relative to path)
train: images/train # Training set image directory
val: images/val # Validation set image directory
test: images/test # Test set image directory (optional)
# Number of categories
nc: 5
# Category names (must correspond to class IDs)
names:
0: car
1: person
2: bicycle
3: motorcycle
4: bus
YOLOv5 Configuration File
Standard Format (dataset.yaml):
# Dataset paths
train: /path/to/dataset/images/train
val: /path/to/dataset/images/val
test: /path/to/dataset/images/test # Optional
# Number of categories
nc: 5
# Category names
names: ['car', 'person', 'bicycle', 'motorcycle', 'bus']
Configuration File Generation Script
Auto-Generation Script:
import os
import yaml
def create_dataset_yaml(dataset_dir, class_names, output_file='dataset.yaml', yolo_version='v8'):
"""
Auto-generate dataset configuration file
"""
# Check directory structure
images_dir = os.path.join(dataset_dir, 'images')
labels_dir = os.path.join(dataset_dir, 'labels')
# Check for train/val/test subdirectories
has_splits = os.path.exists(os.path.join(images_dir, 'train'))
if yolo_version == 'v8':
if has_splits:
config = {
'path': os.path.abspath(dataset_dir),
'train': 'images/train',
'val': 'images/val',
'nc': len(class_names),
'names': {i: name for i, name in enumerate(class_names)}
}
# If test set exists
if os.path.exists(os.path.join(images_dir, 'test')):
config['test'] = 'images/test'
else:
# If no splits, use images directory
config = {
'path': os.path.abspath(dataset_dir),
'train': 'images',
'val': 'images', # Note: actual use requires splitting
'nc': len(class_names),
'names': {i: name for i, name in enumerate(class_names)}
}
else: # YOLOv5
if has_splits:
config = {
'train': os.path.join(os.path.abspath(dataset_dir), 'images', 'train'),
'val': os.path.join(os.path.abspath(dataset_dir), 'images', 'val'),
'nc': len(class_names),
'names': class_names
}
if os.path.exists(os.path.join(images_dir, 'test')):
config['test'] = os.path.join(os.path.abspath(dataset_dir), 'images', 'test')
else:
config = {
'train': os.path.join(os.path.abspath(dataset_dir), 'images'),
'val': os.path.join(os.path.abspath(dataset_dir), 'images'),
'nc': len(class_names),
'names': class_names
}
# Save configuration file
with open(output_file, 'w', encoding='utf-8') as f:
yaml.dump(config, f, allow_unicode=True, default_flow_style=False)
print(f"Configuration file generated: {output_file}")
print("\nConfiguration file contents:")
print("=" * 50)
with open(output_file, 'r', encoding='utf-8') as f:
print(f.read())
print("=" * 50)
# Usage example
class_names = ['car', 'person', 'bicycle', 'motorcycle', 'bus']
create_dataset_yaml('./dataset', class_names, 'dataset.yaml', yolo_version='v8')
Configuration File Validation
Validation Script:
import yaml
import os
def validate_dataset_yaml(yaml_file, dataset_dir):
"""
Validate dataset configuration file
"""
with open(yaml_file, 'r', encoding='utf-8') as f:
config = yaml.safe_load(f)
errors = []
# Check required fields
required_fields = ['nc', 'names']
for field in required_fields:
if field not in config:
errors.append(f"Missing required field: {field}")
# Check category count
if 'nc' in config and 'names' in config:
num_names = len(config['names'])
if config['nc'] != num_names:
errors.append(f"Category count mismatch: nc={config['nc']}, names count={num_names}")
# Check paths
if 'path' in config:
path = config['path']
if not os.path.isabs(path):
path = os.path.join(os.path.dirname(yaml_file), path)
if not os.path.exists(path):
errors.append(f"Dataset path does not exist: {path}")
# Check training and validation set paths
for split in ['train', 'val']:
if split in config:
split_path = config[split]
if 'path' in config:
full_path = os.path.join(config['path'], split_path)
else:
full_path = split_path
if not os.path.exists(full_path):
errors.append(f"{split} path does not exist: {full_path}")
if errors:
print("Configuration file validation failed:")
for error in errors:
print(f" - {error}")
return False
else:
print("Configuration file validation passed")
return True
# Usage
validate_dataset_yaml('dataset.yaml', './dataset')
Configuration File Checklist
Basic Configuration:
- Category count (nc) is correct
- Category names (names) are complete
- Class IDs start from 0 consecutively
Path Configuration:
- Dataset path (path) is correct
- Training set path (train) exists
- Validation set path (val) exists
- Test set path (test) exists (if used)
Format Correctness:
- YAML format is correct
- Encoding is UTF-8
- Indentation is correct (using spaces, not tabs)
Step 4: Dataset Splitting
Dataset splitting is a critical pre-training step. Proper splitting ensures accurate model evaluation.
4.1 Splitting Strategy
Choosing Split Ratios
Standard Split Ratios:
| Dataset Size | Training Set | Validation Set | Test Set | Notes |
|---|---|---|---|---|
| Small (< 1000 images) | 70% | 15% | 15% | Ensure sufficient training data |
| Medium (1000-10000 images) | 75% | 12.5% | 12.5% | Balance training and evaluation |
| Large (> 10000 images) | 80% | 10% | 10% | Ample training data, sufficient validation |
Why Three Sets?
1. Training Set (Train):
- Used for model training
- Model learns data features
- Typically 70-80%
2. Validation Set (Validation):
- Used for hyperparameter tuning
- Monitors training progress
- Prevents overfitting
- Typically 10-15%
3. Test Set (Test):
- Used for final evaluation
- Not involved in training or tuning
- Reflects true model performance
- Typically 10-15%
Splitting Principles
1. Random Split (Basic Method)
Suitable Scenarios:
- Similar data scenes
- No time series relationships
- No scene correlations
Method:
- Randomly shuffle all data
- Split by ratio
- Ensure consistent class distribution
2. Stratified Split (Recommended)
Suitable Scenarios:
- Imbalanced classes
- Need to ensure consistent class ratios
Method:
- Split each class separately
- Each class split at the same ratio
- Ensure consistent class distribution across train, val, and test sets
3. Scene-Based Split (Advanced Method)
Suitable Scenarios:
- Data from different scenes
- Need to test generalization ability
- Avoid data leakage
Method:
- Group by scene
- Data from the same scene stays in the same set
- Avoid scene overlap between training and test sets
Real Case:
An autonomous driving project had road data from 5 cities. Random splitting could result in both training and test sets containing data from the same city, making test results overly optimistic. The correct approach is to split by city: 3 cities for training, 1 for validation, 1 for testing.
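The city example can be sketched in code. This illustration assumes the scene ID is encoded in the filename prefix (e.g. a hypothetical `city3_0001.jpg` naming scheme); adapt the grouping key to however your data records scenes:

```python
import random
from collections import defaultdict

def split_by_scene(image_files, val_scenes=1, test_scenes=1, seed=42):
    """Assign whole scenes to splits so no scene appears in more than one set."""
    by_scene = defaultdict(list)
    for name in image_files:
        by_scene[name.split('_')[0]].append(name)  # scene id = filename prefix
    scenes = sorted(by_scene)
    random.seed(seed)  # fixed seed -> reproducible split
    random.shuffle(scenes)
    test_s = scenes[:test_scenes]
    val_s = scenes[test_scenes:test_scenes + val_scenes]
    train_s = scenes[test_scenes + val_scenes:]
    def gather(names):
        return [f for s in names for f in by_scene[s]]
    return gather(train_s), gather(val_s), gather(test_s)
```

Because entire scenes move together, the test score now reflects performance on genuinely unseen scenes.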
Class Balance Check
Check Script:
import os
from collections import Counter
def check_class_balance(dataset_dir, splits=['train', 'val', 'test']):
"""
Check class distribution across splits
"""
results = {}
for split in splits:
labels_dir = os.path.join(dataset_dir, 'labels', split)
if not os.path.exists(labels_dir):
continue
class_counts = Counter()
total_objects = 0
for label_file in os.listdir(labels_dir):
if label_file.endswith('.txt'):
with open(os.path.join(labels_dir, label_file), 'r') as f:
for line in f:
if line.strip():
class_id = int(line.split()[0])
class_counts[class_id] += 1
total_objects += 1
results[split] = {
'class_counts': dict(class_counts),
'total_objects': total_objects,
'num_images': len([f for f in os.listdir(labels_dir) if f.endswith('.txt')])
}
# Print results
print("=" * 60)
print("Class Distribution Statistics")
print("=" * 60)
for split, data in results.items():
print(f"\n{split.upper()} set:")
print(f" Image count: {data['num_images']}")
print(f" Total objects: {data['total_objects']}")
print(f" Class distribution:")
for class_id in sorted(data['class_counts'].keys()):
count = data['class_counts'][class_id]
percentage = count / data['total_objects'] * 100
print(f" Class {class_id}: {count} ({percentage:.1f}%)")
# Check balance
print("\n" + "=" * 60)
print("Balance Check")
print("=" * 60)
if 'train' in results:
train_counts = results['train']['class_counts']
max_count = max(train_counts.values())
min_count = min(train_counts.values())
imbalance_ratio = max_count / min_count if min_count > 0 else float('inf')
print(f"Training set class imbalance ratio: {imbalance_ratio:.2f}")
if imbalance_ratio > 10:
print("Warning: Severe class imbalance, recommend balancing data")
elif imbalance_ratio > 5:
print("Note: Class imbalance exists, consider balancing")
else:
print("Class distribution is relatively balanced")
# Usage
check_class_balance('./dataset')
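If the check reports a high imbalance ratio, one simple remedy is to oversample rare classes in the training list. The sketch below is an illustration only (the `oversample` helper and the (filename, class_id) pairing are assumptions, not part of the scripts above); class weighting or collecting more data are alternatives.

```python
from collections import Counter

def oversample(image_class_pairs, max_ratio=3):
    """Duplicate images of rare classes until no class is more than
    max_ratio times rarer than the most common one."""
    counts = Counter(cls for _, cls in image_class_pairs)
    target = max(counts.values()) // max_ratio  # minimum acceptable count per class
    out = list(image_class_pairs)
    for cls, n in counts.items():
        if 0 < n < target:
            pool = [p for p in image_class_pairs if p[1] == cls]
            for i in range(target - n):
                out.append(pool[i % n])  # cycle through the existing samples
    return out
```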
4.2 Splitting with Scripts
Basic Splitting Script
Simple Random Split:
import os
import shutil
import random
def split_dataset_simple(source_dir, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15, seed=42):
"""
Simple random dataset split
"""
# Set random seed for reproducibility
random.seed(seed)
images_dir = os.path.join(source_dir, 'images')
labels_dir = os.path.join(source_dir, 'labels')
# Get all images
images = [f for f in os.listdir(images_dir) if f.endswith(('.jpg', '.png'))]
random.shuffle(images)
# Calculate split points
total = len(images)
train_end = int(total * train_ratio)
val_end = train_end + int(total * val_ratio)
# Split
train_images = images[:train_end]
val_images = images[train_end:val_end]
test_images = images[val_end:]
print(f"Total images: {total}")
print(f"Training set: {len(train_images)} ({len(train_images)/total*100:.1f}%)")
print(f"Validation set: {len(val_images)} ({len(val_images)/total*100:.1f}%)")
print(f"Test set: {len(test_images)} ({len(test_images)/total*100:.1f}%)")
# Copy files
for split, img_list in [('train', train_images),
('val', val_images),
('test', test_images)]:
split_images_dir = os.path.join(source_dir, 'images', split)
split_labels_dir = os.path.join(source_dir, 'labels', split)
os.makedirs(split_images_dir, exist_ok=True)
os.makedirs(split_labels_dir, exist_ok=True)
for img in img_list:
# Copy image
src_img = os.path.join(images_dir, img)
dst_img = os.path.join(split_images_dir, img)
shutil.copy(src_img, dst_img)
# Copy annotation
label_name = os.path.splitext(img)[0] + '.txt'
src_label = os.path.join(labels_dir, label_name)
dst_label = os.path.join(split_labels_dir, label_name)
if os.path.exists(src_label):
shutil.copy(src_label, dst_label)
else:
print(f"Warning: Annotation file missing: {label_name}")
print("\nDataset split complete")
# Usage
split_dataset_simple('./dataset', train_ratio=0.7, val_ratio=0.15, test_ratio=0.15)
Stratified Splitting Script (Recommended)
Stratified Split by Class:
import os
import shutil
import random
from collections import defaultdict
def split_dataset_stratified(source_dir, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15, seed=42):
"""
Stratified dataset split (by class)
"""
random.seed(seed)
images_dir = os.path.join(source_dir, 'images')
labels_dir = os.path.join(source_dir, 'labels')
# Group images by class
images_by_class = defaultdict(list)
for img_file in os.listdir(images_dir):
if img_file.endswith(('.jpg', '.png')):
label_file = os.path.splitext(img_file)[0] + '.txt'
label_path = os.path.join(labels_dir, label_file)
if os.path.exists(label_path):
# Read the annotation file and count objects per class
with open(label_path, 'r') as f:
counts = defaultdict(int)
for line in f:
if line.strip():
counts[int(line.split()[0])] += 1
# If the image contains multiple classes, assign it to its most frequent class
if counts:
main_class = max(counts, key=counts.get)
images_by_class[main_class].append(img_file)
# Split each class separately
train_images = []
val_images = []
test_images = []
for class_id, images in images_by_class.items():
random.shuffle(images)
total = len(images)
train_end = int(total * train_ratio)
val_end = train_end + int(total * val_ratio)
train_images.extend(images[:train_end])
val_images.extend(images[train_end:val_end])
test_images.extend(images[val_end:])
print(f"Class {class_id}: total={total}, train={train_end}, val={val_end-train_end}, test={total-val_end}")
# Shuffle final lists
random.shuffle(train_images)
random.shuffle(val_images)
random.shuffle(test_images)
print(f"\nFinal split results:")
print(f"Training set: {len(train_images)}")
print(f"Validation set: {len(val_images)}")
print(f"Test set: {len(test_images)}")
# Copy files
for split, img_list in [('train', train_images),
('val', val_images),
('test', test_images)]:
split_images_dir = os.path.join(source_dir, 'images', split)
split_labels_dir = os.path.join(source_dir, 'labels', split)
os.makedirs(split_images_dir, exist_ok=True)
os.makedirs(split_labels_dir, exist_ok=True)
for img in img_list:
# Copy image
shutil.copy(os.path.join(images_dir, img),
os.path.join(split_images_dir, img))
# Copy annotation
label_name = os.path.splitext(img)[0] + '.txt'
src_label = os.path.join(labels_dir, label_name)
dst_label = os.path.join(split_labels_dir, label_name)
if os.path.exists(src_label):
shutil.copy(src_label, dst_label)
print("\nStratified split complete")
# Usage
split_dataset_stratified('./dataset', train_ratio=0.7, val_ratio=0.15, test_ratio=0.15)
Post-Split Validation
Validation Script:
def verify_split(dataset_dir):
"""
Verify dataset split results
"""
splits = ['train', 'val', 'test']
for split in splits:
images_dir = os.path.join(dataset_dir, 'images', split)
labels_dir = os.path.join(dataset_dir, 'labels', split)
if not os.path.exists(images_dir):
print(f"{split} set image directory does not exist")
continue
images = [f for f in os.listdir(images_dir) if f.endswith(('.jpg', '.png'))]
labels = [f for f in os.listdir(labels_dir) if f.endswith('.txt')]
# Check if images and annotations match
missing_labels = []
for img in images:
label_name = os.path.splitext(img)[0] + '.txt'
if label_name not in labels:
missing_labels.append(label_name)
if missing_labels:
print(f"{split} set has {len(missing_labels)} images missing annotation files")
else:
print(f"{split} set: {len(images)} images, {len(labels)} annotation files, all matched")
# Usage
verify_split('./dataset')
Split Checklist
Pre-Split Preparation:
- All images are annotated
- Annotation files are validated
- Data is cleaned
Split Process:
- Random seed used for reproducibility
- Stratified split by class (recommended)
- Consistent class distribution maintained
Post-Split Validation:
- Images and annotation files match
- Class distribution checked per split
- Split ratios match expectations
Directory Structure:
- train/val/test subdirectories created
- Images and annotation files correctly copied
- Clear directory structure
Step 5: Model Training
Model training is the process of converting annotated data into a usable model, requiring proper parameter configuration and training process monitoring.
5.1 Installing the YOLO Environment
YOLOv8 Installation (Recommended)
Why Choose YOLOv8?
- Mature and widely adopted, with a strong speed/accuracy balance
- Simple installation: a single pip command
- Friendly Python API, easy to use
- Comprehensive documentation and an active community
Installation Steps:
1. Basic Installation:
# Install ultralytics (includes YOLOv8)
pip install ultralytics
# Verify installation
python -c "from ultralytics import YOLO; print('YOLOv8 installed successfully')"
2. GPU Support (Optional but Highly Recommended):
# Check if CUDA is available
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
# If CUDA is not available, install CPU version
# pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
3. Dependency Check:
# Check key dependencies
pip list | grep -E "torch|ultralytics|opencv|pillow"
Environment Requirements:
- Python 3.8+
- PyTorch 1.8+
- CUDA 11.0+ (for GPU training, optional)
YOLOv5 Installation (Alternative)
Installation Steps:
# Clone repository
git clone https://github.com/ultralytics/yolov5
cd yolov5
# Install dependencies
pip install -r requirements.txt
# Verify installation
python detect.py --help
Dependency Requirements:
- Python 3.7+
- PyTorch 1.7+
- Other dependencies in requirements.txt
5.2 Training Configuration
YOLOv8 Training Configuration Details
Complete Training Script:
from ultralytics import YOLO
import torch
# Check device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")
# Load pre-trained model
# Model selection:
# - yolov8n.pt: nano (smallest, fastest)
# - yolov8s.pt: small (small, fast)
# - yolov8m.pt: medium (balanced)
# - yolov8l.pt: large (high precision)
# - yolov8x.pt: xlarge (highest precision)
model = YOLO('yolov8n.pt') # Choose based on needs
# Training configuration
results = model.train(
# Dataset configuration
data='dataset.yaml', # Dataset config file path
# Training parameters
epochs=100, # Training epochs (recommend: 100-300)
imgsz=640, # Input image size (640/416/1280)
batch=16, # Batch size (adjust based on GPU memory)
device=device, # Device ('cuda'/'cpu'/'0,1' for multi-GPU)
# Optimizer parameters
lr0=0.01, # Initial learning rate
lrf=0.01, # Final learning rate (lr0 * lrf)
momentum=0.937, # Momentum
weight_decay=0.0005, # Weight decay
# Data augmentation
hsv_h=0.015, # Hue augmentation
hsv_s=0.7, # Saturation augmentation
hsv_v=0.4, # Value augmentation
degrees=0.0, # Rotation angle
translate=0.1, # Translation
scale=0.5, # Scale
flipud=0.0, # Vertical flip probability
fliplr=0.5, # Horizontal flip probability
mosaic=1.0, # Mosaic augmentation probability
mixup=0.0, # MixUp augmentation probability
# Training settings
patience=50, # Early stopping patience (epochs without improvement)
save=True, # Save checkpoints
save_period=10, # Save every N epochs
val=True, # Validate during training
plots=True, # Generate training curve plots
# Project settings
project='runs/detect', # Project directory
name='my_model', # Experiment name
exist_ok=True, # Allow overwriting existing experiments
pretrained=True, # Use pre-trained weights
optimizer='SGD', # Optimizer (SGD/Adam/AdamW)
verbose=True, # Verbose output
seed=0, # Random seed
deterministic=True, # Deterministic training
single_cls=False, # Single class mode
rect=False, # Rectangular training
cos_lr=False, # Cosine learning rate schedule
close_mosaic=10, # Disable Mosaic for last N epochs
resume=False, # Resume training
amp=True, # Automatic mixed precision
fraction=1.0, # Fraction of dataset to use
profile=False, # Performance profiling
freeze=None, # Freeze layers (e.g., freeze=10 freezes first 10 layers)
)
# After training
print("Training complete!")
print(f"Best model saved at: {results.save_dir}")
Key Parameter Details
1. Model Selection:
| Model | Parameters | Speed | Precision | Use Case |
|---|---|---|---|---|
| yolov8n | 3.2M | Fastest | Lower | Real-time detection, edge devices |
| yolov8s | 11.2M | Fast | Medium | Balance speed and precision |
| yolov8m | 25.9M | Medium | Higher | Production environment (recommended) |
| yolov8l | 43.7M | Slower | High | High precision requirements |
| yolov8x | 68.2M | Slowest | Highest | Research, maximum precision |
Selection Advice:
- Beginners: yolov8n (quick validation)
- Production: yolov8m (balanced)
- High precision: yolov8l or yolov8x
2. Batch Size:
GPU Memory vs Batch Size:
| GPU Memory | Recommended Batch Size (640x640) |
|---|---|
| 4GB | 4-8 |
| 6GB | 8-12 |
| 8GB | 12-16 |
| 12GB | 16-24 |
| 16GB+ | 24-32 |
Adjustment Method:
- If out of memory, reduce batch or imgsz
- If memory is sufficient, larger batch improves training stability
3. Learning Rate (lr0):
Learning Rate Selection:
- Default: 0.01 (SGD optimizer)
- Small datasets: 0.001-0.005
- Large datasets: 0.01-0.02
- Fine-tuning: 0.0001-0.001
Learning Rate Scheduling:
- Cosine annealing: cos_lr=True, learning rate follows cosine curve
- Linear decay: Default, learning rate decreases linearly
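For intuition, the two schedules can be written out directly. These formulas are simplified illustrations of linear decay and cosine annealing; the exact curves ultralytics implements (including warm-up) may differ in detail:

```python
import math

def linear_lr(epoch, epochs, lr0=0.01, lrf=0.01):
    """Linear decay from lr0 down to lr0 * lrf over the run."""
    return lr0 * (1 - epoch / epochs * (1 - lrf))

def cosine_lr(epoch, epochs, lr0=0.01, lrf=0.01):
    """Cosine annealing from lr0 down to lr0 * lrf."""
    return lr0 * (lrf + (1 - lrf) * (1 + math.cos(math.pi * epoch / epochs)) / 2)
```

Both start at lr0 and end at lr0 * lrf; cosine stays higher early in training and drops faster near the end.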
4. Training Epochs:
Epoch Recommendations:
- Small datasets (< 1000 images): 200-300 epochs
- Medium datasets (1000-10000 images): 100-200 epochs
- Large datasets (> 10000 images): 50-100 epochs
Early Stopping:
- patience=50: Stops if validation performance doesn't improve for 50 epochs
- Prevents overfitting, saves training time
YOLOv5 Training Configuration
Training Script:
import torch
from pathlib import Path
# Set paths
data_yaml = 'dataset.yaml'
weights = 'yolov5s.pt' # Pre-trained weights
epochs = 100
batch_size = 16
img_size = 640
device = '0' if torch.cuda.is_available() else 'cpu'
# Training command (via command line)
# python train.py --data dataset.yaml --weights yolov5s.pt --epochs 100 --batch-size 16 --img 640 --device 0
5.3 Training Monitoring
Key Metrics Explained
1. mAP (Mean Average Precision):
mAP50:
- Average precision at IoU threshold=0.5
- Measures overall model performance
- Target: > 0.5 (50%)
mAP50-95:
- Average precision across IoU thresholds from 0.5 to 0.95
- Stricter evaluation standard
- Target: > 0.3 (30%)
2. Precision:
- Proportion of predicted boxes that are correct (true positives among all predictions)
- Higher precision means fewer false positives
- Target: > 0.8 (80%)
3. Recall:
- Proportion of ground-truth objects the model actually finds
- Higher recall means fewer missed detections
- Target: > 0.8 (80%)
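Both metrics reduce to simple ratios over true positives (TP), false positives (FP), and false negatives (FN); a minimal illustration:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP): of the boxes predicted, how many were right.
    Recall = TP/(TP+FN): of the real objects, how many were found."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```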
4. Loss:
Training Loss (train/box_loss):
- Bounding box loss on training set
- Should continuously decrease
Validation Loss (val/box_loss):
- Bounding box loss on validation set
- Should decrease; if it increases, indicates overfitting
Training Process Monitoring
Real-Time Monitoring:
# Training automatically generates:
# - Training curve plots (results.png)
# - Confusion matrix (confusion_matrix.png)
# - Validation results (val_batch*.jpg)
# - Training logs (results.csv)
Viewing Training Logs:
import pandas as pd
import matplotlib.pyplot as plt
# Read training logs
df = pd.read_csv('runs/detect/my_model/results.csv')
# Plot training curves
plt.figure(figsize=(12, 4))
plt.subplot(1, 3, 1)
plt.plot(df['epoch'], df['train/box_loss'], label='Train Loss')
plt.plot(df['epoch'], df['val/box_loss'], label='Val Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.title('Loss Curve')
plt.subplot(1, 3, 2)
plt.plot(df['epoch'], df['metrics/mAP50(B)'], label='mAP50')
plt.xlabel('Epoch')
plt.ylabel('mAP50')
plt.legend()
plt.title('mAP50 Curve')
plt.subplot(1, 3, 3)
plt.plot(df['epoch'], df['metrics/precision(B)'], label='Precision')
plt.plot(df['epoch'], df['metrics/recall(B)'], label='Recall')
plt.xlabel('Epoch')
plt.ylabel('Score')
plt.legend()
plt.title('Precision & Recall')
plt.tight_layout()
plt.savefig('training_curves.png')
plt.show()
Training Tips and Best Practices
1. Learning Rate Adjustment Strategy:
Warm-up:
- Use a smaller learning rate for the first few epochs
- Helps stabilize training
- YOLOv8 supports this by default
Learning Rate Decay:
- Use cosine annealing: cos_lr=True
- Or linear decay: default
2. Data Augmentation Strategy:
Basic Augmentation (enabled by default):
- Horizontal flip: fliplr=0.5
- Color augmentation: hsv_h/s/v
- Mosaic: mosaic=1.0
Advanced Augmentation (optional):
- MixUp: mixup=0.15 (for small datasets)
- Rotation: degrees=10 (if target orientation doesn't matter)
3. Early Stopping:
Settings:
patience=50 # Stop if validation performance doesn't improve for 50 epochs
Benefits:
- Prevents overfitting
- Saves training time
- Automatically selects the best model
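The early-stopping rule itself is simple enough to sketch in a few lines. This is a simplified model of the behavior, assuming a single scalar validation fitness per epoch:

```python
def early_stop_epoch(fitness_per_epoch, patience=50):
    """Return the epoch at which training would stop, or None if it never triggers."""
    best, best_epoch = float("-inf"), 0
    for epoch, fitness in enumerate(fitness_per_epoch):
        if fitness > best:
            best, best_epoch = fitness, epoch  # new best: reset the patience window
        elif epoch - best_epoch >= patience:
            return epoch  # no improvement for `patience` consecutive epochs
    return None

# Fitness improves until epoch 3, then plateaus: training stops at epoch 3 + patience
print(early_stop_epoch([0.1, 0.3, 0.5, 0.6] + [0.6] * 100, patience=5))
```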
4. Model Checkpoints:
Auto-Save:
- best.pt is updated whenever validation performance improves; last.pt is saved every epoch
- Saved at: runs/detect/my_model/weights/best.pt
Manual Save:
# Save at any point during training
model.save('my_checkpoint.pt')
Resume Training:
# Resume training from checkpoint
model = YOLO('runs/detect/my_model/weights/last.pt')
model.train(resume=True)
Training Problem Diagnosis
Problem 1: Loss Not Decreasing
Possible Causes:
- Learning rate too high or too low
- Poor data quality
- Inappropriate model selection
Solutions:
- Adjust learning rate (try 0.001-0.01)
- Check data quality
- Try a larger model
Problem 2: Overfitting (Training loss decreasing, validation loss increasing)
Possible Causes:
- Insufficient data
- Model too large
- Insufficient data augmentation
Solutions:
- Increase data volume
- Use a smaller model
- Increase data augmentation
- Use dropout or regularization
Problem 3: Training Too Slow
Possible Causes:
- Training on CPU
- Batch size too small
- Image size too large
Solutions:
- Use GPU training
- Increase batch size
- Reduce image size (e.g., 640 -> 416)
Training Checklist
Pre-Training Preparation:
- Dataset split (train/val/test)
- Dataset config file (dataset.yaml) correct
- Environment installed (YOLOv8/YOLOv5)
- GPU available (if using GPU)
Training Configuration:
- Appropriate model size selected
- Batch size set based on GPU memory
- Learning rate set reasonably
- Sufficient training epochs
Training Monitoring:
- Real-time training log review
- Loss curve monitoring
- mAP curve monitoring
- Validation set performance check
Training Optimization:
- Early stopping enabled
- Checkpoints saved
- Hyperparameters tuned
- Training curves analyzed
Step 6: Model Evaluation and Optimization
Model evaluation is the critical step for validating model performance, and optimization is the ongoing process of improving it.
6.1 Evaluating the Model
Basic Evaluation
YOLOv8 Evaluation Script:
from ultralytics import YOLO
# Load trained model
model = YOLO('runs/detect/my_model/weights/best.pt')
# Evaluate on validation set
metrics = model.val(data='dataset.yaml', split='val')
# Print key metrics
print("=" * 50)
print("Model Evaluation Results")
print("=" * 50)
print(f"mAP50: {metrics.box.map50:.4f}")
print(f"mAP50-95: {metrics.box.map:.4f}")
print(f"Precision: {metrics.box.mp:.4f}")
print(f"Recall: {metrics.box.mr:.4f}")
print("=" * 50)
# Evaluate on the test set (if it exists)
import os
if os.path.exists('dataset/images/test'):
    test_metrics = model.val(data='dataset.yaml', split='test')
    print("\nTest set evaluation results:")
    print(f"mAP50: {test_metrics.box.map50:.4f}")
    print(f"mAP50-95: {test_metrics.box.map:.4f}")
Detailed Evaluation Metrics
1. Per-Class Evaluation:
# Get detailed metrics for each class
for i, class_name in enumerate(model.names.values()):
    print(f"\nClass {i} ({class_name}):")
    print(f"  Precision: {metrics.box.p[i]:.4f}")
    print(f"  Recall: {metrics.box.r[i]:.4f}")
    print(f"  mAP50: {metrics.box.ap50[i]:.4f}")
    print(f"  mAP50-95: {metrics.box.ap[i]:.4f}")
2. Confusion Matrix Analysis:
# View confusion matrix (auto-generated in results directory)
# File location: runs/detect/my_model/confusion_matrix.png
# Analysis:
# - Diagonal: Correct classifications
# - Off-diagonal: Misclassifications
# - Identify easily confused class pairs
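If you prefer numbers to the plot, per-class precision and recall can be derived directly from the confusion matrix. A minimal sketch with a hypothetical 3-class matrix (here rows are taken as true classes and columns as predicted classes; check your tool's orientation, as conventions differ):

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted
cm = np.array([
    [50,  3,  2],
    [ 4, 40,  6],
    [ 1,  9, 35],
])

for i in range(cm.shape[0]):
    tp = cm[i, i]
    recall = tp / cm[i].sum()        # row sum = all samples of true class i
    precision = tp / cm[:, i].sum()  # column sum = all predictions of class i
    print(f"class {i}: precision={precision:.2f}, recall={recall:.2f}")

# The largest off-diagonal entry is the most confused (true, predicted) pair
off_diag = cm - np.diag(np.diag(cm))
pair = tuple(int(v) for v in np.unravel_index(off_diag.argmax(), cm.shape))
print("most confused (true, predicted):", pair)
```

In this example, class 2 samples are most often misread as class 1, which suggests inspecting the annotation criteria for those two classes.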
3. Visualizing Detection Results:
# Visualize detection results on test images
results = model('dataset/images/test', save=True, conf=0.25)
# View detection results
for result in results:
    # Get detection boxes, classes, and confidence scores
    boxes = result.boxes
    classes = boxes.cls
    confidences = boxes.conf
    print(f"Detected {len(boxes)} objects")
    for i in range(len(boxes)):
        class_name = model.names[int(classes[i])]
        conf = confidences[i]
        print(f"  {class_name}: {conf:.2f}")
Performance Benchmarks
Performance Evaluation Standards:
| Application Scenario | mAP50 Target | mAP50-95 Target | Notes |
|---|---|---|---|
| Quick Prototype | > 0.5 | > 0.3 | Validate ideas |
| Production Environment | > 0.7 | > 0.5 | Real-world application |
| High-Precision Application | > 0.9 | > 0.7 | Critical applications |
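The table above can be turned into a small helper for automated checks. The thresholds here are this article's guidelines, not an official standard:

```python
def deployment_tier(map50: float, map50_95: float) -> str:
    """Classify a model against the benchmark table's mAP targets."""
    if map50 > 0.9 and map50_95 > 0.7:
        return "high-precision application"
    if map50 > 0.7 and map50_95 > 0.5:
        return "production environment"
    if map50 > 0.5 and map50_95 > 0.3:
        return "quick prototype"
    return "below prototype threshold"

print(deployment_tier(0.85, 0.55))  # meets production, not high-precision
```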
Real Case:
An industrial quality inspection project:
- Initial model: mAP50=0.65, couldn't meet production requirements
- After optimization: mAP50=0.85, met production standards
- Optimization methods: Improved data quality, increased data volume, tuned hyperparameters
6.2 Common Problems and Solutions
Problem Diagnosis Workflow
1. Low Accuracy (mAP < 0.5)
Diagnosis Steps:
# 1. Check data quality
# - Are annotations accurate?
# - Is data balanced?
# - Are scenes diverse?
# 2. Check model training
# - Is loss decreasing normally?
# - Is training sufficient?
# - Is learning rate appropriate?
# 3. Check model selection
# - Is the model too small?
# - Do you need a larger model?
Solutions:
- Improve data quality: Re-check annotations, correct errors
- Increase data volume: Collect more high-quality data
- Use a larger model: Upgrade from yolov8n to yolov8m
- Tune hyperparameters: Learning rate, batch size, etc.
2. Overfitting (Low training loss, high validation loss)
Diagnosis:
# Check training curves
# - train/box_loss continuously decreasing
# - val/box_loss first decreasing then increasing
# - High training mAP, low validation mAP
Solutions:
- Increase data volume: Collect more data
- Data augmentation: Enable more augmentation
- Use a smaller model: Reduce model complexity
- Regularization: Increase dropout or weight decay
- Early stopping: Use early stopping mechanism
3. High Miss Rate (Low Recall)
Diagnosis:
# Check per-class recall
for i, class_name in enumerate(model.names.values()):
    recall = metrics.box.r[i]
    if recall < 0.7:
        print(f"Warning: {class_name} recall is low: {recall:.2f}")
Possible Causes:
- Imbalanced data (some classes have few samples)
- Small object detection difficulty
- Threshold set too high
Solutions:
- Balance data: Increase minority class samples
- Lower confidence threshold: conf=0.15-0.25
- Use higher resolution: imgsz=1280
- Data augmentation: Target small object augmentation
4. High False Positive Rate (Low Precision)
Diagnosis:
# Check per-class precision
for i, class_name in enumerate(model.names.values()):
    precision = metrics.box.p[i]
    if precision < 0.7:
        print(f"Warning: {class_name} precision is low: {precision:.2f}")
Possible Causes:
- Insufficient negative samples
- High class similarity
- Threshold set too low
Solutions:
- Add negative samples: Include images without targets
- Raise confidence threshold: conf=0.3-0.5
- Refine categories: Distinguish similar classes
- Post-processing optimization: Adjust NMS threshold
5. Training Too Slow or Not Converging
Diagnosis:
# Check training process
# - Is loss decreasing?
# - Is learning rate appropriate?
# - Is GPU utilization high?
Solutions:
- Use GPU: Ensure GPU training
- Adjust batch size: Based on GPU memory
- Adjust learning rate: Try different learning rates
- Check data: Ensure correct data format
Problem-Solution Reference Table
| Problem | Symptoms | Possible Causes | Solutions |
|---|---|---|---|
| Low accuracy | mAP < 0.5 | Poor data quality, insufficient data | Improve data quality, increase data |
| Overfitting | Good on train, poor on val | Insufficient data, model too large | More data, smaller model, augmentation |
| High miss rate | Recall < 0.7 | Imbalanced data, high threshold | Balance data, lower threshold |
| High false positives | Precision < 0.7 | Insufficient negatives, low threshold | Add negatives, raise threshold |
| Slow training | Long training time | CPU training, small batch | Use GPU, increase batch |
| Not converging | Loss not decreasing | Wrong learning rate, data issues | Adjust learning rate, check data |
6.3 Model Optimization
Optimization Strategies
1. Data Optimization
Increase Data Volume:
- Collect more high-quality data
- Use data augmentation (rotation, flip, brightness, etc.)
- Supplement from public datasets
Improve Data Quality:
- Re-check annotations, correct errors
- Standardize annotation criteria
- Balance class data
Data Augmentation Script:
# YOLOv8 applies data augmentation automatically during training,
# so no manual preprocessing is needed; adjust it via training parameters:
model.train(
    data='dataset.yaml',
    hsv_h=0.015,    # Hue augmentation
    hsv_s=0.7,      # Saturation augmentation
    hsv_v=0.4,      # Value augmentation
    degrees=10,     # Rotation angle
    translate=0.1,  # Translation
    scale=0.5,      # Scale
    mosaic=1.0,     # Mosaic augmentation
    mixup=0.15,     # MixUp augmentation
)
2. Hyperparameter Optimization
Learning Rate Optimization:
# Try different learning rates
learning_rates = [0.001, 0.005, 0.01, 0.02]
for lr in learning_rates:
    model = YOLO('yolov8n.pt')
    results = model.train(
        data='dataset.yaml',
        epochs=50,
        lr0=lr,
        name=f'lr_{lr}',
    )
    print(f"LR={lr}, mAP50={results.results_dict['metrics/mAP50(B)']:.4f}")
Batch Size Optimization:
# Adjust batch size based on GPU memory
# Larger batches are generally more stable but require more memory
batch_sizes = [8, 16, 32]
for batch in batch_sizes:
    model = YOLO('yolov8n.pt')
    results = model.train(
        data='dataset.yaml',
        epochs=50,
        batch=batch,
        name=f'batch_{batch}',
    )
3. Model Selection Optimization
Model Size Comparison:
# Test different model sizes
models = ['yolov8n.pt', 'yolov8s.pt', 'yolov8m.pt']
for model_name in models:
    model = YOLO(model_name)
    results = model.train(
        data='dataset.yaml',
        epochs=100,
        name=model_name.replace('.pt', ''),
    )
    print(f"{model_name}: mAP50={results.results_dict['metrics/mAP50(B)']:.4f}")
4. Post-Processing Optimization
Adjusting Confidence Threshold:
# Default threshold is 0.25, adjustable based on needs
# Higher threshold: fewer false positives, but may increase misses
# Lower threshold: fewer misses, but may increase false positives
# Adjust during inference
results = model('test_image.jpg', conf=0.3) # Raise threshold
results = model('test_image.jpg', conf=0.15) # Lower threshold
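The trade-off behind these two calls can be illustrated with a handful of hypothetical detections: raising the threshold tends to raise precision and lower recall.

```python
# Hypothetical detections as (confidence, is_true_positive) against 5 ground truths
detections = [
    (0.92, True), (0.81, True), (0.55, True), (0.40, False),
    (0.33, True), (0.22, False), (0.18, False),
]
num_ground_truth = 5

for conf in (0.15, 0.30, 0.50):
    kept = [d for d in detections if d[0] >= conf]
    tp = sum(1 for _, ok in kept if ok)
    precision = tp / len(kept)
    recall = tp / num_ground_truth
    print(f"conf={conf}: precision={precision:.2f}, recall={recall:.2f}")
```

At conf=0.50 every kept detection is correct (precision 1.00) but two ground truths are missed; at conf=0.15 recall is higher but false positives slip through.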
Adjusting NMS Threshold:
# NMS (Non-Maximum Suppression) removes duplicate detections
# The iou parameter sets the NMS IoU threshold
# Lower iou: stricter NMS, more overlapping boxes suppressed
# Higher iou: more lenient NMS, may keep more overlapping boxes
results = model('test_image.jpg', iou=0.45)  # Default is 0.7
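To see how the iou threshold changes what survives, here is a minimal greedy NMS sketch on hypothetical boxes, given as (x1, y1, x2, y2, score):

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, iou_thr):
    """Greedy NMS: keep a box only if it doesn't overlap a higher-scoring kept box."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) <= iou_thr for k in kept):
            kept.append(box)
    return kept

# Two heavily overlapping boxes (IoU ~0.68) plus one isolated box
boxes = [(0, 0, 10, 10, 0.9), (1, 1, 11, 11, 0.8), (20, 20, 30, 30, 0.7)]
print(len(nms(boxes, iou_thr=0.7)), len(nms(boxes, iou_thr=0.45)))
```

With iou_thr=0.7 the two overlapping boxes both survive (their IoU is below the threshold); lowering it to 0.45 suppresses the duplicate.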
5. Model Ensemble
Multi-Model Voting:
from ultralytics import YOLO
import numpy as np
# Load multiple models
models = [
YOLO('runs/detect/model1/weights/best.pt'),
YOLO('runs/detect/model2/weights/best.pt'),
YOLO('runs/detect/model3/weights/best.pt'),
]
# Predict on the same image
image = 'test_image.jpg'
predictions = [model(image, conf=0.25) for model in models]
# Voting or averaging (simplified example)
# Real applications require more sophisticated ensemble strategies
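One simple, concrete ensemble strategy is cross-model voting: keep a detection only if enough models produce an overlapping box. A sketch on hypothetical box coordinates (class labels and scores omitted for brevity):

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def vote(per_model_boxes, min_votes=2, iou_thr=0.5):
    """Keep boxes from the first model that at least min_votes models agree on."""
    kept = []
    for box in per_model_boxes[0]:
        votes = sum(
            any(iou(box, other) >= iou_thr for other in boxes)
            for boxes in per_model_boxes
        )
        if votes >= min_votes:
            kept.append(box)
    return kept

preds = [
    [(0, 0, 10, 10), (50, 50, 60, 60)],  # model 1
    [(1, 1, 11, 11)],                    # model 2
    [(0, 0, 10, 10)],                    # model 3
]
print(vote(preds))  # only the box most models agree on survives
```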
Optimization Checklist
Data Optimization:
- Sufficient data volume
- High data quality
- Balanced classes
- Diverse scenes
Training Optimization:
- Appropriate learning rate
- Reasonable batch size
- Sufficient training epochs
- Data augmentation enabled
Model Optimization:
- Appropriate model size
- Pre-trained weights used
- Different models tried
Post-Processing Optimization:
- Appropriate confidence threshold
- Appropriate NMS threshold
- Model ensemble considered
Performance Evaluation:
- mAP meets target
- Precision and Recall balanced
- Per-class performance balanced
- Real-world application results satisfactory
Accelerate Dataset Creation with TjMakeBot
TjMakeBot's Advantages:
- AI Chat-Based Annotation
  - Natural language instructions for fast annotation
  - Supports batch processing
  - High accuracy
- Video-to-Frame Feature
  - Extract frames from video
  - Custom frame rate
  - Batch processing
- Multi-Format Support
  - YOLO format export
  - VOC and COCO format support
  - Convenient format conversion
- Free (Basic Features)
  - No usage limits
  - No feature restrictions
  - Online and ready to use
Start Using TjMakeBot to Create YOLO Datasets for Free ->
Related Reading
- Why Do 90% of AI Projects Fail? Data Labeling Quality Is Key
- Say Goodbye to Manual Annotation: How AI Chat-Based Annotation Saves 80% of Time
- Multi-Format Annotation: An In-Depth Guide to YOLO/VOC/COCO Formats
Conclusion
Creating a high-quality YOLO dataset is the foundation for model success. By choosing the right tools, following practical methods, and continuously optimizing, you can create high-quality datasets and train excellent models.
Remember: Data quality > Model architecture. Investing time in data yields significant returns.
Legal Disclaimer: The content of this article is for reference only and does not constitute any legal, commercial, or technical advice. When using any tools or methods, please comply with applicable laws and regulations, respect intellectual property rights, and obtain necessary authorizations. All company names, product names, and trademarks mentioned in this article are the property of their respective owners.
About the Author: The TjMakeBot team focuses on AI data annotation tool development, helping developers quickly create high-quality YOLO datasets.
Recommended Reading
- Sports Analytics: A Guide to Athlete Pose and Action Annotation
- Drone Aerial Image Annotation: A Complete Practical Guide from Collection to Training
- Cognitive Bias in Data Labeling: How to Avoid Annotation Errors
- Edge Computing and Lightweight Models: Optimization Strategies for Annotation Data
- AI-Assisted vs Manual Annotation: An In-Depth Cost-Benefit Analysis
- Data Augmentation Techniques: Training Better Models with Limited Data
- Free vs Paid Annotation Tools: How to Choose the Right One?
Keywords: YOLO dataset, object detection, YOLO annotation, YOLOv8, YOLOv5, dataset creation, image annotation, TjMakeBot
