TjMakeBot Blog (tjmakebot.com)

YOLO Dataset Complete Guide: From Zero to Model Training

TjMakeBot Team · Technical Tutorial · 12 min read
Tags: Technical Tutorial · Practical Methods

"I want to build an object detection project with YOLO, but I don't know where to start..."

This is a real struggle for many AI developers. YOLO (You Only Look Once) is one of the most widely used algorithms in object detection. From YOLOv1 to the latest YOLOv10, the YOLO series has achieved a strong balance between speed and accuracy.

YOLO Application Scenarios:

  • Autonomous Driving: Real-time detection of vehicles, pedestrians, and traffic signs
  • Industrial Quality Inspection: Rapid detection of product defects
  • Medical Imaging: Assisting doctors in identifying lesions
  • Retail Analytics: Product recognition and inventory management
  • Security Surveillance: Real-time monitoring and anomaly detection

YOLO's Advantages:

  • Fast: Can process video streams in real time
  • Accurate: Achieves a good balance between speed and accuracy
  • Easy to use: Comprehensive tools and documentation
  • Active community: Abundant tutorials and examples

But the first hurdle many developers face when using YOLO is: How do you create a high-quality YOLO dataset?

Today, we'll walk you through creating a complete YOLO dataset from scratch, all the way to successful model training. Whether you're a beginner or an experienced developer, you'll find practical methods and tips in this article.

What Is a YOLO Dataset?

YOLO Data Format

YOLO uses a concise text format to store annotation information:

File Structure:

dataset/
├── images/
│   ├── train/
│   │   ├── image001.jpg
│   │   ├── image002.jpg
│   │   └── ...
│   └── val/
│       ├── image101.jpg
│       └── ...
└── labels/
    ├── train/
    │   ├── image001.txt
    │   ├── image002.txt
    │   └── ...
    └── val/
        ├── image101.txt
        └── ...

Annotation File Format (image001.txt):

class_id center_x center_y width height
0 0.5 0.5 0.3 0.4
1 0.2 0.3 0.1 0.2

Format Description:

  • class_id: Category ID (starting from 0)
  • center_x, center_y: Normalized coordinates of the bounding box center (0-1)
  • width, height: Normalized width and height of the bounding box (0-1)

Key point: YOLO uses normalized coordinates — all coordinate values are between 0 and 1.
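
To make the normalization concrete, here is a minimal sketch (a hypothetical helper, not part of any YOLO toolkit) that converts a pixel-space box to the normalized YOLO values:

```python
def to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) to YOLO format."""
    center_x = (x_min + x_max) / 2 / img_w
    center_y = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return center_x, center_y, width, height

# A 300x400-pixel box centered in a 1000x1000 image:
print(to_yolo(350, 300, 650, 700, 1000, 1000))  # (0.5, 0.5, 0.3, 0.4)
```

Note that the output matches the first line of the annotation example above.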

YOLO Version Differences

Different YOLO versions have slightly different dataset format requirements:

| Version | Format Requirements | Special Notes |
| --- | --- | --- |
| YOLOv5 | Standard format | Supports custom class counts |
| YOLOv8 | Standard format | Ultralytics format recommended |
| YOLOv9 | Standard format | Compatible with YOLOv5 format |
| YOLOv10 | Standard format | Latest version, best performance |

Good news: All YOLO versions use the same data format — your dataset is universally compatible!

Step 1: Data Collection and Preparation

1.1 Define Dataset Requirements

Before you begin, clarifying your requirements is the first step to success. A clear requirements plan can save you significant time and cost.

Requirements Analysis Checklist

1. Target Category Definition

Define Detection Targets:

  • List all object categories to detect
  • Define boundaries for each category (what counts, what doesn't)
  • Consider category hierarchy (e.g., vehicle -> car, truck, bus)

Real Case:

A traffic monitoring project initially defined only one "vehicle" category. After training, they found the model couldn't distinguish cars from trucks. After subdividing into "car," "truck," "bus," and "motorcycle," model accuracy improved by 15%.

Category Count Recommendations:

  • Simple projects: 1-5 categories (suitable for beginners)
  • Medium projects: 5-20 categories (common applications)
  • Complex projects: 20+ categories (requires more data and annotation time)

2. Data Scale Planning

Data Volume Estimates:

| Project Type | Min Images Per Class | Recommended Images Per Class | Total Images (5 classes) |
| --- | --- | --- | --- |
| Quick Prototype | 100-200 | 500 | 2,500 |
| Production Application | 1,000 | 3,000 | 15,000 |
| High-Precision Application | 5,000 | 10,000 | 50,000 |

Factors Affecting Data Volume:

  • Number of categories: More categories require more data
  • Scene complexity: Complex scenes need more data
  • Precision requirements: High precision demands more high-quality data
  • Class balance: Ensure relatively balanced data across categories (ratio no more than 10:1)
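
The 10:1 balance rule can be checked mechanically once labels exist. A small counting sketch (the labels directory is a placeholder for your own dataset):

```python
import os
from collections import Counter

def class_distribution(labels_dir):
    """Count annotated instances per class_id across YOLO label files."""
    counts = Counter()
    for filename in os.listdir(labels_dir):
        if filename.endswith('.txt'):
            with open(os.path.join(labels_dir, filename)) as f:
                for line in f:
                    parts = line.split()
                    if parts:
                        counts[int(parts[0])] += 1
    return counts

labels_dir = 'dataset/labels/train'  # placeholder path
if os.path.isdir(labels_dir):
    counts = class_distribution(labels_dir)
    if counts:
        ratio = max(counts.values()) / min(counts.values())
        print(counts, f"-> imbalance ratio {ratio:.1f}:1")
```

If the printed ratio exceeds roughly 10:1, collect more samples of the rare classes or augment them, as in the case below.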

Real Case:

An industrial quality inspection project needed to detect 10 defect types. Normal products had 10,000 images, but defect samples only had 500. Through active defect sample collection and data augmentation, each defect category eventually reached 2,000 samples, and model accuracy improved from 75% to 92%.

3. Scene Diversity Planning

Scene Coverage Dimensions:

Time Dimension:

  • Daytime, nighttime, dusk, dawn
  • Different seasons (spring, summer, fall, winter)
  • Different time periods (morning, noon, evening)

Weather Dimension:

  • Sunny, rainy, snowy, foggy
  • Different lighting conditions (bright light, shadows, backlight)

Environment Dimension:

  • Indoor, outdoor
  • Urban, rural, highway
  • Different background complexity levels

Target State Dimension:

  • Stationary, moving
  • Complete, partially occluded
  • Different angles (front, side, back)

Scene Diversity Checklist:

  • Cover at least 3-5 major scenarios
  • Include edge cases (extreme situations)
  • Avoid overly uniform scenes (prone to overfitting)
  • Ensure consistent scene distribution between training and test sets

4. Image Quality Requirements

Resolution Requirements:

| Application Scenario | Minimum Resolution | Recommended Resolution | Notes |
| --- | --- | --- | --- |
| Small Object Detection | 1280x1280 | 1920x1920+ | Higher resolution needed for small targets |
| Standard Detection | 640x640 | 1280x1280 | YOLO default input size |
| Fast Detection | 416x416 | 640x640 | Speed priority, acceptable precision |

Image Quality Checks:

  • Clarity: Target objects clearly visible, no blur
  • Contrast: Obvious contrast between target and background
  • Color: True colors, no severe distortion
  • Exposure: Normal exposure, not overexposed or underexposed
  • Format: Unified format (JPG or PNG), avoid format inconsistency
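
The exposure item above can be screened automatically. A rough sketch using mean grayscale brightness; the thresholds are heuristics to tune for your scenes, not standard values:

```python
import numpy as np
from PIL import Image

def exposure_check(image_path, low=40, high=215):
    """Classify an image as under-/overexposed by mean grayscale brightness (0-255)."""
    gray = np.asarray(Image.open(image_path).convert('L'), dtype=np.float32)
    mean_brightness = float(gray.mean())
    if mean_brightness < low:
        return 'underexposed', mean_brightness
    if mean_brightness > high:
        return 'overexposed', mean_brightness
    return 'ok', mean_brightness
```

Run this during data cleaning to flag candidates for removal rather than deleting automatically; backlit scenes can legitimately have low means.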

5. Budget and Timeline Planning

Time Estimates (for 5 classes, 1000 images each):

| Phase | Time Estimate | Notes |
| --- | --- | --- |
| Data Collection | 1-2 weeks | Varies by data source |
| Data Annotation | 2-4 weeks | Can be shortened to 1 week with AI assistance |
| Quality Check | 3-5 days | Multiple review rounds |
| Format Conversion | 1 day | Automated processing |
| Total | 4-7 weeks | Can be shortened to 2-3 weeks with AI assistance |

Cost Estimates (for 5 classes, 1000 images each):

| Approach | Annotation Cost | Tool Cost | Total Cost |
| --- | --- | --- | --- |
| Pure Manual Annotation | $8,000-12,000 | $0 | $8,000-12,000 |
| AI-Assisted Annotation | $1,600-2,400 | $0 (free tools) | $1,600-2,400 |
| Savings | ~80% | - | ~80% |

Requirements Document Template:

# YOLO Dataset Requirements Document

## Project Information
- Project Name: [Project Name]
- Application Scenario: [Scenario Description]
- Target Accuracy: [Target mAP Value]

## Category Definitions
1. [Category 1]: [Detailed Definition]
2. [Category 2]: [Detailed Definition]
...

## Data Scale
- Number of Categories: [N]
- Images Per Category: [M]
- Total Images: [N x M]

## Scene Requirements
- Time: [Daytime/Nighttime/All Day]
- Weather: [Sunny/Rainy/All Weather]
- Environment: [Indoor/Outdoor/Mixed]

## Quality Requirements
- Resolution: [Minimum Resolution]
- Annotation Precision: [IoU Requirement]
- Category Accuracy: [Accuracy Requirement]

## Timeline
- Start Date: [Date]
- Completion Date: [Date]
- Milestones: [Key Checkpoints]

## Budget
- Annotation Cost: [Budget]
- Tool Cost: [Budget]
- Total Budget: [Total Budget]

1.2 Collecting Image Data: A Complete Guide to Data Sources

Data Source 1: Public Datasets (Ideal for Quick Starts)

Public datasets are the go-to choice for quickly starting a project, especially suitable for learning and prototyping.

Major Public Dataset Comparison:

| Dataset | Classes | Images | Annotations | Features | Use Cases |
| --- | --- | --- | --- | --- | --- |
| COCO | 80 | 330K | 2.5M | High quality, precise annotations | General object detection |
| Open Images | 600 | 9M | 36M | Many classes, large volume | Large-scale training |
| ImageNet | 1000 | 14M | - | Classification dataset | Pre-trained models |
| Pascal VOC | 20 | 11K | 27K | Classic dataset | Learning and research |
| Cityscapes | 30 | 25K | - | Urban street scenes | Autonomous driving |

COCO Dataset Details:

Download Methods:

# Method 1: Official download
# Visit https://cocodataset.org/#download
# Download train2017.zip, val2017.zip, annotations_trainval2017.zip

# Method 2: Using the COCO API (requires the annotation file from Method 1)
from pycocotools.coco import COCO
import requests

coco = COCO('annotations/instances_val2017.json')

# Each image record carries a 'coco_url'; fetch one image as an example
img_info = coco.loadImgs(coco.getImgIds()[0])[0]
with open(img_info['file_name'], 'wb') as f:
    f.write(requests.get(img_info['coco_url']).content)

Category List (partial):

  • People: person
  • Vehicles: car, truck, bus, motorcycle, bicycle
  • Animals: cat, dog, horse, cow, elephant
  • Furniture: chair, couch, bed, table
  • Electronics: laptop, mouse, keyboard, cell phone

Converting to YOLO Format:

Using a Python Script:

from pycocotools.coco import COCO
import os
import shutil

def coco_to_yolo(coco_annotation_file, image_dir, output_dir):
    """
    Convert COCO format to YOLO format
    """
    coco = COCO(coco_annotation_file)

    # Create output directories
    os.makedirs(f'{output_dir}/images', exist_ok=True)
    os.makedirs(f'{output_dir}/labels', exist_ok=True)

    # COCO category IDs are not contiguous (1-90 for the 80 classes),
    # so map them to 0-based YOLO class IDs instead of just subtracting 1
    cat_ids = sorted(coco.getCatIds())
    cat_id_to_class = {cat_id: i for i, cat_id in enumerate(cat_ids)}

    # Get all image IDs
    img_ids = coco.getImgIds()

    for img_id in img_ids:
        # Get image info
        img_info = coco.loadImgs(img_id)[0]
        img_width = img_info['width']
        img_height = img_info['height']

        # Get all annotations for this image
        ann_ids = coco.getAnnIds(imgIds=img_id)
        anns = coco.loadAnns(ann_ids)

        # Create YOLO format annotation file
        base_name = os.path.splitext(img_info['file_name'])[0]
        label_file = f"{output_dir}/labels/{base_name}.txt"
        with open(label_file, 'w') as f:
            for ann in anns:
                class_id = cat_id_to_class[ann['category_id']]

                # Get bounding box (COCO format: x, y, width, height)
                x, y, w, h = ann['bbox']

                # Convert to YOLO format (normalized center coordinates and dimensions)
                center_x = (x + w / 2) / img_width
                center_y = (y + h / 2) / img_height
                norm_w = w / img_width
                norm_h = h / img_height

                # Write to file
                f.write(f"{class_id} {center_x:.6f} {center_y:.6f} {norm_w:.6f} {norm_h:.6f}\n")

        # Copy the image alongside its label file
        src = os.path.join(image_dir, img_info['file_name'])
        if os.path.exists(src):
            shutil.copy(src, f"{output_dir}/images/{img_info['file_name']}")

# Usage
coco_to_yolo('annotations/instances_train2017.json', 'train2017', 'yolo_dataset')

Advantages:

  • Large volume, high quality
  • Precise annotations, professionally reviewed
  • Free to use, no copyright issues
  • Community support, abundant tutorials
  • Ideal for quick starts and prototyping

Disadvantages:

  • May not match your specific application scenario
  • Categories may not be granular enough
  • Scenes may not be diverse enough
  • Requires filtering and format conversion

Usage Recommendations:

  • Suitable for quickly validating ideas
  • Suitable as pre-training data
  • Suitable for learning YOLO
  • Not suitable for production (unless it perfectly matches your scenario)

Data Source 2: Self-Captured (Recommended for Specific Scenarios)

Self-captured data is the most reliable source, giving you full control over data quality and scene coverage.

Shooting Plan Development:

1. Scene Coverage Plan

Time Coverage:

  • Daytime: Morning (8am-12pm), Afternoon (12pm-6pm)
  • Nighttime: Evening (6pm-8pm), Late night (8pm-12am)
  • Special times: Dusk, dawn, harsh midday light

Shooting Tips:

  • Capture at least 100-200 images per time period
  • Ensure scene diversity across different time periods
  • Record shooting time and lighting conditions

Weather Coverage:

  • Sunny: Normal lighting, clear visibility
  • Rainy: Wet surfaces, reflective effects
  • Overcast: Soft lighting, no harsh shadows
  • Foggy: Low visibility, blurred targets

Shooting Tips:

  • Capture at least 200-300 images per weather condition
  • Note how weather affects target appearance
  • Consider extreme weather situations

Angle Coverage:

  • Front: 0 degrees, target fully visible
  • Side: 45 degrees, 90 degrees, partial occlusion
  • Top-down: From above, suitable for surveillance scenarios
  • Bottom-up: From below, suitable for special viewpoints

Distance Coverage:

  • Close-up: Target occupies 50%+ of image, clear details
  • Medium range: Target occupies 20-50% of image, common scenario
  • Long range: Target occupies 5-20% of image, small object detection

2. Target Diversity Planning

Size Diversity:

  • Large objects: Occupying 30-80% of image, easy to detect
  • Medium objects: Occupying 10-30% of image, standard detection
  • Small objects: Occupying 1-10% of image, requires high resolution

State Diversity:

  • Stationary: Target at rest, clearly visible
  • Moving: Target in motion, possible blur
  • Partially occluded: 20-50% occluded by other objects
  • Heavily occluded: 50%+ occluded (optional, for robustness training)

Lighting Diversity:

  • Bright: Sufficient lighting, clear contrast
  • Shadow: Partially in shadow, reduced contrast
  • Backlit: Target backlit, clear silhouette but blurred details
  • Harsh light: Overexposed, lost details

3. Equipment Selection and Settings

Smartphone Capture (Recommended for beginners):

Advantages:

  • Portable, capture anytime
  • Auto-focus, simple operation
  • Modern phones have sufficient quality (12MP+)
  • Low cost, no extra equipment needed

Settings:

  • Resolution: Set to maximum (typically 4K or higher)
  • Format: Use JPG, balancing quality and file size
  • Focus: Ensure target is in sharp focus
  • Stability: Use a tripod or stabilizer to avoid shake

Camera Capture (Recommended for professional projects):

Advantages:

  • Higher image quality, richer details
  • More controllable parameters (ISO, aperture, shutter)
  • Suitable for professional projects

Settings:

  • ISO: Keep as low as possible (100-400) to reduce noise
  • Aperture: f/5.6-f/8, balancing depth of field and quality
  • Shutter: 1/250s+, avoiding motion blur
  • White balance: Adjust per scene, maintaining color accuracy

Drone Capture (Suitable for large scenes):

Advantages:

  • Top-down perspective, ideal for surveillance scenarios
  • Covers large areas efficiently
  • Unique viewpoints, adding data diversity

Considerations:

  • Comply with flight regulations
  • Monitor weather conditions (wind, rain)
  • Ensure sufficient battery

4. Shooting Workflow

Preparation Phase (1-2 days):

  1. Create a shooting plan

    • List all scenes to cover
    • Plan shooting routes and schedules
    • Prepare equipment (camera, memory cards, batteries)
  2. Equipment check

    • Check camera/phone battery level
    • Check storage space (recommend at least 100GB)
    • Check lens cleanliness

Shooting Phase (varies by project scale):

  1. Shoot according to plan

    • Strictly follow the scene coverage plan
    • Capture at least 50-100 images per scene
    • Record shooting info (time, location, scene)
  2. Real-time checks

    • Periodically check photo quality
    • Delete blurry or out-of-focus photos
    • Ensure targets are clearly visible
  3. Data backup

    • Back up immediately after each day's shooting
    • Use multiple storage devices
    • Prevent data loss

Organization Phase (after shooting):

  1. Photo screening

    • Delete blurry or out-of-focus photos
    • Delete duplicate photos
    • Keep high-quality photos
  2. Photo naming

    • Use meaningful naming conventions
    • Example: scene_time_weather_001.jpg
    • Facilitates later management and annotation
  3. Data statistics

    • Count photos per scene type
    • Check if scene coverage is complete
    • Supplement missing scenes
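
With the scene_time_weather_001.jpg naming convention above, the per-scene statistics can be tallied automatically. A minimal sketch (the photo directory name is a placeholder):

```python
import os
from collections import Counter

def scene_counts(photo_dir):
    """Tally photos per scene, assuming names like scene_time_weather_001.jpg."""
    counts = Counter()
    for name in os.listdir(photo_dir):
        if name.lower().endswith(('.jpg', '.png')):
            counts[name.split('_')[0]] += 1
    return counts

photo_dir = './shoot_photos'  # placeholder path
if os.path.isdir(photo_dir):
    for scene, n in scene_counts(photo_dir).most_common():
        print(f"{scene}: {n} photos")
```

Scenes with noticeably fewer photos than the rest are the ones to supplement on the next shoot.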

Real Cases:

Case 1: Autonomous Driving Road Scenes

An autonomous driving company needed to collect road scene data. The team created a detailed shooting plan:

  • Time: 1 month each for daytime, nighttime, and dusk
  • Weather: 2 weeks each for sunny, rainy, and overcast
  • Locations: 5 different cities, covering urban roads, highways, and rural roads
  • Equipment: 8 vehicle-mounted cameras, shooting simultaneously
  • Result: 50,000 high-quality images collected in 3 months, covering all scenarios

Case 2: Industrial Quality Inspection Product Photography

A factory needed to detect product defects. The team used industrial cameras:

  • Fixed shooting positions for consistency
  • Standard light sources to reduce lighting variation
  • Multiple angles per product (front, side, top)
  • Result: 20,000 product images collected in 1 month, including 5,000 defect samples

Shooting Checklist:

Equipment Preparation:

  • Camera/phone fully charged
  • Sufficient storage space (recommend 100GB+)
  • Clean lens, no smudges
  • Backup batteries and memory cards

Shooting Quality:

  • Targets clear, no blur
  • Accurate focus, no defocus
  • Normal exposure, not over/underexposed
  • Good composition, targets complete

Scene Coverage:

  • Complete time coverage (day/night)
  • Complete weather coverage (sunny/rainy)
  • Complete angle coverage (front/side)
  • Complete distance coverage (close/far)

Data Management:

  • Standardized photo naming
  • Timely data backup
  • Complete shooting info records

Data Source 3: Video Frame Extraction (Efficient Method)

Advantages:

  • Extract frames from video, highly efficient
  • Covers continuous actions
  • Natural scenes

Using TjMakeBot for Extraction:

  1. Upload video file
  2. Set extraction frame rate (e.g., 1fps)
  3. Automatically extract key frames
  4. Directly annotate extracted frames

Tips:

  • Select key frames: Avoid duplicate frames
  • Set appropriate frame rate: 1-5fps is usually sufficient
  • Process multiple videos: Cover different scenes

Data Source 4: Other Sources (Use with Caution)

Considerations:

  • Comply with data usage license agreements
  • Respect intellectual property and copyright
  • Obtain necessary authorization or permissions
  • Do not use copyright-protected content

Data Requirements Checklist:

Clarity:

  • Images are clear, target objects visible
  • Avoid blurry or out-of-focus images
  • Resolution at least 640x640

Target Size:

  • Target objects appropriately sized (recommend 5%-50% of image)
  • Avoid targets too small (< 1%) or too large (> 80%)
  • Small targets require higher resolution

Scene Diversity:

  • Cover different scenes
  • Avoid overfitting
  • Include edge cases

Target Completeness:

  • Annotation targets are complete
  • Avoid severe occlusion (> 50%)
  • Partial occlusion (< 50%) can be annotated
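
The size and coordinate rules in this checklist can be enforced mechanically on finished label files. A sketch using the under-1% / over-80% area extremes flagged above (tune the bounds for your project):

```python
import os

def validate_labels(labels_dir, min_area=0.01, max_area=0.8):
    """Flag YOLO label lines with malformed fields, out-of-range coordinates,
    or box areas outside [min_area, max_area] (fractions of the image area)."""
    problems = []
    for name in os.listdir(labels_dir):
        if not name.endswith('.txt'):
            continue
        with open(os.path.join(labels_dir, name)) as f:
            for line_no, line in enumerate(f, 1):
                parts = line.split()
                if len(parts) != 5:
                    problems.append((name, line_no, 'malformed line'))
                    continue
                cx, cy, w, h = map(float, parts[1:])
                if not all(0.0 <= v <= 1.0 for v in (cx, cy, w, h)):
                    problems.append((name, line_no, 'coordinate out of range'))
                elif not (min_area <= w * h <= max_area):
                    problems.append((name, line_no, 'suspicious box size'))
    return problems
```

An empty result does not prove the annotations are correct, but a non-empty one pinpoints files worth re-checking by hand.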

1.3 Data Preprocessing

Data preprocessing is a critical step for ensuring data quality, directly impacting model training effectiveness.

Preprocessing Workflow:

Step 1: Data Cleaning

Remove Low-Quality Images:

Checks:

  • Blurry images: Targets unclear, unidentifiable
  • Out-of-focus images: Focus not on the target
  • Over/underexposed: Severely abnormal exposure
  • Duplicate images: Identical or highly similar
  • Irrelevant images: Don't contain target objects

Automated Cleaning Script:

import cv2
import os
import shutil
from PIL import Image
import imagehash

def calculate_blur_score(image_path):
    """Calculate image blur score (variance of the Laplacian; lower means blurrier)"""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    laplacian_var = cv2.Laplacian(img, cv2.CV_64F).var()
    return laplacian_var

def find_duplicates(image_dir, threshold=5):
    """Find duplicate images via perceptual hashing"""
    image_hashes = {}
    duplicates = []

    for filename in os.listdir(image_dir):
        if filename.endswith(('.jpg', '.png')):
            filepath = os.path.join(image_dir, filename)
            img_hash = imagehash.average_hash(Image.open(filepath))

            # Check for similar images
            for existing_file, existing_hash in image_hashes.items():
                if img_hash - existing_hash < threshold:
                    duplicates.append((existing_file, filename))
                    break

            image_hashes[filename] = img_hash

    return duplicates

def clean_dataset(image_dir, blur_threshold=100):
    """Clean the dataset by copying only sharp images to a 'cleaned' subdirectory"""
    cleaned_dir = os.path.join(image_dir, 'cleaned')
    os.makedirs(cleaned_dir, exist_ok=True)

    removed_count = 0

    for filename in os.listdir(image_dir):
        if filename.endswith(('.jpg', '.png')):
            filepath = os.path.join(image_dir, filename)

            # Check blur score
            blur_score = calculate_blur_score(filepath)
            if blur_score < blur_threshold:
                print(f"Skipping blurry image: {filename} (blur score: {blur_score:.2f})")
                removed_count += 1
                continue

            # Copy to cleaned directory
            shutil.copy(filepath, os.path.join(cleaned_dir, filename))

    print(f"Cleaning complete, skipped {removed_count} low-quality images")

# Usage
print(find_duplicates('./raw_images'))  # review duplicate pairs before deleting
clean_dataset('./raw_images')

Manual Check:

  • Quickly browse all images
  • Flag obviously problematic images
  • Batch delete

Step 2: Unify Format

Format Selection:

| Format | Advantages | Disadvantages | Recommended Scenario |
| --- | --- | --- | --- |
| JPG | Small files, fast loading | Lossy compression | Most scenarios (recommended) |
| PNG | Lossless compression, high quality | Large files | Scenarios requiring high quality |

Conversion Script:

from PIL import Image
import os

def convert_format(input_dir, output_dir, target_format='JPG', quality=95):
    """Unify image format"""
    os.makedirs(output_dir, exist_ok=True)

    for filename in os.listdir(input_dir):
        if filename.endswith(('.jpg', '.png', '.bmp', '.tiff')):
            input_path = os.path.join(input_dir, filename)
            output_filename = os.path.splitext(filename)[0] + f'.{target_format.lower()}'
            output_path = os.path.join(output_dir, output_filename)

            # Open and convert
            img = Image.open(input_path)

            # Convert to RGB (if RGBA)
            if img.mode == 'RGBA':
                rgb_img = Image.new('RGB', img.size, (255, 255, 255))
                rgb_img.paste(img, mask=img.split()[3])
                img = rgb_img

            # Save
            if target_format == 'JPG':
                img.save(output_path, 'JPEG', quality=quality)
            else:
                img.save(output_path, target_format)

            print(f"Converted: {filename} -> {output_filename}")

# Usage
convert_format('./raw_images', './formatted_images', 'JPG', quality=95)

Step 3: Unify Dimensions

Size Selection Principles:

YOLO Input Sizes:

  • 640x640: Standard size, balancing speed and precision (recommended)
  • 416x416: Fast detection, suitable for real-time applications
  • 1280x1280: High-precision detection, suitable for small objects

Resizing Methods:

Method 1: Aspect-Ratio-Preserving Resize (Recommended)

from PIL import Image
import os

def resize_with_aspect_ratio(image_path, target_size=640, padding_color=(114, 114, 114)):
    """
    Resize while preserving aspect ratio, padding with gray
    """
    img = Image.open(image_path)
    original_width, original_height = img.size

    # Calculate scale
    scale = min(target_size / original_width, target_size / original_height)
    new_width = int(original_width * scale)
    new_height = int(original_height * scale)

    # Resize image
    img_resized = img.resize((new_width, new_height), Image.Resampling.LANCZOS)

    # Create target-size canvas
    img_padded = Image.new('RGB', (target_size, target_size), padding_color)

    # Calculate centering position
    x_offset = (target_size - new_width) // 2
    y_offset = (target_size - new_height) // 2

    # Paste resized image
    img_padded.paste(img_resized, (x_offset, y_offset))

    return img_padded

# Batch processing
def batch_resize(input_dir, output_dir, target_size=640):
    """Batch resize"""
    os.makedirs(output_dir, exist_ok=True)

    for filename in os.listdir(input_dir):
        if filename.endswith(('.jpg', '.png')):
            input_path = os.path.join(input_dir, filename)
            output_path = os.path.join(output_dir, filename)

            img_resized = resize_with_aspect_ratio(input_path, target_size)
            img_resized.save(output_path)
            print(f"Resized: {filename}")

# Usage
batch_resize('./formatted_images', './resized_images', target_size=640)

Method 2: Direct Stretching (Not Recommended)

  • Distorts target shape
  • May cause the model to learn incorrect features
  • Only use when target shape doesn't matter

Step 4: Data Augmentation (Optional)

When to Use Data Augmentation:

  • When data volume is insufficient
  • When you need to improve model generalization
  • When classes are imbalanced

Common Augmentation Methods:

1. Geometric Transforms:

  • Rotation: +/-15 degrees, simulating different angles
  • Flip: Horizontal flip, vertical flip
  • Scale: 0.8-1.2x, simulating different distances
  • Translation: +/-10%, simulating position changes

2. Color Transforms:

  • Brightness adjustment: +/-20%, simulating different lighting
  • Contrast adjustment: +/-20%, enhancing/reducing contrast
  • Saturation adjustment: +/-30%, simulating different environments
  • Hue adjustment: +/-10 degrees, simulating different light sources

3. Noise Addition:

  • Gaussian noise: Simulating sensor noise
  • Salt-and-pepper noise: Simulating transmission errors

Augmentation Script:

from PIL import Image, ImageEnhance
import random
import os

def augment_image(image_path, output_dir, num_augmentations=3):
    """Augment a single image"""
    img = Image.open(image_path).convert('RGB')  # ensure RGB so JPEG saving succeeds
    base_name = os.path.splitext(os.path.basename(image_path))[0]

    for i in range(num_augmentations):
        # Random rotation
        angle = random.uniform(-15, 15)
        img_rotated = img.rotate(angle, expand=False)

        # Random flip
        if random.random() > 0.5:
            img_rotated = img_rotated.transpose(Image.FLIP_LEFT_RIGHT)

        # Random brightness adjustment
        enhancer = ImageEnhance.Brightness(img_rotated)
        img_rotated = enhancer.enhance(random.uniform(0.8, 1.2))

        # Random contrast adjustment
        enhancer = ImageEnhance.Contrast(img_rotated)
        img_rotated = enhancer.enhance(random.uniform(0.8, 1.2))

        # Save
        output_path = os.path.join(output_dir, f"{base_name}_aug_{i}.jpg")
        img_rotated.save(output_path)
        print(f"Augmented: {base_name}_aug_{i}.jpg")

def batch_augment(input_dir, output_dir, num_augmentations=3):
    """Batch augmentation"""
    os.makedirs(output_dir, exist_ok=True)

    for filename in os.listdir(input_dir):
        if filename.endswith(('.jpg', '.png')):
            input_path = os.path.join(input_dir, filename)
            augment_image(input_path, output_dir, num_augmentations)

# Usage
batch_augment('./resized_images', './augmented_images', num_augmentations=3)

Note: Data augmentation should be performed before annotation, or use a tool that supports automatic annotation coordinate adjustment.
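
As a concrete example of that coordinate adjustment: under a horizontal flip, only the x-center mirrors, while y, width, and height stay unchanged. A sketch for one label line:

```python
def flip_label_horizontal(label_line):
    """Mirror one YOLO label line to match a horizontally flipped image."""
    class_id, cx, cy, w, h = label_line.split()
    mirrored_cx = 1.0 - float(cx)  # only the x-center changes under a horizontal flip
    return f"{class_id} {mirrored_cx} {cy} {w} {h}"

print(flip_label_horizontal("0 0.2 0.3 0.1 0.2"))  # -> "0 0.8 0.3 0.1 0.2"
```

Rotation and translation require similar (more involved) transforms, which is why augmenting before annotation, or using a tool that handles labels, is the safer route.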

Preprocessing Checklist

Data Cleaning:

  • Remove blurry images
  • Remove out-of-focus images
  • Remove duplicate images
  • Remove irrelevant images

Format Unification:

  • Unified to JPG or PNG format
  • Converted to RGB mode
  • File integrity verified

Size Unification:

  • Resized to target dimensions (e.g., 640x640)
  • Aspect ratio preserved (recommended)
  • Image quality verified

Data Augmentation (optional):

  • Augmentation methods determined
  • Augmentation applied
  • Augmentation results verified

Data Statistics:

  • Final image count tallied
  • Class distribution checked
  • Data quality verified

Step 2: Data Annotation

2.1 Choosing an Annotation Tool

Tool Selection Advice:

Different tools have different characteristics:

  • Free tools: Suitable for budget-limited users, features may be relatively simple
  • Paid tools: Typically more comprehensive features, suitable for enterprise users with budget
  • Selection principle: Choose based on project needs, budget, and technical capability

TjMakeBot Features:

  • Free (basic features)
  • AI chat-based annotation, significantly improving efficiency
  • Supports batch processing
  • Online and ready to use, no installation needed
  • Supports video-to-frame conversion

2.2 Creating Category Labels

Create your categories in TjMakeBot:

Category List Example:
0: car
1: person
2: bicycle
3: motorcycle
4: bus

Naming Conventions:

  • Use lowercase English
  • Avoid spaces and special characters
  • Category names should be clear and unambiguous

2.3 Starting Annotation: Two Methods Explained

Method 1: AI Chat-Based Annotation (Recommended for Standard Scenes)

Suitable Scenarios:

  • Batch annotation (> 100 images)
  • Standard scenes (common objects)
  • Rapid prototyping
  • Budget-limited projects

Complete Workflow:

Step 1: Upload Images (1 minute)

  • Batch upload all images
  • Recommend testing with 10-20 images first

Step 2: Open AI Assistant (5 seconds)

  • Click the "AI Assistant" button
  • Chat panel opens

Step 3: Enter Instructions (10 seconds)

Basic instruction:
"Please annotate all cars and pedestrians"

Advanced instructions:
"Annotate all vehicles, but exclude motorcycles"
"Annotate all targets in the center area of the image"
"Annotate all cars larger than 100 pixels"

Step 4: AI Auto-Annotation (automatic)

  • AI understands the instruction
  • Automatically identifies targets
  • Generates annotation results

Step 5: Review and Fine-Tune (5-10 minutes per 100 images)

  • Quickly browse annotation results
  • Correct obvious errors
  • Supplement missed annotations

Step 6: Apply to All (1 second)

  • Confirm satisfactory results
  • One-click apply to all images

Advantages:

  • Fast: 1000 images completed in 2-3 hours
  • High accuracy: AI accuracy typically >90%
  • Low cost: Free tool, virtually zero cost
  • High efficiency: Batch processing, dramatically improved efficiency

Real Case:

A student project needed to annotate 2000 images. Using AI chat-based annotation, annotation was completed in 2 days with 95% accuracy. Traditional methods would have taken 2 weeks.

Method 2: Manual Annotation (Suitable for Complex Scenes)

Suitable Scenarios:

  • Complex scenes (AI has difficulty recognizing)
  • Special objects (categories AI hasn't been trained on)
  • High precision requirements (pixel-level precision needed)
  • Small-scale projects (< 100 images)

Complete Workflow:

Step 1: Select Image (5 seconds)

  • Click an image to open the annotation interface

Step 2: Select Category (3 seconds)

  • Choose from the category list
  • Or create a new category

Step 3: Draw Bounding Box (10-30 seconds)

  • Mouse drag to draw a rectangle
  • Drag from top-left to bottom-right
  • Or use keyboard shortcuts

Step 4: Adjust Position and Size (10-20 seconds)

  • Drag the bounding box to move position
  • Drag corner points to adjust size
  • Use arrow keys for fine-tuning

Step 5: Save Annotation (2 seconds)

  • Auto-save
  • Or manual save

Manual Annotation Tips:

Tip 1: Use Keyboard Shortcuts

  • W: Switch tools
  • Delete: Delete selected annotation
  • Arrow keys: Fine-tune position
  • Ctrl+Z: Undo

Tip 2: Precise Adjustment

  • Use zoom to enlarge the image
  • Use crosshairs for precise positioning
  • Multiple fine adjustments for optimal placement

Tip 3: Batch Operations

  • Copy annotations to the next image
  • Batch delete incorrect annotations
  • Batch modify categories

Advantages:

  • High precision: Pixel-level accuracy
  • Flexible: Can handle any scenario
  • Controllable: Full control over the annotation process

Disadvantages:

  • Slow: 2-5 minutes per image
  • Expensive: Requires significant manpower
  • Fatiguing: Long annotation sessions lead to errors

Recommendation: Combine AI-assisted and manual annotation — AI handles standard scenes, manual handles complex scenes.

2.4 Annotation Quality Check: Ensuring Data Quality

Why Is Quality Checking So Important?

A real case:

A project annotated 5000 images, but after training, the model only achieved 70% accuracy. Upon inspection, 15% of the annotation data contained errors. After re-annotation, model accuracy improved to 92%.

Quality Check Checklist:

1. Completeness Check (Most Important)

  • All target objects are annotated
  • No missed objects
  • Partially occluded objects are also annotated

Check Methods:

  • Browse image by image, looking for omissions
  • Use AI-assisted checking (AI can identify omissions)
  • Sampling check (check 1 in every 10)
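The sampling check above is easy to make reproducible in code. A minimal sketch that picks a fixed-seed random subset for manual spot-checking (function name and the 10% fraction are illustrative choices):

```python
import random

def sample_for_review(image_files, fraction=0.1, seed=0):
    """Pick a reproducible random subset of images for manual spot-checking."""
    rng = random.Random(seed)  # fixed seed so reviewers get the same sample
    k = max(1, int(len(image_files) * fraction))
    return sorted(rng.sample(image_files, k))

# Review roughly 1 in every 10 images
to_review = sample_for_review([f"img{i:03d}.jpg" for i in range(200)], fraction=0.1)
print(len(to_review))  # 20
```

Using a fixed seed means a second reviewer can re-check exactly the same images later.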

2. Accuracy Check

  • Bounding boxes precisely cover targets
  • Bounding boxes don't include excessive background (< 10%)
  • Bounding boxes don't miss parts of the target

Check Methods:

  • Check if bounding boxes are tight to target edges
  • Check for obvious deviations
  • Use IoU metrics for evaluation
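The IoU evaluation mentioned above can be computed directly from YOLO's normalized (center_x, center_y, width, height) format. A minimal sketch:

```python
def yolo_iou(box_a, box_b):
    """IoU between two YOLO boxes given as (center_x, center_y, width, height),
    all normalized to 0-1."""
    def to_corners(box):
        cx, cy, w, h = box
        return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)

    # Intersection rectangle (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Identical boxes give 1.0; fully disjoint boxes give 0.0
print(yolo_iou((0.5, 0.5, 0.2, 0.2), (0.5, 0.5, 0.2, 0.2)))
```

Comparing a reviewer's box against the original annotation with this function gives a concrete accuracy score per box.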

3. Category Accuracy

  • Category labels are correct
  • No category confusion
  • Edge cases handled correctly

Check Methods:

  • Check each annotation box's category
  • Pay special attention to easily confused categories
  • Standardize edge case handling

4. Consistency Check

  • No duplicate annotations
  • Annotation standards are uniform
  • Different annotators maintain consistent standards

Check Methods:

  • Check for overlapping annotation boxes
  • Compare annotations from different annotators
  • Analyze annotation differences
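The overlapping-box check above can be automated with IoU. A minimal sketch that flags near-duplicate boxes of the same class (the 0.9 threshold is an illustrative choice, not a fixed standard):

```python
def find_duplicate_boxes(boxes, iou_threshold=0.9):
    """Flag index pairs of same-class YOLO boxes that overlap almost completely.
    boxes: list of (class_id, center_x, center_y, width, height), normalized."""
    def iou(a, b):
        # a, b are (cx, cy, w, h)
        ax1, ay1, ax2, ay2 = a[0]-a[2]/2, a[1]-a[3]/2, a[0]+a[2]/2, a[1]+a[3]/2
        bx1, by1, bx2, by2 = b[0]-b[2]/2, b[1]-b[3]/2, b[0]+b[2]/2, b[1]+b[3]/2
        inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * \
                max(0.0, min(ay2, by2) - max(ay1, by1))
        union = a[2]*a[3] + b[2]*b[3] - inter
        return inter / union if union > 0 else 0.0

    duplicates = []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes[i][0] == boxes[j][0] and iou(boxes[i][1:], boxes[j][1:]) > iou_threshold:
                duplicates.append((i, j))
    return duplicates
```

Running this per annotation file surfaces accidental double-clicks that are hard to spot visually.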

Quality Metric Standards:

| Metric | Minimum Standard | Recommended Standard | Excellent Standard |
|---|---|---|---|
| Annotation Completeness | > 90% | > 95% | > 98% |
| Bounding Box Accuracy | > 85% | > 90% | > 95% |
| Category Accuracy | > 95% | > 98% | > 99% |
| Annotation Consistency | > 85% | > 90% | > 95% |

Quality Check Tools:

TjMakeBot Built-in Quality Check:

  • Automatically detects missed annotations
  • Automatically detects duplicate annotations
  • Automatically detects bounding box deviations
  • Generates quality reports

Usage Steps:

  1. After completing annotation, click "Quality Check"
  2. System automatically analyzes annotation quality
  3. Generates quality report
  4. Fix issues based on the report

Quality Improvement Workflow:

First Round (after annotation completion):

  • Quickly browse all images
  • Identify obvious errors
  • Correct erroneous annotations

Second Round (after corrections):

  • Sampling check (20-30%)
  • Detailed bounding box inspection
  • Check category accuracy

Third Round (final confirmation):

  • Expert review
  • Performance testing
  • Final confirmation

Quality Check Time Allocation:

  • Annotation time: 70%
  • Quality checking: 20%
  • Correction time: 10%

Remember: The time invested in quality checking is worthwhile — it prevents costly rework later.

Step 3: Data Format Conversion

Data format conversion is the critical step of converting annotation results into the format required for YOLO training.

3.1 Exporting YOLO Format

Using TjMakeBot Export

Steps:

  1. Select Annotation Data

    • Open the annotation project in TjMakeBot
    • Select all annotated images
    • Or select images of specific categories
  2. Export Settings

    • Click the "Export" button
    • Select "YOLO Format"
    • Choose export options:
      • Include images
      • Include annotation files
      • Maintain directory structure
  3. Download Files

    • Wait for export to complete
    • Download ZIP file
    • Extract to local directory

Export Result Structure:

dataset/
├── images/
│   ├── image001.jpg
│   ├── image002.jpg
│   └── ...
└── labels/
    ├── image001.txt
    ├── image002.txt
    └── ...

Manual Conversion (From Other Formats)

Converting from VOC Format:

import xml.etree.ElementTree as ET
import os

def voc_to_yolo(voc_xml_path, yolo_txt_path, img_width, img_height, class_mapping):
    """
    Convert VOC format to YOLO format
    """
    tree = ET.parse(voc_xml_path)
    root = tree.getroot()

    with open(yolo_txt_path, 'w') as f:
        for obj in root.findall('object'):
            # Get category
            class_name = obj.find('name').text
            class_id = class_mapping[class_name]

            # Get bounding box (VOC format: xmin, ymin, xmax, ymax)
            bbox = obj.find('bndbox')
            xmin = float(bbox.find('xmin').text)
            ymin = float(bbox.find('ymin').text)
            xmax = float(bbox.find('xmax').text)
            ymax = float(bbox.find('ymax').text)

            # Convert to YOLO format
            center_x = ((xmin + xmax) / 2) / img_width
            center_y = ((ymin + ymax) / 2) / img_height
            width = (xmax - xmin) / img_width
            height = (ymax - ymin) / img_height

            # Write to file
            f.write(f"{class_id} {center_x} {center_y} {width} {height}\n")

# Usage
class_mapping = {'car': 0, 'person': 1, 'bicycle': 2}
voc_to_yolo('annotations/image001.xml', 'labels/image001.txt', 1920, 1080, class_mapping)

Converting from COCO Format:

import json
import os

def coco_to_yolo(coco_json_path, output_dir, class_mapping):
    """
    Convert COCO format to YOLO format
    """
    with open(coco_json_path, 'r') as f:
        coco_data = json.load(f)

    # Create output directory
    os.makedirs(f'{output_dir}/labels', exist_ok=True)

    # Build image ID to filename mapping
    img_id_to_info = {img['id']: img for img in coco_data['images']}

    # Group annotations by image ID
    annotations_by_img = {}
    for ann in coco_data['annotations']:
        img_id = ann['image_id']
        if img_id not in annotations_by_img:
            annotations_by_img[img_id] = []
        annotations_by_img[img_id].append(ann)

    # Convert annotations for each image
    for img_id, anns in annotations_by_img.items():
        img_info = img_id_to_info[img_id]
        img_width = img_info['width']
        img_height = img_info['height']

        # Create YOLO format file
        # Use splitext so .png / .jpeg images are handled, not just .jpg
        stem = os.path.splitext(img_info['file_name'])[0]
        label_file = f"{output_dir}/labels/{stem}.txt"
        with open(label_file, 'w') as f:
            for ann in anns:
                category_id = ann['category_id']
                class_name = next(cat['name'] for cat in coco_data['categories'] if cat['id'] == category_id)
                class_id = class_mapping.get(class_name, -1)

                if class_id == -1:
                    continue  # Skip unmapped categories

                # COCO format: x, y, width, height (absolute coordinates)
                bbox = ann['bbox']
                x, y, w, h = bbox

                # Convert to YOLO format (normalized)
                center_x = (x + w / 2) / img_width
                center_y = (y + h / 2) / img_height
                norm_w = w / img_width
                norm_h = h / img_height

                f.write(f"{class_id} {center_x} {center_y} {norm_w} {norm_h}\n")

# Usage
class_mapping = {'car': 0, 'person': 1, 'bicycle': 2}
coco_to_yolo('annotations/instances_train2017.json', './yolo_dataset', class_mapping)

3.2 Validating Annotation Files

Validating annotation files is a critical step for ensuring data quality and avoiding errors during training.

Validation Script

Complete Validation Script:

import os
from PIL import Image

def validate_yolo_dataset(dataset_dir):
    """
    Validate a YOLO dataset
    """
    images_dir = os.path.join(dataset_dir, 'images')
    labels_dir = os.path.join(dataset_dir, 'labels')

    errors = []
    warnings = []

    # Get all image files
    image_files = [f for f in os.listdir(images_dir) if f.endswith(('.jpg', '.png'))]

    for img_file in image_files:
        img_path = os.path.join(images_dir, img_file)
        label_file = os.path.splitext(img_file)[0] + '.txt'
        label_path = os.path.join(labels_dir, label_file)

        # Check 1: Does annotation file exist?
        if not os.path.exists(label_path):
            errors.append(f"Missing annotation file: {label_file}")
            continue

        # Check 2: Can the image be opened?
        try:
            img = Image.open(img_path)
            img_width, img_height = img.size
        except Exception as e:
            errors.append(f"Cannot open image: {img_file} - {str(e)}")
            continue

        # Check 3: Read annotation file
        try:
            with open(label_path, 'r') as f:
                lines = f.readlines()
        except Exception as e:
            errors.append(f"Cannot read annotation file: {label_file} - {str(e)}")
            continue

        # Check 4: Validate each line's format
        for line_num, line in enumerate(lines, 1):
            line = line.strip()
            if not line:
                continue

            parts = line.split()

            # Format check: should have 5 numbers
            if len(parts) != 5:
                errors.append(f"{label_file}:{line_num} - Format error, expected 5 numbers, got {len(parts)}")
                continue

            try:
                class_id = int(parts[0])
                center_x = float(parts[1])
                center_y = float(parts[2])
                width = float(parts[3])
                height = float(parts[4])
            except ValueError as e:
                errors.append(f"{label_file}:{line_num} - Cannot parse numbers: {str(e)}")
                continue

            # Check 5: Is class ID valid?
            if class_id < 0:
                errors.append(f"{label_file}:{line_num} - Class ID cannot be negative: {class_id}")

            # Check 6: Are coordinates in 0-1 range?
            if not (0 <= center_x <= 1):
                errors.append(f"{label_file}:{line_num} - center_x out of range: {center_x}")
            if not (0 <= center_y <= 1):
                errors.append(f"{label_file}:{line_num} - center_y out of range: {center_y}")
            if not (0 < width <= 1):
                errors.append(f"{label_file}:{line_num} - width out of range: {width}")
            if not (0 < height <= 1):
                errors.append(f"{label_file}:{line_num} - height out of range: {height}")

            # Check 7: Does bounding box exceed image bounds?
            x_min = center_x - width / 2
            x_max = center_x + width / 2
            y_min = center_y - height / 2
            y_max = center_y + height / 2

            if x_min < 0 or x_max > 1 or y_min < 0 or y_max > 1:
                warnings.append(f"{label_file}:{line_num} - Bounding box exceeds image bounds")

            # Check 8: Is bounding box too small?
            if width < 0.01 or height < 0.01:
                warnings.append(f"{label_file}:{line_num} - Bounding box too small (possible annotation error)")

            # Check 9: Is bounding box too large?
            if width > 0.95 or height > 0.95:
                warnings.append(f"{label_file}:{line_num} - Bounding box too large (possible annotation error)")

    # Output results
    print("=" * 50)
    print("Validation Results")
    print("=" * 50)

    if errors:
        print(f"\nFound {len(errors)} errors:")
        for error in errors[:10]:  # Show first 10 only
            print(f"  - {error}")
        if len(errors) > 10:
            print(f"  ... and {len(errors) - 10} more errors")
    else:
        print("\nNo errors found")

    if warnings:
        print(f"\nFound {len(warnings)} warnings:")
        for warning in warnings[:10]:  # Show first 10 only
            print(f"  - {warning}")
        if len(warnings) > 10:
            print(f"  ... and {len(warnings) - 10} more warnings")
    else:
        print("\nNo warnings found")

    return len(errors) == 0

# Usage
is_valid = validate_yolo_dataset('./dataset')
if is_valid:
    print("\nDataset validation passed, ready to start training")
else:
    print("\nDataset validation failed, please fix errors before training")

Validation Checklist

File Integrity:

  • Every image has a corresponding annotation file
  • Every annotation file has a corresponding image
  • Filenames match (except for extensions)
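The two-way file-integrity check above (the main validation script only checks the image-to-label direction) can be sketched by comparing filename stems in both directories:

```python
import os

def find_orphans(images_dir, labels_dir):
    """Return (image stems missing a label, label stems missing an image)."""
    img_stems = {os.path.splitext(f)[0] for f in os.listdir(images_dir)
                 if f.lower().endswith(('.jpg', '.png'))}
    lbl_stems = {os.path.splitext(f)[0] for f in os.listdir(labels_dir)
                 if f.endswith('.txt')}
    return sorted(img_stems - lbl_stems), sorted(lbl_stems - img_stems)
```

Orphan label files are worth deleting before training; orphan images mean annotation was never done for them.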

Format Correctness:

  • Each annotation file line has 5 numbers
  • All numbers are valid floats
  • Class IDs are integers

Coordinate Validity:

  • All coordinate values are in the 0-1 range
  • Bounding boxes don't exceed image bounds
  • Bounding box sizes are reasonable (not too small or too large)

Data Consistency:

  • Class IDs are consecutive (0, 1, 2, ...)
  • No duplicate annotations
  • Annotations match image content
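The "class IDs are consecutive" check can be scripted as well. A small sketch, assuming labels sit in a flat directory of .txt files:

```python
import os

def check_class_ids(labels_dir, expected_nc):
    """Collect all class IDs used in a labels directory and verify they
    form the consecutive range 0 .. expected_nc - 1."""
    used = set()
    for name in os.listdir(labels_dir):
        if name.endswith('.txt'):
            with open(os.path.join(labels_dir, name)) as f:
                for line in f:
                    if line.strip():
                        used.add(int(line.split()[0]))
    expected = set(range(expected_nc))
    return {'unknown_ids': sorted(used - expected),   # IDs not in the config
            'unused_ids': sorted(expected - used)}    # configured but never annotated
```

Any `unknown_ids` will crash or silently corrupt training, and `unused_ids` usually mean the class list in dataset.yaml is out of sync with the annotations.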

3.3 Creating Dataset Configuration Files

The dataset configuration file is required for YOLO training, defining dataset paths, categories, and other information.

YOLOv8 Configuration File

Standard Format (dataset.yaml):

# Dataset path (relative to this file or absolute path)
path: /path/to/dataset  # Dataset root directory

# Training and validation set paths (relative to path)
train: images/train  # Training set image directory
val: images/val      # Validation set image directory
test: images/test    # Test set image directory (optional)

# Number of categories
nc: 5

# Category names (must correspond to class IDs)
names:
  0: car
  1: person
  2: bicycle
  3: motorcycle
  4: bus

YOLOv5 Configuration File

Standard Format (dataset.yaml):

# Dataset paths
train: /path/to/dataset/images/train
val: /path/to/dataset/images/val
test: /path/to/dataset/images/test  # Optional

# Number of categories
nc: 5

# Category names
names: ['car', 'person', 'bicycle', 'motorcycle', 'bus']

Configuration File Generation Script

Auto-Generation Script:

import os
import yaml

def create_dataset_yaml(dataset_dir, class_names, output_file='dataset.yaml', yolo_version='v8'):
    """
    Auto-generate dataset configuration file
    """
    # Check directory structure
    images_dir = os.path.join(dataset_dir, 'images')
    labels_dir = os.path.join(dataset_dir, 'labels')

    # Check for train/val/test subdirectories
    has_splits = os.path.exists(os.path.join(images_dir, 'train'))

    if yolo_version == 'v8':
        if has_splits:
            config = {
                'path': os.path.abspath(dataset_dir),
                'train': 'images/train',
                'val': 'images/val',
                'nc': len(class_names),
                'names': {i: name for i, name in enumerate(class_names)}
            }

            # If test set exists
            if os.path.exists(os.path.join(images_dir, 'test')):
                config['test'] = 'images/test'
        else:
            # If no splits, use images directory
            config = {
                'path': os.path.abspath(dataset_dir),
                'train': 'images',
                'val': 'images',  # Note: actual use requires splitting
                'nc': len(class_names),
                'names': {i: name for i, name in enumerate(class_names)}
            }
    else:  # YOLOv5
        if has_splits:
            config = {
                'train': os.path.join(os.path.abspath(dataset_dir), 'images', 'train'),
                'val': os.path.join(os.path.abspath(dataset_dir), 'images', 'val'),
                'nc': len(class_names),
                'names': class_names
            }

            if os.path.exists(os.path.join(images_dir, 'test')):
                config['test'] = os.path.join(os.path.abspath(dataset_dir), 'images', 'test')
        else:
            config = {
                'train': os.path.join(os.path.abspath(dataset_dir), 'images'),
                'val': os.path.join(os.path.abspath(dataset_dir), 'images'),
                'nc': len(class_names),
                'names': class_names
            }

    # Save configuration file
    with open(output_file, 'w', encoding='utf-8') as f:
        yaml.dump(config, f, allow_unicode=True, default_flow_style=False)

    print(f"Configuration file generated: {output_file}")
    print("\nConfiguration file contents:")
    print("=" * 50)
    with open(output_file, 'r', encoding='utf-8') as f:
        print(f.read())
    print("=" * 50)

# Usage example
class_names = ['car', 'person', 'bicycle', 'motorcycle', 'bus']
create_dataset_yaml('./dataset', class_names, 'dataset.yaml', yolo_version='v8')

Configuration File Validation

Validation Script:

import yaml
import os

def validate_dataset_yaml(yaml_file, dataset_dir):
    """
    Validate dataset configuration file
    """
    with open(yaml_file, 'r', encoding='utf-8') as f:
        config = yaml.safe_load(f)

    errors = []

    # Check required fields
    required_fields = ['nc', 'names']
    for field in required_fields:
        if field not in config:
            errors.append(f"Missing required field: {field}")

    # Check category count
    if 'nc' in config and 'names' in config:
        if isinstance(config['names'], dict):
            num_names = len(config['names'])
        else:
            num_names = len(config['names'])

        if config['nc'] != num_names:
            errors.append(f"Category count mismatch: nc={config['nc']}, names count={num_names}")

    # Check paths
    if 'path' in config:
        path = config['path']
        if not os.path.isabs(path):
            path = os.path.join(os.path.dirname(yaml_file), path)

        if not os.path.exists(path):
            errors.append(f"Dataset path does not exist: {path}")

    # Check training and validation set paths
    for split in ['train', 'val']:
        if split in config:
            split_path = config[split]
            if 'path' in config:
                full_path = os.path.join(config['path'], split_path)
            else:
                full_path = split_path

            if not os.path.exists(full_path):
                errors.append(f"{split} path does not exist: {full_path}")

    if errors:
        print("Configuration file validation failed:")
        for error in errors:
            print(f"  - {error}")
        return False
    else:
        print("Configuration file validation passed")
        return True

# Usage
validate_dataset_yaml('dataset.yaml', './dataset')

Configuration File Checklist

Basic Configuration:

  • Category count (nc) is correct
  • Category names (names) are complete
  • Class IDs start from 0 consecutively

Path Configuration:

  • Dataset path (path) is correct
  • Training set path (train) exists
  • Validation set path (val) exists
  • Test set path (test) exists (if used)

Format Correctness:

  • YAML format is correct
  • Encoding is UTF-8
  • Indentation is correct (using spaces, not tabs)

Step 4: Dataset Splitting

Dataset splitting is a critical pre-training step. Proper splitting ensures accurate model evaluation.

4.1 Splitting Strategy

Choosing Split Ratios

Standard Split Ratios:

| Dataset Size | Training Set | Validation Set | Test Set | Notes |
|---|---|---|---|---|
| Small (< 1000 images) | 70% | 15% | 15% | Ensure sufficient training data |
| Medium (1000-10000 images) | 75% | 12.5% | 12.5% | Balance training and evaluation |
| Large (> 10000 images) | 80% | 10% | 10% | Ample training data, sufficient validation |

Why Three Sets?

  1. Training Set (Train):

    • Used for model training
    • Model learns data features
    • Typically 70-80%
  2. Validation Set (Validation):

    • Used for hyperparameter tuning
    • Monitors training progress
    • Prevents overfitting
    • Typically 10-15%
  3. Test Set (Test):

    • Used for final evaluation
    • Not involved in training or tuning
    • Reflects true model performance
    • Typically 10-15%

Splitting Principles

1. Random Split (Basic Method)

Suitable Scenarios:

  • Similar data scenes
  • No time series relationships
  • No scene correlations

Method:

  • Randomly shuffle all data
  • Split by ratio
  • Ensure consistent class distribution

2. Stratified Split (Recommended)

Suitable Scenarios:

  • Imbalanced classes
  • Need to ensure consistent class ratios

Method:

  • Split each class separately
  • Each class split at the same ratio
  • Ensure consistent class distribution across train, val, and test sets

3. Scene-Based Split (Advanced Method)

Suitable Scenarios:

  • Data from different scenes
  • Need to test generalization ability
  • Avoid data leakage

Method:

  • Group by scene
  • Data from the same scene stays in the same set
  • Avoid scene overlap between training and test sets

Real Case:

An autonomous driving project had road data from 5 cities. Random splitting could result in both training and test sets containing data from the same city, making test results overly optimistic. The correct approach is to split by city: 3 cities for training, 1 for validation, 1 for testing.
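The city-based split in this case can be sketched in a few lines. Assumptions: `scene_of` is a caller-supplied function mapping a filename to its scene/city ID (e.g. a filename prefix), and whole scenes are assigned to one split so no scene leaks across splits:

```python
import random
from collections import defaultdict

def split_by_scene(filenames, scene_of, ratios=(0.7, 0.15, 0.15), seed=42):
    """Group files by scene and assign each whole scene to train/val/test."""
    groups = defaultdict(list)
    for name in filenames:
        groups[scene_of(name)].append(name)

    scenes = sorted(groups)
    random.Random(seed).shuffle(scenes)  # fixed seed for reproducibility

    n_train = int(len(scenes) * ratios[0])
    n_val = int(len(scenes) * ratios[1])

    splits = {'train': [], 'val': [], 'test': []}
    for i, scene in enumerate(scenes):
        key = 'train' if i < n_train else 'val' if i < n_train + n_val else 'test'
        splits[key].extend(groups[scene])
    return splits

# Example: filenames prefixed with their city, e.g. "cityA_0001.jpg"
splits = split_by_scene(['cityA_0001.jpg', 'cityB_0001.jpg', 'cityC_0001.jpg'],
                        scene_of=lambda name: name.split('_')[0])
```

Note that the ratios apply to the number of scenes, not images, so the image-level ratios will only be approximate when scenes differ in size.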

Class Balance Check

Check Script:

import os
from collections import Counter

def check_class_balance(dataset_dir, splits=['train', 'val', 'test']):
    """
    Check class distribution across splits
    """
    results = {}

    for split in splits:
        labels_dir = os.path.join(dataset_dir, 'labels', split)
        if not os.path.exists(labels_dir):
            continue

        class_counts = Counter()
        total_objects = 0

        for label_file in os.listdir(labels_dir):
            if label_file.endswith('.txt'):
                with open(os.path.join(labels_dir, label_file), 'r') as f:
                    for line in f:
                        if line.strip():
                            class_id = int(line.split()[0])
                            class_counts[class_id] += 1
                            total_objects += 1

        results[split] = {
            'class_counts': dict(class_counts),
            'total_objects': total_objects,
            'num_images': len([f for f in os.listdir(labels_dir) if f.endswith('.txt')])
        }

    # Print results
    print("=" * 60)
    print("Class Distribution Statistics")
    print("=" * 60)

    for split, data in results.items():
        print(f"\n{split.upper()} set:")
        print(f"  Image count: {data['num_images']}")
        print(f"  Total objects: {data['total_objects']}")
        print(f"  Class distribution:")

        for class_id in sorted(data['class_counts'].keys()):
            count = data['class_counts'][class_id]
            percentage = count / data['total_objects'] * 100
            print(f"    Class {class_id}: {count} ({percentage:.1f}%)")

    # Check balance
    print("\n" + "=" * 60)
    print("Balance Check")
    print("=" * 60)

    if 'train' in results:
        train_counts = results['train']['class_counts']
        max_count = max(train_counts.values())
        min_count = min(train_counts.values())
        imbalance_ratio = max_count / min_count if min_count > 0 else float('inf')

        print(f"Training set class imbalance ratio: {imbalance_ratio:.2f}")
        if imbalance_ratio > 10:
            print("Warning: Severe class imbalance, recommend balancing data")
        elif imbalance_ratio > 5:
            print("Note: Class imbalance exists, consider balancing")
        else:
            print("Class distribution is relatively balanced")

# Usage
check_class_balance('./dataset')

4.2 Splitting with Scripts

Basic Splitting Script

Simple Random Split:

import os
import shutil
import random

def split_dataset_simple(source_dir, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15, seed=42):
    """
    Simple random dataset split
    """
    # Set random seed for reproducibility
    random.seed(seed)

    images_dir = os.path.join(source_dir, 'images')
    labels_dir = os.path.join(source_dir, 'labels')

    # Get all images
    images = [f for f in os.listdir(images_dir) if f.endswith(('.jpg', '.png'))]
    random.shuffle(images)

    # Calculate split points
    total = len(images)
    train_end = int(total * train_ratio)
    val_end = train_end + int(total * val_ratio)

    # Split
    train_images = images[:train_end]
    val_images = images[train_end:val_end]
    test_images = images[val_end:]

    print(f"Total images: {total}")
    print(f"Training set: {len(train_images)} ({len(train_images)/total*100:.1f}%)")
    print(f"Validation set: {len(val_images)} ({len(val_images)/total*100:.1f}%)")
    print(f"Test set: {len(test_images)} ({len(test_images)/total*100:.1f}%)")

    # Copy files
    for split, img_list in [('train', train_images),
                            ('val', val_images),
                            ('test', test_images)]:
        split_images_dir = os.path.join(source_dir, 'images', split)
        split_labels_dir = os.path.join(source_dir, 'labels', split)

        os.makedirs(split_images_dir, exist_ok=True)
        os.makedirs(split_labels_dir, exist_ok=True)

        for img in img_list:
            # Copy image
            src_img = os.path.join(images_dir, img)
            dst_img = os.path.join(split_images_dir, img)
            shutil.copy(src_img, dst_img)

            # Copy annotation
            label_name = os.path.splitext(img)[0] + '.txt'
            src_label = os.path.join(labels_dir, label_name)
            dst_label = os.path.join(split_labels_dir, label_name)

            if os.path.exists(src_label):
                shutil.copy(src_label, dst_label)
            else:
                print(f"Warning: Annotation file missing: {label_name}")

    print("\nDataset split complete")

# Usage
split_dataset_simple('./dataset', train_ratio=0.7, val_ratio=0.15, test_ratio=0.15)

Stratified Split by Class:

import os
import shutil
import random
from collections import defaultdict

def split_dataset_stratified(source_dir, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15, seed=42):
    """
    Stratified dataset split (by class)
    """
    random.seed(seed)

    images_dir = os.path.join(source_dir, 'images')
    labels_dir = os.path.join(source_dir, 'labels')

    # Group images by class
    images_by_class = defaultdict(list)

    for img_file in os.listdir(images_dir):
        if img_file.endswith(('.jpg', '.png')):
            label_file = os.path.splitext(img_file)[0] + '.txt'
            label_path = os.path.join(labels_dir, label_file)

            if os.path.exists(label_path):
                # Read annotation file, count objects per class
                with open(label_path, 'r') as f:
                    counts = {}
                    for line in f:
                        if line.strip():
                            class_id = int(line.split()[0])
                            counts[class_id] = counts.get(class_id, 0) + 1

                # If image contains multiple classes, use the most frequent one
                if counts:
                    main_class = max(counts, key=counts.get)
                    images_by_class[main_class].append(img_file)

    # Split each class separately
    train_images = []
    val_images = []
    test_images = []

    for class_id, images in images_by_class.items():
        random.shuffle(images)

        total = len(images)
        train_end = int(total * train_ratio)
        val_end = train_end + int(total * val_ratio)

        train_images.extend(images[:train_end])
        val_images.extend(images[train_end:val_end])
        test_images.extend(images[val_end:])

        print(f"Class {class_id}: total={total}, train={train_end}, val={val_end-train_end}, test={total-val_end}")

    # Shuffle final lists
    random.shuffle(train_images)
    random.shuffle(val_images)
    random.shuffle(test_images)

    print(f"\nFinal split results:")
    print(f"Training set: {len(train_images)}")
    print(f"Validation set: {len(val_images)}")
    print(f"Test set: {len(test_images)}")

    # Copy files
    for split, img_list in [('train', train_images),
                            ('val', val_images),
                            ('test', test_images)]:
        split_images_dir = os.path.join(source_dir, 'images', split)
        split_labels_dir = os.path.join(source_dir, 'labels', split)

        os.makedirs(split_images_dir, exist_ok=True)
        os.makedirs(split_labels_dir, exist_ok=True)

        for img in img_list:
            # Copy image
            shutil.copy(os.path.join(images_dir, img),
                       os.path.join(split_images_dir, img))

            # Copy annotation
            label_name = os.path.splitext(img)[0] + '.txt'
            src_label = os.path.join(labels_dir, label_name)
            dst_label = os.path.join(split_labels_dir, label_name)

            if os.path.exists(src_label):
                shutil.copy(src_label, dst_label)

    print("\nStratified split complete")

# Usage
split_dataset_stratified('./dataset', train_ratio=0.7, val_ratio=0.15, test_ratio=0.15)

Post-Split Validation

Validation Script:

import os

def verify_split(dataset_dir):
    """
    Verify dataset split results
    """
    splits = ['train', 'val', 'test']

    for split in splits:
        images_dir = os.path.join(dataset_dir, 'images', split)
        labels_dir = os.path.join(dataset_dir, 'labels', split)

        if not os.path.exists(images_dir) or not os.path.exists(labels_dir):
            print(f"{split} set directories do not exist")
            continue

        images = [f for f in os.listdir(images_dir) if f.endswith(('.jpg', '.png'))]
        labels = [f for f in os.listdir(labels_dir) if f.endswith('.txt')]

        # Check if images and annotations match
        missing_labels = []
        for img in images:
            label_name = os.path.splitext(img)[0] + '.txt'
            if label_name not in labels:
                missing_labels.append(label_name)

        if missing_labels:
            print(f"{split} set has {len(missing_labels)} images missing annotation files")
        else:
            print(f"{split} set: {len(images)} images, {len(labels)} annotation files, all matched")

# Usage
verify_split('./dataset')

Split Checklist

Pre-Split Preparation:

  • All images are annotated
  • Annotation files are validated
  • Data is cleaned

Split Process:

  • Random seed used for reproducibility
  • Stratified split by class (recommended)
  • Consistent class distribution maintained

Post-Split Validation:

  • Images and annotation files match
  • Class distribution checked per split
  • Split ratios match expectations

Directory Structure:

  • train/val/test subdirectories created
  • Images and annotation files correctly copied
  • Clear directory structure

Step 5: Model Training

Model training is the process of converting annotated data into a usable model, requiring proper parameter configuration and training process monitoring.

5.1 Installing the YOLO Environment

Why Choose YOLOv8?

  • Mature and actively maintained, with a strong speed/accuracy trade-off
  • Simple installation: a single pip command
  • Friendly Python API, easy to use
  • Comprehensive documentation, active community

Installation Steps:

1. Basic Installation:

# Install ultralytics (includes YOLOv8)
pip install ultralytics

# Verify installation
python -c "from ultralytics import YOLO; print('YOLOv8 installed successfully')"

2. GPU Support (Optional but Highly Recommended):

# Check if CUDA is available
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

# If CUDA is not available, install CPU version
# pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

3. Dependency Check:

# Check key dependencies
pip list | grep -E "torch|ultralytics|opencv|pillow"

Environment Requirements:

  • Python 3.8+
  • PyTorch 1.8+
  • CUDA 11.0+ (for GPU training, optional)

YOLOv5 Installation (Alternative)

Installation Steps:

# Clone repository
git clone https://github.com/ultralytics/yolov5
cd yolov5

# Install dependencies
pip install -r requirements.txt

# Verify installation
python detect.py --help

Dependency Requirements:

  • Python 3.7+
  • PyTorch 1.7+
  • Other dependencies in requirements.txt

5.2 Training Configuration

YOLOv8 Training Configuration Details

Complete Training Script:

from ultralytics import YOLO
import torch

# Check device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

# Load pre-trained model
# Model selection:
# - yolov8n.pt: nano (smallest, fastest)
# - yolov8s.pt: small (small, fast)
# - yolov8m.pt: medium (balanced)
# - yolov8l.pt: large (high precision)
# - yolov8x.pt: xlarge (highest precision)
model = YOLO('yolov8n.pt')  # Choose based on needs

# Training configuration
results = model.train(
    # Dataset configuration
    data='dataset.yaml',      # Dataset config file path

    # Training parameters
    epochs=100,               # Training epochs (recommend: 100-300)
    imgsz=640,                # Input image size (640/416/1280)
    batch=16,                 # Batch size (adjust based on GPU memory)
    device=device,            # Device ('cuda'/'cpu'/'0,1' for multi-GPU)

    # Optimizer parameters
    lr0=0.01,                 # Initial learning rate
    lrf=0.01,                 # Final learning rate (lr0 * lrf)
    momentum=0.937,           # Momentum
    weight_decay=0.0005,     # Weight decay

    # Data augmentation
    hsv_h=0.015,             # Hue augmentation
    hsv_s=0.7,               # Saturation augmentation
    hsv_v=0.4,               # Value augmentation
    degrees=0.0,             # Rotation angle
    translate=0.1,           # Translation
    scale=0.5,               # Scale
    flipud=0.0,             # Vertical flip probability
    fliplr=0.5,             # Horizontal flip probability
    mosaic=1.0,             # Mosaic augmentation probability
    mixup=0.0,              # MixUp augmentation probability

    # Training settings
    patience=50,             # Early stopping patience (epochs without improvement)
    save=True,               # Save checkpoints
    save_period=10,          # Save every N epochs
    val=True,                # Validate during training
    plots=True,              # Generate training curve plots

    # Project settings
    project='runs/detect',    # Project directory
    name='my_model',         # Experiment name
    exist_ok=True,           # Allow overwriting existing experiments
    pretrained=True,         # Use pre-trained weights
    optimizer='SGD',         # Optimizer (SGD/Adam/AdamW)
    verbose=True,            # Verbose output
    seed=0,                  # Random seed
    deterministic=True,      # Deterministic training
    single_cls=False,        # Single class mode
    rect=False,              # Rectangular training
    cos_lr=False,            # Cosine learning rate schedule
    close_mosaic=10,         # Disable Mosaic for last N epochs
    resume=False,            # Resume training
    amp=True,                # Automatic mixed precision
    fraction=1.0,            # Fraction of dataset to use
    profile=False,           # Performance profiling
    freeze=None,             # Freeze layers (e.g., freeze=10 freezes first 10 layers)
)

# After training
print("Training complete!")
print(f"Best model saved at: {results.save_dir}")
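The train() call above reads dataset paths and class names from dataset.yaml. A minimal sketch that writes one matching the directory layout from this guide (the root path and the class names are placeholders, substitute your own):

```python
from pathlib import Path

# Minimal dataset.yaml for the images/{train,val} + labels/{train,val}
# layout used in this guide. 'path' and the class names are placeholders.
yaml_text = """\
path: dataset          # dataset root
train: images/train    # train images, relative to path
val: images/val        # val images, relative to path

names:
  0: cat
  1: dog
"""

Path('dataset.yaml').write_text(yaml_text)
print(Path('dataset.yaml').read_text())
```

Label files are found automatically by replacing "images" with "labels" in each image path, which is why the parallel directory structure matters.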

Key Parameter Details

1. Model Selection:

| Model   | Parameters | Speed   | Precision | Use Case                             |
|---------|------------|---------|-----------|--------------------------------------|
| yolov8n | 3.2M       | Fastest | Lower     | Real-time detection, edge devices    |
| yolov8s | 11.2M      | Fast    | Medium    | Balance speed and precision          |
| yolov8m | 25.9M      | Medium  | Higher    | Production environment (recommended) |
| yolov8l | 43.7M      | Slower  | High      | High precision requirements          |
| yolov8x | 68.2M      | Slowest | Highest   | Research, maximum precision          |

Selection Advice:

  • Beginners: yolov8n (quick validation)
  • Production: yolov8m (balanced)
  • High precision: yolov8l or yolov8x

2. Batch Size:

GPU Memory vs Batch Size:

| GPU Memory | Recommended Batch Size (640x640) |
|------------|----------------------------------|
| 4GB        | 4-8                              |
| 6GB        | 8-12                             |
| 8GB        | 12-16                            |
| 12GB       | 16-24                            |
| 16GB+      | 24-32                            |

Adjustment Method:

  • If out of memory, reduce batch or imgsz
  • If memory is sufficient, larger batch improves training stability
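The memory table above can be encoded as a rough starting-point heuristic. This is only a rule of thumb, the actual limit depends on the model size, image resolution, and other processes sharing the GPU:

```python
def suggest_batch_size(gpu_mem_gb):
    """Rough starting batch size from the GPU-memory table above
    (assumes 640x640 input); reduce it if you hit out-of-memory errors."""
    tiers = [(16, 32), (12, 24), (8, 16), (6, 12), (4, 8)]
    for min_mem, batch in tiers:
        if gpu_mem_gb >= min_mem:
            return batch
    return 4  # very small GPUs: start low and watch for OOM

print(suggest_batch_size(8))   # 16
print(suggest_batch_size(24))  # 32
```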

3. Learning Rate (lr0):

Learning Rate Selection:

  • Default: 0.01 (SGD optimizer)
  • Small datasets: 0.001-0.005
  • Large datasets: 0.01-0.02
  • Fine-tuning: 0.0001-0.001

Learning Rate Scheduling:

  • Cosine annealing: cos_lr=True, learning rate follows cosine curve
  • Linear decay: Default, learning rate decreases linearly
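The two schedules can be sketched numerically. This is a simplified approximation of what the trainer does (warm-up omitted), just to show how lr0 and lrf interact:

```python
import math

def lr_at(epoch, epochs=100, lr0=0.01, lrf=0.01, cosine=False):
    """Simplified sketch: learning rate decays from lr0 toward lr0 * lrf
    over `epochs` epochs, linearly or along a cosine curve."""
    t = epoch / epochs
    if cosine:
        factor = lrf + (1 - lrf) * 0.5 * (1 + math.cos(math.pi * t))
    else:
        factor = 1 - t * (1 - lrf)   # linear decay
    return lr0 * factor

print(lr_at(0), lr_at(100))  # 0.01 at the start, 0.0001 at the end
```

With the defaults (lr0=0.01, lrf=0.01), both schedules start at 0.01 and end at 0.0001; the cosine curve simply spends more time near the extremes.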

4. Training Epochs:

Epoch Recommendations:

  • Small datasets (< 1000 images): 200-300 epochs
  • Medium datasets (1000-10000 images): 100-200 epochs
  • Large datasets (> 10000 images): 50-100 epochs

Early Stopping:

  • patience=50: Stops if validation performance doesn't improve for 50 epochs
  • Prevents overfitting, saves training time

YOLOv5 Training Configuration

Training Script:

import torch

# Set paths and hyperparameters
data_yaml = 'dataset.yaml'
weights = 'yolov5s.pt'  # Pre-trained weights
epochs = 100
batch_size = 16
img_size = 640
device = '0' if torch.cuda.is_available() else 'cpu'

# YOLOv5 is trained via its command-line script, e.g.:
# python train.py --data dataset.yaml --weights yolov5s.pt --epochs 100 --batch-size 16 --img 640 --device 0

5.3 Training Monitoring

Key Metrics Explained

1. mAP (Mean Average Precision):

mAP50:

  • Average precision at IoU threshold=0.5
  • Measures overall model performance
  • Target: > 0.5 (50%)

mAP50-95:

  • Average precision across IoU thresholds from 0.5 to 0.95
  • Stricter evaluation standard
  • Target: > 0.3 (30%)
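mAP is computed at one or more IoU thresholds, and IoU itself is just intersection area over union area for two boxes. A minimal sketch with boxes in (x1, y1, x2, y2) pixel coordinates:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes in (x1, y1, x2, y2) format."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes don't overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

At mAP50, this prediction would not count as a match (0.143 < 0.5), which is why loose boxes hurt the score even when the class is right.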

2. Precision:

  • Proportion of true positives among predicted positives
  • Measures false positive rate
  • Target: > 0.8 (80%)

3. Recall:

  • Proportion of true positives correctly predicted
  • Measures miss rate
  • Target: > 0.8 (80%)
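Both metrics reduce to simple ratios over true positives (TP), false positives (FP), and false negatives (FN). A minimal sketch:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 80 correct detections, 10 false alarms, 20 missed objects:
p, r = precision_recall(tp=80, fp=10, fn=20)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.89, recall=0.80
```

Raising the confidence threshold typically trades recall for precision, and lowering it does the opposite, which is why the two are reported together.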

4. Loss:

Training Loss (train/box_loss):

  • Bounding box loss on training set
  • Should continuously decrease

Validation Loss (val/box_loss):

  • Bounding box loss on validation set
  • Should decrease; if it increases, indicates overfitting

Training Process Monitoring

Real-Time Monitoring:

# Training automatically generates:
# - Training curve plots (results.png)
# - Confusion matrix (confusion_matrix.png)
# - Validation results (val_batch*.jpg)
# - Training logs (results.csv)

Viewing Training Logs:

import pandas as pd
import matplotlib.pyplot as plt

# Read training logs
df = pd.read_csv('runs/detect/my_model/results.csv')

# Plot training curves
plt.figure(figsize=(12, 4))

plt.subplot(1, 3, 1)
plt.plot(df['epoch'], df['train/box_loss'], label='Train Loss')
plt.plot(df['epoch'], df['val/box_loss'], label='Val Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.title('Loss Curve')

plt.subplot(1, 3, 2)
plt.plot(df['epoch'], df['metrics/mAP50(B)'], label='mAP50')
plt.xlabel('Epoch')
plt.ylabel('mAP50')
plt.legend()
plt.title('mAP50 Curve')

plt.subplot(1, 3, 3)
plt.plot(df['epoch'], df['metrics/precision(B)'], label='Precision')
plt.plot(df['epoch'], df['metrics/recall(B)'], label='Recall')
plt.xlabel('Epoch')
plt.ylabel('Score')
plt.legend()
plt.title('Precision & Recall')

plt.tight_layout()
plt.savefig('training_curves.png')
plt.show()

Training Tips and Best Practices

1. Learning Rate Adjustment Strategy:

Warm-up:

  • Use a smaller learning rate for the first few epochs
  • Helps stabilize training
  • YOLOv8 supports this by default

Learning Rate Decay:

  • Use cosine annealing: cos_lr=True
  • Or linear decay: default

2. Data Augmentation Strategy:

Basic Augmentation (enabled by default):

  • Horizontal flip: fliplr=0.5
  • Color augmentation: hsv_h/s/v
  • Mosaic: mosaic=1.0

Advanced Augmentation (optional):

  • MixUp: mixup=0.15 (for small datasets)
  • Rotation: degrees=10 (if target orientation doesn't matter)

3. Early Stopping:

Settings:

patience=50  # Stop if validation performance doesn't improve for 50 epochs

Benefits:

  • Prevents overfitting
  • Saves training time
  • Automatically selects the best model

4. Model Checkpoints:

Auto-Save:

  • last.pt is updated every epoch; best.pt is updated whenever validation performance improves
  • Saved at: runs/detect/my_model/weights/best.pt

Manual Save:

# Save at any point during training
model.save('my_checkpoint.pt')

Resume Training:

# Resume training from checkpoint
model = YOLO('runs/detect/my_model/weights/last.pt')
model.train(resume=True)

Training Problem Diagnosis

Problem 1: Loss Not Decreasing

Possible Causes:

  • Learning rate too high or too low
  • Poor data quality
  • Inappropriate model selection

Solutions:

  • Adjust learning rate (try 0.001-0.01)
  • Check data quality
  • Try a larger model

Problem 2: Overfitting (Training loss decreasing, validation loss increasing)

Possible Causes:

  • Insufficient data
  • Model too large
  • Insufficient data augmentation

Solutions:

  • Increase data volume
  • Use a smaller model
  • Increase data augmentation
  • Use dropout or regularization

Problem 3: Training Too Slow

Possible Causes:

  • Training on CPU
  • Batch size too small
  • Image size too large

Solutions:

  • Use GPU training
  • Increase batch size
  • Reduce image size (e.g., 640 -> 416)

Training Checklist

Pre-Training Preparation:

  • Dataset split (train/val/test)
  • Dataset config file (dataset.yaml) correct
  • Environment installed (YOLOv8/YOLOv5)
  • GPU available (if using GPU)

Training Configuration:

  • Appropriate model size selected
  • Batch size set based on GPU memory
  • Learning rate set reasonably
  • Sufficient training epochs

Training Monitoring:

  • Real-time training log review
  • Loss curve monitoring
  • mAP curve monitoring
  • Validation set performance check

Training Optimization:

  • Early stopping enabled
  • Checkpoints saved
  • Hyperparameters tuned
  • Training curves analyzed

Step 6: Model Evaluation and Optimization

Model evaluation is the critical step for validating model performance, and optimization is the ongoing process of improving it.

6.1 Evaluating the Model

Basic Evaluation

YOLOv8 Evaluation Script:

import os

from ultralytics import YOLO

# Load trained model
model = YOLO('runs/detect/my_model/weights/best.pt')

# Evaluate on validation set
metrics = model.val(data='dataset.yaml', split='val')

# Print key metrics
print("=" * 50)
print("Model Evaluation Results")
print("=" * 50)
print(f"mAP50: {metrics.box.map50:.4f}")
print(f"mAP50-95: {metrics.box.map:.4f}")
print(f"Precision: {metrics.box.mp:.4f}")
print(f"Recall: {metrics.box.mr:.4f}")
print("=" * 50)

# Evaluate on test set (if exists)
if os.path.exists('dataset/images/test'):
    test_metrics = model.val(data='dataset.yaml', split='test')
    print("\nTest set evaluation results:")
    print(f"mAP50: {test_metrics.box.map50:.4f}")
    print(f"mAP50-95: {test_metrics.box.map:.4f}")

Detailed Evaluation Metrics

1. Per-Class Evaluation:

# Get detailed metrics for each class
for i, class_name in enumerate(model.names.values()):
    print(f"\nClass {i} ({class_name}):")
    print(f"  Precision: {metrics.box.p[i]:.4f}")
    print(f"  Recall: {metrics.box.r[i]:.4f}")
    print(f"  mAP50: {metrics.box.ap50[i]:.4f}")
    print(f"  mAP50-95: {metrics.box.ap[i]:.4f}")

2. Confusion Matrix Analysis:

# View confusion matrix (auto-generated in results directory)
# File location: runs/detect/my_model/confusion_matrix.png
# Analysis:
# - Diagonal: Correct classifications
# - Off-diagonal: Misclassifications
# - Identify easily confused class pairs

3. Visualizing Detection Results:

# Visualize detection results on test images
results = model('dataset/images/test', save=True, conf=0.25)

# View detection results
for result in results:
    # Get detection boxes
    boxes = result.boxes
    # Get classes
    classes = boxes.cls
    # Get confidence scores
    confidences = boxes.conf

    print(f"Detected {len(boxes)} objects")
    for i in range(len(boxes)):
        class_name = model.names[int(classes[i])]
        conf = confidences[i]
        print(f"  {class_name}: {conf:.2f}")

Performance Benchmarks

Performance Evaluation Standards:

| Application Scenario       | mAP50 Target | mAP50-95 Target | Notes                  |
|----------------------------|--------------|-----------------|------------------------|
| Quick Prototype            | > 0.5        | > 0.3           | Validate ideas         |
| Production Environment     | > 0.7        | > 0.5           | Real-world application |
| High-Precision Application | > 0.9        | > 0.7           | Critical applications  |

Real Case:

An industrial quality inspection project:

  • Initial model: mAP50=0.65, couldn't meet production requirements
  • After optimization: mAP50=0.85, met production standards
  • Optimization methods: Improved data quality, increased data volume, tuned hyperparameters

6.2 Common Problems and Solutions

Problem Diagnosis Workflow

1. Low Accuracy (mAP < 0.5)

Diagnosis Steps:

# 1. Check data quality
# - Are annotations accurate?
# - Is data balanced?
# - Are scenes diverse?

# 2. Check model training
# - Is loss decreasing normally?
# - Is training sufficient?
# - Is learning rate appropriate?

# 3. Check model selection
# - Is the model too small?
# - Do you need a larger model?

Solutions:

  • Improve data quality: Re-check annotations, correct errors
  • Increase data volume: Collect more high-quality data
  • Use a larger model: Upgrade from yolov8n to yolov8m
  • Tune hyperparameters: Learning rate, batch size, etc.

2. Overfitting (Low training loss, high validation loss)

Diagnosis:

# Check training curves
# - train/box_loss continuously decreasing
# - val/box_loss first decreasing then increasing
# - High training mAP, low validation mAP

Solutions:

  • Increase data volume: Collect more data
  • Data augmentation: Enable more augmentation
  • Use a smaller model: Reduce model complexity
  • Regularization: Increase dropout or weight decay
  • Early stopping: Use early stopping mechanism

3. High Miss Rate (Low Recall)

Diagnosis:

# Check per-class recall
for i, class_name in enumerate(model.names.values()):
    recall = metrics.box.r[i]
    if recall < 0.7:
        print(f"Warning: {class_name} recall is low: {recall:.2f}")

Possible Causes:

  • Imbalanced data (some classes have few samples)
  • Small object detection difficulty
  • Threshold set too high

Solutions:

  • Balance data: Increase minority class samples
  • Lower confidence threshold: conf=0.15-0.25
  • Use higher resolution: imgsz=1280
  • Data augmentation: Target small object augmentation

4. High False Positive Rate (Low Precision)

Diagnosis:

# Check per-class precision
for i, class_name in enumerate(model.names.values()):
    precision = metrics.box.p[i]
    if precision < 0.7:
        print(f"Warning: {class_name} precision is low: {precision:.2f}")

Possible Causes:

  • Insufficient negative samples
  • High class similarity
  • Threshold set too low

Solutions:

  • Add negative samples: Include images without targets
  • Raise confidence threshold: conf=0.3-0.5
  • Refine categories: Distinguish similar classes
  • Post-processing optimization: Adjust NMS threshold

5. Training Too Slow or Not Converging

Diagnosis:

# Check training process
# - Is loss decreasing?
# - Is learning rate appropriate?
# - Is GPU utilization high?

Solutions:

  • Use GPU: Ensure GPU training
  • Adjust batch size: Based on GPU memory
  • Adjust learning rate: Try different learning rates
  • Check data: Ensure correct data format

Problem-Solution Reference Table

| Problem              | Symptoms                   | Possible Causes                      | Solutions                              |
|----------------------|----------------------------|--------------------------------------|----------------------------------------|
| Low accuracy         | mAP < 0.5                  | Poor data quality, insufficient data | Improve data quality, increase data    |
| Overfitting          | Good on train, poor on val | Insufficient data, model too large   | More data, smaller model, augmentation |
| High miss rate       | Recall < 0.7               | Imbalanced data, high threshold      | Balance data, lower threshold          |
| High false positives | Precision < 0.7            | Insufficient negatives, low threshold | Add negatives, raise threshold        |
| Slow training        | Long training time         | CPU training, small batch            | Use GPU, increase batch                |
| Not converging       | Loss not decreasing        | Wrong learning rate, data issues     | Adjust learning rate, check data       |

6.3 Model Optimization

Optimization Strategies

1. Data Optimization

Increase Data Volume:

  • Collect more high-quality data
  • Use data augmentation (rotation, flip, brightness, etc.)
  • Supplement from public datasets

Improve Data Quality:

  • Re-check annotations, correct errors
  • Standardize annotation criteria
  • Balance class data

Data Augmentation Script:

# YOLOv8's built-in data augmentation is applied automatically during
# training, no manual preprocessing needed; strengths are adjustable
# via training parameters:
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model.train(
    data='dataset.yaml',
    hsv_h=0.015,    # Hue augmentation
    hsv_s=0.7,      # Saturation augmentation
    hsv_v=0.4,      # Value augmentation
    degrees=10,     # Rotation angle
    translate=0.1,  # Translation
    scale=0.5,      # Scale
    mosaic=1.0,     # Mosaic augmentation
    mixup=0.15,     # MixUp augmentation
)

2. Hyperparameter Optimization

Learning Rate Optimization:

# Try different learning rates
learning_rates = [0.001, 0.005, 0.01, 0.02]

for lr in learning_rates:
    model = YOLO('yolov8n.pt')
    results = model.train(
        data='dataset.yaml',
        epochs=50,
        lr0=lr,
        name=f'lr_{lr}',
    )
    print(f"LR={lr}, mAP50={results.results_dict['metrics/mAP50(B)']:.4f}")

Batch Size Optimization:

# Adjust batch size based on GPU memory
# Larger batches are generally more stable but require more memory
batch_sizes = [8, 16, 32]

for batch in batch_sizes:
    model = YOLO('yolov8n.pt')
    results = model.train(
        data='dataset.yaml',
        epochs=50,
        batch=batch,
        name=f'batch_{batch}',
    )

3. Model Selection Optimization

Model Size Comparison:

# Test different model sizes
models = ['yolov8n.pt', 'yolov8s.pt', 'yolov8m.pt']

for model_name in models:
    model = YOLO(model_name)
    results = model.train(
        data='dataset.yaml',
        epochs=100,
        name=model_name.replace('.pt', ''),
    )
    print(f"{model_name}: mAP50={results.results_dict['metrics/mAP50(B)']:.4f}")

4. Post-Processing Optimization

Adjusting Confidence Threshold:

# Default threshold is 0.25, adjustable based on needs
# Higher threshold: fewer false positives, but may increase misses
# Lower threshold: fewer misses, but may increase false positives

# Adjust during inference
results = model('test_image.jpg', conf=0.3)  # Raise threshold
results = model('test_image.jpg', conf=0.15)  # Lower threshold

Adjusting NMS Threshold:

# NMS (Non-Maximum Suppression) removes duplicate detections
# iou parameter controls NMS IoU threshold
# Higher iou: stricter NMS, fewer duplicate detections
# Lower iou: more lenient NMS, may keep more detection boxes

results = model('test_image.jpg', iou=0.45)  # Default is 0.7
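To see how the iou threshold changes what survives, here is a minimal greedy NMS sketch. The library's implementation is more sophisticated (per-class, vectorized); this is only an illustration of the idea:

```python
def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it
    beyond iou_thresh, repeat. Boxes are (x1, y1, x2, y2). Returns kept indices."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, iou_thresh=0.45))  # [0, 2]: box 1 is suppressed
```

With a higher iou_thresh (e.g. 0.7), box 1 would survive as a duplicate detection, which is the trade-off the iou parameter controls.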

5. Model Ensemble

Multi-Model Voting:

from ultralytics import YOLO
import numpy as np

# Load multiple models
models = [
    YOLO('runs/detect/model1/weights/best.pt'),
    YOLO('runs/detect/model2/weights/best.pt'),
    YOLO('runs/detect/model3/weights/best.pt'),
]

# Predict on the same image
image = 'test_image.jpg'
predictions = [model(image, conf=0.25) for model in models]

# Voting or averaging (simplified example)
# Real applications require more sophisticated ensemble strategies
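One simple ensembling step, once boxes from different models have been matched to the same object (real pipelines cluster boxes by IoU first, e.g. weighted boxes fusion), is a confidence-weighted average. A sketch of just that averaging step, assuming the boxes are already matched:

```python
def merge_boxes(boxes, confidences):
    """Confidence-weighted average of boxes assumed to cover the same
    object (e.g. one box per model). Boxes are (x1, y1, x2, y2)."""
    total = sum(confidences)
    return tuple(
        sum(b[k] * c for b, c in zip(boxes, confidences)) / total
        for k in range(4)
    )

# Three models' boxes for the same object:
boxes = [(10, 10, 50, 50), (12, 11, 52, 49), (11, 9, 51, 51)]
confs = [0.9, 0.8, 0.7]
print(merge_boxes(boxes, confs))
```

Higher-confidence models pull the merged box toward their prediction, which tends to smooth out individual localization errors.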

Optimization Checklist

Data Optimization:

  • Sufficient data volume
  • High data quality
  • Balanced classes
  • Diverse scenes

Training Optimization:

  • Appropriate learning rate
  • Reasonable batch size
  • Sufficient training epochs
  • Data augmentation enabled

Model Optimization:

  • Appropriate model size
  • Pre-trained weights used
  • Different models tried

Post-Processing Optimization:

  • Appropriate confidence threshold
  • Appropriate NMS threshold
  • Model ensemble considered

Performance Evaluation:

  • mAP meets target
  • Precision and Recall balanced
  • Per-class performance balanced
  • Real-world application results satisfactory

Accelerate Dataset Creation with TjMakeBot

TjMakeBot's Advantages:

  1. AI Chat-Based Annotation

    • Natural language instructions, fast annotation
    • Supports batch processing
    • High accuracy
  2. Video-to-Frame Feature

    • Extract frames from video
    • Custom frame rate
    • Batch processing
  3. Multi-Format Support

    • YOLO format export
    • VOC, COCO format support
    • Convenient format conversion
  4. Free (Basic Features)

    • No usage limits
    • No feature restrictions
    • Online and ready to use

Start Using TjMakeBot to Create YOLO Datasets for Free ->

Conclusion

Creating a high-quality YOLO dataset is the foundation for model success. By choosing the right tools, following practical methods, and continuously optimizing, you can create high-quality datasets and train excellent models.

Remember: Data quality > Model architecture. Investing time in data yields significant returns.


Legal Disclaimer: The content of this article is for reference only and does not constitute any legal, commercial, or technical advice. When using any tools or methods, please comply with applicable laws and regulations, respect intellectual property rights, and obtain necessary authorizations. All company names, product names, and trademarks mentioned in this article are the property of their respective owners.

About the Author: The TjMakeBot team focuses on AI data annotation tool development, helping developers quickly create high-quality YOLO datasets.

Keywords: YOLO dataset, object detection, YOLO annotation, YOLOv8, YOLOv5, dataset creation, image annotation, TjMakeBot