TjMakeBot Blog (tjmakebot.com)

YOLO Dataset Complete Guide: From Zero to Model Training

TjMakeBot Team · Technical Tutorial · 12 min read
Tags: Technical Tutorial · Practical Methods

"I want to build an object detection project with YOLO, but I don't know where to start..."

This is a real struggle for many AI developers. YOLO (You Only Look Once) is one of the most widely used algorithms in object detection. From YOLOv1 to the latest YOLOv10, the YOLO series has achieved a strong balance between speed and accuracy.

YOLO Application Scenarios:

  • Autonomous Driving: Real-time detection of vehicles, pedestrians, and traffic signs
  • Industrial Quality Inspection: Rapid detection of product defects
  • Medical Imaging: Assisting doctors in identifying lesions
  • Retail Analytics: Product recognition and inventory management
  • Security Surveillance: Real-time monitoring and anomaly detection

YOLO's Advantages:

  • Fast: Can process video streams in real time
  • Accurate: Achieves a good balance between speed and accuracy
  • Easy to use: Comprehensive tools and documentation
  • Active community: Abundant tutorials and examples

But the first hurdle many developers face when using YOLO is: How do you create a high-quality YOLO dataset?

Today, we'll walk you through creating a complete YOLO dataset from scratch, all the way to successful model training. Whether you're a beginner or an experienced developer, you'll find practical methods and tips in this article.

What Is a YOLO Dataset?

YOLO Data Format

YOLO uses a concise text format to store annotation information:

File Structure:

dataset/
├── images/
│   ├── train/
│   │   ├── image001.jpg
│   │   ├── image002.jpg
│   │   └── ...
│   └── val/
│       ├── image101.jpg
│       └── ...
└── labels/
    ├── train/
    │   ├── image001.txt
    │   ├── image002.txt
    │   └── ...
    └── val/
        ├── image101.txt
        └── ...

Annotation File Format (image001.txt):

class_id center_x center_y width height
0 0.5 0.5 0.3 0.4
1 0.2 0.3 0.1 0.2

Format Description:

  • class_id: Category ID (starting from 0)
  • center_x, center_y: Normalized coordinates of the bounding box center (0-1)
  • width, height: Normalized width and height of the bounding box (0-1)

Key point: YOLO uses normalized coordinates — all coordinate values are between 0 and 1.
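
To make the normalization concrete, here is a minimal sketch (a hypothetical helper, not part of any YOLO toolkit) that converts a pixel-space box to the normalized YOLO values:

```python
def to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space box (x_min, y_min, x_max, y_max) to YOLO format."""
    center_x = (x_min + x_max) / 2 / img_w
    center_y = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return center_x, center_y, width, height

# A 300x400-pixel box centered in a 1000x1000 image:
print(to_yolo(350, 300, 650, 700, 1000, 1000))  # (0.5, 0.5, 0.3, 0.4)
```

Note that the output matches the first line of the annotation example above.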

YOLO Version Differences

Different YOLO versions have slightly different dataset format requirements:

| Version | Format Requirements | Special Notes |
| --- | --- | --- |
| YOLOv5 | Standard format | Supports custom class counts |
| YOLOv8 | Standard format | Ultralytics format recommended |
| YOLOv9 | Standard format | Compatible with YOLOv5 format |
| YOLOv10 | Standard format | Latest version, best performance |

Good news: All YOLO versions use the same data format — your dataset is universally compatible!

Step 1: Data Collection and Preparation

1.1 Define Dataset Requirements

Before you begin, clarifying your requirements is the first step to success. A clear requirements plan can save you significant time and cost.

Requirements Analysis Checklist

1. Target Category Definition

Define Detection Targets:

  • List all object categories to detect
  • Define boundaries for each category (what counts, what doesn't)
  • Consider category hierarchy (e.g., vehicle -> car, truck, bus)

Real Case:

A traffic monitoring project initially defined only one "vehicle" category. After training, they found the model couldn't distinguish cars from trucks. After subdividing into "car," "truck," "bus," and "motorcycle," model accuracy improved by 15%.

Category Count Recommendations:

  • Simple projects: 1-5 categories (suitable for beginners)
  • Medium projects: 5-20 categories (common applications)
  • Complex projects: 20+ categories (requires more data and annotation time)

2. Data Scale Planning

Data Volume Estimates:

| Project Type | Min Images Per Class | Recommended Images Per Class | Total Images (5 classes) |
| --- | --- | --- | --- |
| Quick Prototype | 100-200 | 500 | 2,500 |
| Production Application | 1,000 | 3,000 | 15,000 |
| High-Precision Application | 5,000 | 10,000 | 50,000 |

Factors Affecting Data Volume:

  • Number of categories: More categories require more data
  • Scene complexity: Complex scenes need more data
  • Precision requirements: High precision demands more high-quality data
  • Class balance: Ensure relatively balanced data across categories (ratio no more than 10:1)
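
The 10:1 balance rule can be checked mechanically once labels exist. A small counting sketch (the labels directory is a placeholder for your own dataset):

```python
import os
from collections import Counter

def class_distribution(labels_dir):
    """Count annotated instances per class_id across YOLO label files."""
    counts = Counter()
    for filename in os.listdir(labels_dir):
        if filename.endswith('.txt'):
            with open(os.path.join(labels_dir, filename)) as f:
                for line in f:
                    parts = line.split()
                    if parts:
                        counts[int(parts[0])] += 1
    return counts

labels_dir = 'dataset/labels/train'  # placeholder path
if os.path.isdir(labels_dir):
    counts = class_distribution(labels_dir)
    if counts:
        ratio = max(counts.values()) / min(counts.values())
        print(counts, f"-> imbalance ratio {ratio:.1f}:1")
```

If the printed ratio exceeds roughly 10:1, collect more samples of the rare classes or augment them, as in the case below.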

Real Case:

An industrial quality inspection project needed to detect 10 defect types. Normal products had 10,000 images, but defect samples only had 500. Through active defect sample collection and data augmentation, each defect category eventually reached 2,000 samples, and model accuracy improved from 75% to 92%.

3. Scene Diversity Planning

Scene Coverage Dimensions:

Time Dimension:

  • Daytime, nighttime, dusk, dawn
  • Different seasons (spring, summer, fall, winter)
  • Different time periods (morning, noon, evening)

Weather Dimension:

  • Sunny, rainy, snowy, foggy
  • Different lighting conditions (bright light, shadows, backlight)

Environment Dimension:

  • Indoor, outdoor
  • Urban, rural, highway
  • Different background complexity levels

Target State Dimension:

  • Stationary, moving
  • Complete, partially occluded
  • Different angles (front, side, back)

Scene Diversity Checklist:

  • Cover at least 3-5 major scenarios
  • Include edge cases (extreme situations)
  • Avoid overly uniform scenes (prone to overfitting)
  • Ensure consistent scene distribution between training and test sets

4. Image Quality Requirements

Resolution Requirements:

| Application Scenario | Minimum Resolution | Recommended Resolution | Notes |
| --- | --- | --- | --- |
| Small Object Detection | 1280x1280 | 1920x1920+ | Higher resolution needed for small targets |
| Standard Detection | 640x640 | 1280x1280 | YOLO default input size |
| Fast Detection | 416x416 | 640x640 | Speed priority, acceptable precision |

Image Quality Checks:

  • Clarity: Target objects clearly visible, no blur
  • Contrast: Obvious contrast between target and background
  • Color: True colors, no severe distortion
  • Exposure: Normal exposure, not overexposed or underexposed
  • Format: Unified format (JPG or PNG), avoid format inconsistency
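
The exposure item above can be screened automatically. A rough sketch using mean grayscale brightness; the thresholds are heuristics to tune for your scenes, not standard values:

```python
import numpy as np
from PIL import Image

def exposure_check(image_path, low=40, high=215):
    """Classify an image as under-/overexposed by mean grayscale brightness (0-255)."""
    gray = np.asarray(Image.open(image_path).convert('L'), dtype=np.float32)
    mean_brightness = float(gray.mean())
    if mean_brightness < low:
        return 'underexposed', mean_brightness
    if mean_brightness > high:
        return 'overexposed', mean_brightness
    return 'ok', mean_brightness
```

Run this during data cleaning to flag candidates for removal rather than deleting automatically; backlit scenes can legitimately have low means.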

5. Budget and Timeline Planning

Time Estimates (for 5 classes, 1000 images each):

| Phase | Time Estimate | Notes |
| --- | --- | --- |
| Data Collection | 1-2 weeks | Varies by data source |
| Data Annotation | 2-4 weeks | Can be shortened to 1 week with AI assistance |
| Quality Check | 3-5 days | Multiple review rounds |
| Format Conversion | 1 day | Automated processing |
| Total | 4-7 weeks | Can be shortened to 2-3 weeks with AI assistance |

Cost Estimates (for 5 classes, 1000 images each):

| Approach | Annotation Cost | Tool Cost | Total Cost |
| --- | --- | --- | --- |
| Pure Manual Annotation | $8,000-12,000 | $0 | $8,000-12,000 |
| AI-Assisted Annotation | $1,600-2,400 | $0 (free tools) | $1,600-2,400 |
| Savings | ~80% | - | ~80% |

Requirements Document Template:

# YOLO Dataset Requirements Document

## Project Information
- Project Name: [Project Name]
- Application Scenario: [Scenario Description]
- Target Accuracy: [Target mAP Value]

## Category Definitions
1. [Category 1]: [Detailed Definition]
2. [Category 2]: [Detailed Definition]
...

## Data Scale
- Number of Categories: [N]
- Images Per Category: [M]
- Total Images: [N x M]

## Scene Requirements
- Time: [Daytime/Nighttime/All Day]
- Weather: [Sunny/Rainy/All Weather]
- Environment: [Indoor/Outdoor/Mixed]

## Quality Requirements
- Resolution: [Minimum Resolution]
- Annotation Precision: [IoU Requirement]
- Category Accuracy: [Accuracy Requirement]

## Timeline
- Start Date: [Date]
- Completion Date: [Date]
- Milestones: [Key Checkpoints]

## Budget
- Annotation Cost: [Budget]
- Tool Cost: [Budget]
- Total Budget: [Total Budget]

1.2 Collecting Image Data: A Complete Guide to Data Sources

Data Source 1: Public Datasets (Ideal for Quick Starts)

Public datasets are the go-to choice for quickly starting a project, especially suitable for learning and prototyping.

Major Public Dataset Comparison:

| Dataset | Classes | Images | Annotations | Features | Use Cases |
| --- | --- | --- | --- | --- | --- |
| COCO | 80 | 330K | 2.5M | High quality, precise annotations | General object detection |
| Open Images | 600 | 9M | 36M | Many classes, large volume | Large-scale training |
| ImageNet | 1000 | 14M | - | Classification dataset | Pre-trained models |
| Pascal VOC | 20 | 11K | 27K | Classic dataset | Learning and research |
| Cityscapes | 30 | 25K | - | Urban street scenes | Autonomous driving |

COCO Dataset Details:

Download Methods:

# Method 1: Official download
# Visit https://cocodataset.org/#download
# Download train2017.zip, val2017.zip, annotations_trainval2017.zip

# Method 2: Using the COCO API (requires the annotation file from Method 1)
from pycocotools.coco import COCO
import requests

coco = COCO('annotations/instances_val2017.json')

# Each image record carries a 'coco_url'; fetch one image as an example
img_info = coco.loadImgs(coco.getImgIds()[0])[0]
with open(img_info['file_name'], 'wb') as f:
    f.write(requests.get(img_info['coco_url']).content)

Category List (partial):

  • People: person
  • Vehicles: car, truck, bus, motorcycle, bicycle
  • Animals: cat, dog, horse, cow, elephant
  • Furniture: chair, couch, bed, table
  • Electronics: laptop, mouse, keyboard, cell phone

Converting to YOLO Format:

Using a Python Script:

from pycocotools.coco import COCO
import os
import shutil

def coco_to_yolo(coco_annotation_file, image_dir, output_dir):
    """
    Convert COCO format to YOLO format
    """
    coco = COCO(coco_annotation_file)

    # Create output directories
    os.makedirs(f'{output_dir}/images', exist_ok=True)
    os.makedirs(f'{output_dir}/labels', exist_ok=True)

    # COCO category IDs are not contiguous (1-90 for the 80 classes),
    # so map them to 0-based YOLO class IDs instead of just subtracting 1
    cat_ids = sorted(coco.getCatIds())
    cat_id_to_class = {cat_id: i for i, cat_id in enumerate(cat_ids)}

    # Get all image IDs
    img_ids = coco.getImgIds()

    for img_id in img_ids:
        # Get image info
        img_info = coco.loadImgs(img_id)[0]
        img_width = img_info['width']
        img_height = img_info['height']

        # Get all annotations for this image
        ann_ids = coco.getAnnIds(imgIds=img_id)
        anns = coco.loadAnns(ann_ids)

        # Create YOLO format annotation file
        base_name = os.path.splitext(img_info['file_name'])[0]
        label_file = f"{output_dir}/labels/{base_name}.txt"
        with open(label_file, 'w') as f:
            for ann in anns:
                class_id = cat_id_to_class[ann['category_id']]

                # Get bounding box (COCO format: x, y, width, height)
                x, y, w, h = ann['bbox']

                # Convert to YOLO format (normalized center coordinates and dimensions)
                center_x = (x + w / 2) / img_width
                center_y = (y + h / 2) / img_height
                norm_w = w / img_width
                norm_h = h / img_height

                # Write to file
                f.write(f"{class_id} {center_x:.6f} {center_y:.6f} {norm_w:.6f} {norm_h:.6f}\n")

        # Copy the image alongside its label file
        src = os.path.join(image_dir, img_info['file_name'])
        if os.path.exists(src):
            shutil.copy(src, f"{output_dir}/images/{img_info['file_name']}")

# Usage
coco_to_yolo('annotations/instances_train2017.json', 'train2017', 'yolo_dataset')

Advantages:

  • Large volume, high quality
  • Precise annotations, professionally reviewed
  • Free to use, no copyright issues
  • Community support, abundant tutorials
  • Ideal for quick starts and prototyping

Disadvantages:

  • May not match your specific application scenario
  • Categories may not be granular enough
  • Scenes may not be diverse enough
  • Requires filtering and format conversion

Usage Recommendations:

  • Suitable for quickly validating ideas
  • Suitable as pre-training data
  • Suitable for learning YOLO
  • Not suitable for production (unless it perfectly matches your scenario)

Data Source 2: Self-Captured (Recommended for Specific Scenarios)

Self-captured data is the most reliable source, giving you full control over data quality and scene coverage.

Shooting Plan Development:

1. Scene Coverage Plan

Time Coverage:

  • Daytime: Morning (8am-12pm), Afternoon (12pm-6pm)
  • Nighttime: Evening (6pm-8pm), Late night (8pm-12am)
  • Special times: Dusk, dawn, harsh midday light

Shooting Tips:

  • Capture at least 100-200 images per time period
  • Ensure scene diversity across different time periods
  • Record shooting time and lighting conditions

Weather Coverage:

  • Sunny: Normal lighting, clear visibility
  • Rainy: Wet surfaces, reflective effects
  • Overcast: Soft lighting, no harsh shadows
  • Foggy: Low visibility, blurred targets

Shooting Tips:

  • Capture at least 200-300 images per weather condition
  • Note how weather affects target appearance
  • Consider extreme weather situations

Angle Coverage:

  • Front: 0 degrees, target fully visible
  • Side: 45 degrees, 90 degrees, partial occlusion
  • Top-down: From above, suitable for surveillance scenarios
  • Bottom-up: From below, suitable for special viewpoints

Distance Coverage:

  • Close-up: Target occupies 50%+ of image, clear details
  • Medium range: Target occupies 20-50% of image, common scenario
  • Long range: Target occupies 5-20% of image, small object detection

2. Target Diversity Planning

Size Diversity:

  • Large objects: Occupying 30-80% of image, easy to detect
  • Medium objects: Occupying 10-30% of image, standard detection
  • Small objects: Occupying 1-10% of image, requires high resolution

State Diversity:

  • Stationary: Target at rest, clearly visible
  • Moving: Target in motion, possible blur
  • Partially occluded: 20-50% occluded by other objects
  • Heavily occluded: 50%+ occluded (optional, for robustness training)

Lighting Diversity:

  • Bright: Sufficient lighting, clear contrast
  • Shadow: Partially in shadow, reduced contrast
  • Backlit: Target backlit, clear silhouette but blurred details
  • Harsh light: Overexposed, lost details

3. Equipment Selection and Settings

Smartphone Capture (Recommended for beginners):

Advantages:

  • Portable, capture anytime
  • Auto-focus, simple operation
  • Modern phones have sufficient quality (12MP+)
  • Low cost, no extra equipment needed

Settings:

  • Resolution: Set to maximum (typically 4K or higher)
  • Format: Use JPG, balancing quality and file size
  • Focus: Ensure target is in sharp focus
  • Stability: Use a tripod or stabilizer to avoid shake

Camera Capture (Recommended for professional projects):

Advantages:

  • Higher image quality, richer details
  • More controllable parameters (ISO, aperture, shutter)
  • Suitable for professional projects

Settings:

  • ISO: Keep as low as possible (100-400) to reduce noise
  • Aperture: f/5.6-f/8, balancing depth of field and quality
  • Shutter: 1/250s+, avoiding motion blur
  • White balance: Adjust per scene, maintaining color accuracy

Drone Capture (Suitable for large scenes):

Advantages:

  • Top-down perspective, ideal for surveillance scenarios
  • Covers large areas efficiently
  • Unique viewpoints, adding data diversity

Considerations:

  • Comply with flight regulations
  • Monitor weather conditions (wind, rain)
  • Ensure sufficient battery

4. Shooting Workflow

Preparation Phase (1-2 days):

  1. Create a shooting plan

    • List all scenes to cover
    • Plan shooting routes and schedules
    • Prepare equipment (camera, memory cards, batteries)
  2. Equipment check

    • Check camera/phone battery level
    • Check storage space (recommend at least 100GB)
    • Check lens cleanliness

Shooting Phase (varies by project scale):

  1. Shoot according to plan

    • Strictly follow the scene coverage plan
    • Capture at least 50-100 images per scene
    • Record shooting info (time, location, scene)
  2. Real-time checks

    • Periodically check photo quality
    • Delete blurry or out-of-focus photos
    • Ensure targets are clearly visible
  3. Data backup

    • Back up immediately after each day's shooting
    • Use multiple storage devices
    • Prevent data loss

Organization Phase (after shooting):

  1. Photo screening

    • Delete blurry or out-of-focus photos
    • Delete duplicate photos
    • Keep high-quality photos
  2. Photo naming

    • Use meaningful naming conventions
    • Example: scene_time_weather_001.jpg
    • Facilitates later management and annotation
  3. Data statistics

    • Count photos per scene type
    • Check if scene coverage is complete
    • Supplement missing scenes
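
With the scene_time_weather_001.jpg naming convention above, the per-scene statistics can be tallied automatically. A minimal sketch (the photo directory name is a placeholder):

```python
import os
from collections import Counter

def scene_counts(photo_dir):
    """Tally photos per scene, assuming names like scene_time_weather_001.jpg."""
    counts = Counter()
    for name in os.listdir(photo_dir):
        if name.lower().endswith(('.jpg', '.png')):
            counts[name.split('_')[0]] += 1
    return counts

photo_dir = './shoot_photos'  # placeholder path
if os.path.isdir(photo_dir):
    for scene, n in scene_counts(photo_dir).most_common():
        print(f"{scene}: {n} photos")
```

Scenes with noticeably fewer photos than the rest are the ones to supplement on the next shoot.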

Real Cases:

Case 1: Autonomous Driving Road Scenes

An autonomous driving company needed to collect road scene data. The team created a detailed shooting plan:

  • Time: 1 month each for daytime, nighttime, and dusk
  • Weather: 2 weeks each for sunny, rainy, and overcast
  • Locations: 5 different cities, covering urban roads, highways, and rural roads
  • Equipment: 8 vehicle-mounted cameras, shooting simultaneously
  • Result: 50,000 high-quality images collected in 3 months, covering all scenarios

Case 2: Industrial Quality Inspection Product Photography

A factory needed to detect product defects. The team used industrial cameras:

  • Fixed shooting positions for consistency
  • Standard light sources to reduce lighting variation
  • Multiple angles per product (front, side, top)
  • Result: 20,000 product images collected in 1 month, including 5,000 defect samples

Shooting Checklist:

Equipment Preparation:

  • Camera/phone fully charged
  • Sufficient storage space (recommend 100GB+)
  • Clean lens, no smudges
  • Backup batteries and memory cards

Shooting Quality:

  • Targets clear, no blur
  • Accurate focus, no defocus
  • Normal exposure, not over/underexposed
  • Good composition, targets complete

Scene Coverage:

  • Complete time coverage (day/night)
  • Complete weather coverage (sunny/rainy)
  • Complete angle coverage (front/side)
  • Complete distance coverage (close/far)

Data Management:

  • Standardized photo naming
  • Timely data backup
  • Complete shooting info records

Data Source 3: Video Frame Extraction (Efficient Method)

Advantages:

  • Extract frames from video, highly efficient
  • Covers continuous actions
  • Natural scenes

Using TjMakeBot for Extraction:

  1. Upload video file
  2. Set extraction frame rate (e.g., 1fps)
  3. Automatically extract key frames
  4. Directly annotate extracted frames

Tips:

  • Select key frames: Avoid duplicate frames
  • Set appropriate frame rate: 1-5fps is usually sufficient
  • Process multiple videos: Cover different scenes

Data Source 4: Other Sources (Use with Caution)

Considerations:

  • Comply with data usage license agreements
  • Respect intellectual property and copyright
  • Obtain necessary authorization or permissions
  • Do not use copyright-protected content

Data Requirements Checklist:

Clarity:

  • Images are clear, target objects visible
  • Avoid blurry or out-of-focus images
  • Resolution at least 640x640

Target Size:

  • Target objects appropriately sized (recommend 5%-50% of image)
  • Avoid targets too small (< 1%) or too large (> 80%)
  • Small targets require higher resolution

Scene Diversity:

  • Cover different scenes
  • Avoid overfitting
  • Include edge cases

Target Completeness:

  • Annotation targets are complete
  • Avoid severe occlusion (> 50%)
  • Partial occlusion (< 50%) can be annotated
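
The size and coordinate rules in this checklist can be enforced mechanically on finished label files. A sketch using the under-1% / over-80% area extremes flagged above (tune the bounds for your project):

```python
import os

def validate_labels(labels_dir, min_area=0.01, max_area=0.8):
    """Flag YOLO label lines with malformed fields, out-of-range coordinates,
    or box areas outside [min_area, max_area] (fractions of the image area)."""
    problems = []
    for name in os.listdir(labels_dir):
        if not name.endswith('.txt'):
            continue
        with open(os.path.join(labels_dir, name)) as f:
            for line_no, line in enumerate(f, 1):
                parts = line.split()
                if len(parts) != 5:
                    problems.append((name, line_no, 'malformed line'))
                    continue
                cx, cy, w, h = map(float, parts[1:])
                if not all(0.0 <= v <= 1.0 for v in (cx, cy, w, h)):
                    problems.append((name, line_no, 'coordinate out of range'))
                elif not (min_area <= w * h <= max_area):
                    problems.append((name, line_no, 'suspicious box size'))
    return problems
```

An empty result does not prove the annotations are correct, but a non-empty one pinpoints files worth re-checking by hand.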

1.3 Data Preprocessing

Data preprocessing is a critical step for ensuring data quality, directly impacting model training effectiveness.

Preprocessing Workflow:

Step 1: Data Cleaning

Remove Low-Quality Images:

Checks:

  • Blurry images: Targets unclear, unidentifiable
  • Out-of-focus images: Focus not on the target
  • Over/underexposed: Severely abnormal exposure
  • Duplicate images: Identical or highly similar
  • Irrelevant images: Don't contain target objects

Automated Cleaning Script:

import cv2
import os
import shutil
from PIL import Image
import imagehash

def calculate_blur_score(image_path):
    """Calculate image blur score (variance of the Laplacian; lower means blurrier)"""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    laplacian_var = cv2.Laplacian(img, cv2.CV_64F).var()
    return laplacian_var

def find_duplicates(image_dir, threshold=5):
    """Find duplicate images via perceptual hashing"""
    image_hashes = {}
    duplicates = []

    for filename in os.listdir(image_dir):
        if filename.endswith(('.jpg', '.png')):
            filepath = os.path.join(image_dir, filename)
            img_hash = imagehash.average_hash(Image.open(filepath))

            # Check for similar images
            for existing_file, existing_hash in image_hashes.items():
                if img_hash - existing_hash < threshold:
                    duplicates.append((existing_file, filename))
                    break

            image_hashes[filename] = img_hash

    return duplicates

def clean_dataset(image_dir, blur_threshold=100):
    """Clean the dataset by copying only sharp images to a 'cleaned' subdirectory"""
    cleaned_dir = os.path.join(image_dir, 'cleaned')
    os.makedirs(cleaned_dir, exist_ok=True)

    removed_count = 0

    for filename in os.listdir(image_dir):
        if filename.endswith(('.jpg', '.png')):
            filepath = os.path.join(image_dir, filename)

            # Check blur score
            blur_score = calculate_blur_score(filepath)
            if blur_score < blur_threshold:
                print(f"Skipping blurry image: {filename} (blur score: {blur_score:.2f})")
                removed_count += 1
                continue

            # Copy to cleaned directory
            shutil.copy(filepath, os.path.join(cleaned_dir, filename))

    print(f"Cleaning complete, skipped {removed_count} low-quality images")

# Usage
print(find_duplicates('./raw_images'))  # review duplicate pairs before deleting
clean_dataset('./raw_images')

Manual Check:

  • Quickly browse all images
  • Flag obviously problematic images
  • Batch delete

Step 2: Unify Format

Format Selection:

| Format | Advantages | Disadvantages | Recommended Scenario |
| --- | --- | --- | --- |
| JPG | Small files, fast loading | Lossy compression | Most scenarios (recommended) |
| PNG | Lossless compression, high quality | Large files | Scenarios requiring high quality |

Conversion Script:

from PIL import Image
import os

def convert_format(input_dir, output_dir, target_format='JPG', quality=95):
    """Unify image format"""
    os.makedirs(output_dir, exist_ok=True)

    for filename in os.listdir(input_dir):
        if filename.endswith(('.jpg', '.png', '.bmp', '.tiff')):
            input_path = os.path.join(input_dir, filename)
            output_filename = os.path.splitext(filename)[0] + f'.{target_format.lower()}'
            output_path = os.path.join(output_dir, output_filename)

            # Open and convert
            img = Image.open(input_path)

            # Convert to RGB (if RGBA)
            if img.mode == 'RGBA':
                rgb_img = Image.new('RGB', img.size, (255, 255, 255))
                rgb_img.paste(img, mask=img.split()[3])
                img = rgb_img

            # Save
            if target_format == 'JPG':
                img.save(output_path, 'JPEG', quality=quality)
            else:
                img.save(output_path, target_format)

            print(f"Converted: {filename} -> {output_filename}")

# Usage
convert_format('./raw_images', './formatted_images', 'JPG', quality=95)

Step 3: Unify Dimensions

Size Selection Principles:

YOLO Input Sizes:

  • 640x640: Standard size, balancing speed and precision (recommended)
  • 416x416: Fast detection, suitable for real-time applications
  • 1280x1280: High-precision detection, suitable for small objects

Resizing Methods:

Method 1: Aspect-Ratio-Preserving Resize (Recommended)

from PIL import Image
import os

def resize_with_aspect_ratio(image_path, target_size=640, padding_color=(114, 114, 114)):
    """
    Resize while preserving aspect ratio, padding with gray
    """
    img = Image.open(image_path)
    original_width, original_height = img.size

    # Calculate scale
    scale = min(target_size / original_width, target_size / original_height)
    new_width = int(original_width * scale)
    new_height = int(original_height * scale)

    # Resize image
    img_resized = img.resize((new_width, new_height), Image.Resampling.LANCZOS)

    # Create target-size canvas
    img_padded = Image.new('RGB', (target_size, target_size), padding_color)

    # Calculate centering position
    x_offset = (target_size - new_width) // 2
    y_offset = (target_size - new_height) // 2

    # Paste resized image
    img_padded.paste(img_resized, (x_offset, y_offset))

    return img_padded

# Batch processing
def batch_resize(input_dir, output_dir, target_size=640):
    """Batch resize"""
    os.makedirs(output_dir, exist_ok=True)

    for filename in os.listdir(input_dir):
        if filename.endswith(('.jpg', '.png')):
            input_path = os.path.join(input_dir, filename)
            output_path = os.path.join(output_dir, filename)

            img_resized = resize_with_aspect_ratio(input_path, target_size)
            img_resized.save(output_path)
            print(f"Resized: {filename}")

# Usage
batch_resize('./formatted_images', './resized_images', target_size=640)

Method 2: Direct Stretching (Not Recommended)

  • Distorts target shape
  • May cause the model to learn incorrect features
  • Only use when target shape doesn't matter

Step 4: Data Augmentation (Optional)

When to Use Data Augmentation:

  • When data volume is insufficient
  • When you need to improve model generalization
  • When classes are imbalanced

Common Augmentation Methods:

1. Geometric Transforms:

  • Rotation: +/-15 degrees, simulating different angles
  • Flip: Horizontal flip, vertical flip
  • Scale: 0.8-1.2x, simulating different distances
  • Translation: +/-10%, simulating position changes

2. Color Transforms:

  • Brightness adjustment: +/-20%, simulating different lighting
  • Contrast adjustment: +/-20%, enhancing/reducing contrast
  • Saturation adjustment: +/-30%, simulating different environments
  • Hue adjustment: +/-10 degrees, simulating different light sources

3. Noise Addition:

  • Gaussian noise: Simulating sensor noise
  • Salt-and-pepper noise: Simulating transmission errors

Augmentation Script:

from PIL import Image, ImageEnhance
import random
import os

def augment_image(image_path, output_dir, num_augmentations=3):
    """Augment a single image"""
    img = Image.open(image_path).convert('RGB')  # ensure RGB so JPEG saving succeeds
    base_name = os.path.splitext(os.path.basename(image_path))[0]

    for i in range(num_augmentations):
        # Random rotation
        angle = random.uniform(-15, 15)
        img_rotated = img.rotate(angle, expand=False)

        # Random flip
        if random.random() > 0.5:
            img_rotated = img_rotated.transpose(Image.FLIP_LEFT_RIGHT)

        # Random brightness adjustment
        enhancer = ImageEnhance.Brightness(img_rotated)
        img_rotated = enhancer.enhance(random.uniform(0.8, 1.2))

        # Random contrast adjustment
        enhancer = ImageEnhance.Contrast(img_rotated)
        img_rotated = enhancer.enhance(random.uniform(0.8, 1.2))

        # Save
        output_path = os.path.join(output_dir, f"{base_name}_aug_{i}.jpg")
        img_rotated.save(output_path)
        print(f"Augmented: {base_name}_aug_{i}.jpg")

def batch_augment(input_dir, output_dir, num_augmentations=3):
    """Batch augmentation"""
    os.makedirs(output_dir, exist_ok=True)

    for filename in os.listdir(input_dir):
        if filename.endswith(('.jpg', '.png')):
            input_path = os.path.join(input_dir, filename)
            augment_image(input_path, output_dir, num_augmentations)

# Usage
batch_augment('./resized_images', './augmented_images', num_augmentations=3)

Note: Data augmentation should be performed before annotation, or use a tool that supports automatic annotation coordinate adjustment.
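
As a concrete example of that coordinate adjustment: under a horizontal flip, only the x-center mirrors, while y, width, and height stay unchanged. A sketch for one label line:

```python
def flip_label_horizontal(label_line):
    """Mirror one YOLO label line to match a horizontally flipped image."""
    class_id, cx, cy, w, h = label_line.split()
    mirrored_cx = 1.0 - float(cx)  # only the x-center changes under a horizontal flip
    return f"{class_id} {mirrored_cx} {cy} {w} {h}"

print(flip_label_horizontal("0 0.2 0.3 0.1 0.2"))  # -> "0 0.8 0.3 0.1 0.2"
```

Rotation and translation require similar (more involved) transforms, which is why augmenting before annotation, or using a tool that handles labels, is the safer route.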

Preprocessing Checklist

Data Cleaning:

  • Remove blurry images
  • Remove out-of-focus images
  • Remove duplicate images
  • Remove irrelevant images

Format Unification:

  • Unified to JPG or PNG format
  • Converted to RGB mode
  • File integrity verified

Size Unification:

  • Resized to target dimensions (e.g., 640x640)
  • Aspect ratio preserved (recommended)
  • Image quality verified

Data Augmentation (optional):

  • Augmentation methods determined
  • Augmentation applied
  • Augmentation results verified

Data Statistics:

  • Final image count tallied
  • Class distribution checked
  • Data quality verified

Step 2: Data Annotation

2.1 Choosing an Annotation Tool

Tool Selection Advice:

Different tools have different characteristics:

  • Free tools: Suitable for budget-limited users, features may be relatively simple
  • Paid tools: Typically more comprehensive features, suitable for enterprise users with budget
  • Selection principle: Choose based on project needs, budget, and technical capability

TjMakeBot Features:

  • Free (basic features)
  • AI chat-based annotation, significantly improving efficiency
  • Supports batch processing
  • Online and ready to use, no installation needed
  • Supports video-to-frame conversion

2.2 Creating Category Labels

Create your categories in TjMakeBot:

Category List Example:
0: car
1: person
2: bicycle
3: motorcycle
4: bus

Naming Conventions:

  • Use lowercase English
  • Avoid spaces and special characters
  • Category names should be clear and unambiguous

2.3 Starting Annotation: Two Methods Explained

Method 1: AI Chat-Based Annotation (Recommended for Standard Scenes)

Suitable Scenarios:

  • Batch annotation (> 100 images)
  • Standard scenes (common objects)
  • Rapid prototyping
  • Budget-limited projects

Complete Workflow:

Step 1: Upload Images (1 minute)

  • Batch upload all images
  • Recommend testing with 10-20 images first

Step 2: Open AI Assistant (5 seconds)

  • Click the "AI Assistant" button
  • Chat panel opens

Step 3: Enter Instructions (10 seconds)

Basic instruction:
"Please annotate all cars and pedestrians"

Advanced instructions:
"Annotate all vehicles, but exclude motorcycles"
"Annotate all targets in the center area of the image"
"Annotate all cars larger than 100 pixels"

Step 4: AI Auto-Annotation (automatic)

  • AI understands the instruction
  • Automatically identifies targets
  • Generates annotation results

Step 5: Review and Fine-Tune (5-10 minutes per 100 images)

  • Quickly browse annotation results
  • Correct obvious errors
  • Supplement missed annotations

Step 6: Apply to All (1 second)

  • Confirm satisfactory results
  • One-click apply to all images

Advantages:

  • Fast: 1000 images completed in 2-3 hours
  • High accuracy: AI accuracy typically >90%
  • Low cost: Free tool, virtually zero cost
  • High efficiency: Batch processing, dramatically improved efficiency

Real Case:

A student project needed to annotate 2000 images. Using AI chat-based annotation, annotation was completed in 2 days with 95% accuracy. Traditional methods would have taken 2 weeks.

Method 2: Manual Annotation (Suitable for Complex Scenes)

Suitable Scenarios:

  • Complex scenes (AI has difficulty recognizing)
  • Special objects (categories AI hasn't been trained on)
  • High precision requirements (pixel-level precision needed)
  • Small-scale projects (< 100 images)

Complete Workflow:

Step 1: Select Image (5 seconds)

  • Click an image to open the annotation interface

Step 2: Select Category (3 seconds)

  • Choose from the category list
  • Or create a new category

Step 3: Draw Bounding Box (10-30 seconds)

  • Mouse drag to draw a rectangle
  • Drag from top-left to bottom-right
  • Or use keyboard shortcuts

Step 4: Adjust Position and Size (10-20 seconds)

  • Drag the bounding box to move position
  • Drag corner points to adjust size
  • Use arrow keys for fine-tuning

Step 5: Save Annotation (2 seconds)

  • Auto-save
  • Or manual save

Manual Annotation Tips:

Tip 1: Use Keyboard Shortcuts

  • W: Switch tools
  • Delete: Delete selected annotation
  • Arrow keys: Fine-tune position
  • Ctrl+Z: Undo

Tip 2: Precise Adjustment

  • Use zoom to enlarge the image
  • Use crosshairs for precise positioning
  • Multiple fine adjustments for optimal placement

Tip 3: Batch Operations

  • Copy annotations to the next image
  • Batch delete incorrect annotations
  • Batch modify categories

Advantages:

  • High precision: Pixel-level accuracy
  • Flexible: Can handle any scenario
  • Controllable: Full control over the annotation process

Disadvantages:

  • Slow: 2-5 minutes per image
  • Expensive: Requires significant manpower
  • Fatiguing: Long annotation sessions lead to errors

Recommendation: Combine AI-assisted and manual annotation — AI handles standard scenes, manual handles complex scenes.

2.4 Annotation Quality Check: Ensuring Data Quality

Why Is Quality Checking So Important?

A real case:

A project annotated 5000 images, but after training, the model only achieved 70% accuracy. Upon inspection, 15% of the annotation data contained errors. After re-annotation, model accuracy improved to 92%.

Quality Check Checklist:

1. Completeness Check (Most Important)

  • All target objects are annotated
  • No missed objects
  • Partially occluded objects are also annotated

Check Methods:

  • Browse image by image, looking for omissions
  • Use AI-assisted checking (AI can identify omissions)
  • Sampling check (check 1 in every 10)
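The sampling check above is easy to make reproducible in code. A minimal sketch that picks a fixed-seed random subset for manual spot-checking (function name and the 10% fraction are illustrative choices):

```python
import random

def sample_for_review(image_files, fraction=0.1, seed=0):
    """Pick a reproducible random subset of images for manual spot-checking."""
    rng = random.Random(seed)  # fixed seed so reviewers get the same sample
    k = max(1, int(len(image_files) * fraction))
    return sorted(rng.sample(image_files, k))

# Review roughly 1 in every 10 images
to_review = sample_for_review([f"img{i:03d}.jpg" for i in range(200)], fraction=0.1)
print(len(to_review))  # 20
```

Using a fixed seed means a second reviewer can re-check exactly the same images later.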

2. Accuracy Check

  • Bounding boxes precisely cover targets
  • Bounding boxes don't include excessive background (< 10%)
  • Bounding boxes don't miss parts of the target

Check Methods:

  • Check if bounding boxes are tight to target edges
  • Check for obvious deviations
  • Use IoU metrics for evaluation
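The IoU evaluation mentioned above can be computed directly from YOLO's normalized (center_x, center_y, width, height) format. A minimal sketch:

```python
def yolo_iou(box_a, box_b):
    """IoU between two YOLO boxes given as (center_x, center_y, width, height),
    all normalized to 0-1."""
    def to_corners(box):
        cx, cy, w, h = box
        return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

    ax1, ay1, ax2, ay2 = to_corners(box_a)
    bx1, by1, bx2, by2 = to_corners(box_b)

    # Intersection rectangle (zero if the boxes are disjoint)
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Identical boxes give 1.0; fully disjoint boxes give 0.0
print(yolo_iou((0.5, 0.5, 0.2, 0.2), (0.5, 0.5, 0.2, 0.2)))
```

Comparing a reviewer's box against the original annotation with this function gives a concrete accuracy score per box.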

3. Category Accuracy

  • Category labels are correct
  • No category confusion
  • Edge cases handled correctly

Check Methods:

  • Check each annotation box's category
  • Pay special attention to easily confused categories
  • Standardize edge case handling

4. Consistency Check

  • No duplicate annotations
  • Annotation standards are uniform
  • Different annotators maintain consistent standards

Check Methods:

  • Check for overlapping annotation boxes
  • Compare annotations from different annotators
  • Analyze annotation differences
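The overlapping-box check above can be automated with IoU. A minimal sketch that flags near-duplicate boxes of the same class (the 0.9 threshold is an illustrative choice, not a fixed standard):

```python
def find_duplicate_boxes(boxes, iou_threshold=0.9):
    """Flag index pairs of same-class YOLO boxes that overlap almost completely.
    boxes: list of (class_id, center_x, center_y, width, height), normalized."""
    def iou(a, b):
        # a, b are (cx, cy, w, h)
        ax1, ay1, ax2, ay2 = a[0]-a[2]/2, a[1]-a[3]/2, a[0]+a[2]/2, a[1]+a[3]/2
        bx1, by1, bx2, by2 = b[0]-b[2]/2, b[1]-b[3]/2, b[0]+b[2]/2, b[1]+b[3]/2
        inter = max(0.0, min(ax2, bx2) - max(ax1, bx1)) * \
                max(0.0, min(ay2, by2) - max(ay1, by1))
        union = a[2]*a[3] + b[2]*b[3] - inter
        return inter / union if union > 0 else 0.0

    duplicates = []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if boxes[i][0] == boxes[j][0] and iou(boxes[i][1:], boxes[j][1:]) > iou_threshold:
                duplicates.append((i, j))
    return duplicates
```

Running this per annotation file surfaces accidental double-clicks that are hard to spot visually.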

Quality Metric Standards:

| Metric | Minimum Standard | Recommended Standard | Excellent Standard |
|---|---|---|---|
| Annotation Completeness | > 90% | > 95% | > 98% |
| Bounding Box Accuracy | > 85% | > 90% | > 95% |
| Category Accuracy | > 95% | > 98% | > 99% |
| Annotation Consistency | > 85% | > 90% | > 95% |

Quality Check Tools:

TjMakeBot Built-in Quality Check:

  • Automatically detects missed annotations
  • Automatically detects duplicate annotations
  • Automatically detects bounding box deviations
  • Generates quality reports

Usage Steps:

  1. After completing annotation, click "Quality Check"
  2. System automatically analyzes annotation quality
  3. Generates quality report
  4. Fix issues based on the report

Quality Improvement Workflow:

First Round (after annotation completion):

  • Quickly browse all images
  • Identify obvious errors
  • Correct erroneous annotations

Second Round (after corrections):

  • Sampling check (20-30%)
  • Detailed bounding box inspection
  • Check category accuracy

Third Round (final confirmation):

  • Expert review
  • Performance testing
  • Final confirmation

Quality Check Time Allocation:

  • Annotation time: 70%
  • Quality checking: 20%
  • Correction time: 10%

Remember: The time invested in quality checking is worthwhile — it prevents costly rework later.

Step 3: Data Format Conversion

Data format conversion is the critical step of converting annotation results into the format required for YOLO training.

3.1 Exporting YOLO Format

Using TjMakeBot Export

Steps:

  1. Select Annotation Data

    • Open the annotation project in TjMakeBot
    • Select all annotated images
    • Or select images of specific categories
  2. Export Settings

    • Click the "Export" button
    • Select "YOLO Format"
    • Choose export options:
      • Include images
      • Include annotation files
      • Maintain directory structure
  3. Download Files

    • Wait for export to complete
    • Download ZIP file
    • Extract to local directory

Export Result Structure:

dataset/
├── images/
│   ├── image001.jpg
│   ├── image002.jpg
│   └── ...
└── labels/
    ├── image001.txt
    ├── image002.txt
    └── ...

Manual Conversion (From Other Formats)

Converting from VOC Format:

import xml.etree.ElementTree as ET
import os

def voc_to_yolo(voc_xml_path, yolo_txt_path, img_width, img_height, class_mapping):
    """
    Convert VOC format to YOLO format
    """
    tree = ET.parse(voc_xml_path)
    root = tree.getroot()

    with open(yolo_txt_path, 'w') as f:
        for obj in root.findall('object'):
            # Get category
            class_name = obj.find('name').text
            class_id = class_mapping[class_name]

            # Get bounding box (VOC format: xmin, ymin, xmax, ymax)
            bbox = obj.find('bndbox')
            xmin = float(bbox.find('xmin').text)
            ymin = float(bbox.find('ymin').text)
            xmax = float(bbox.find('xmax').text)
            ymax = float(bbox.find('ymax').text)

            # Convert to YOLO format
            center_x = ((xmin + xmax) / 2) / img_width
            center_y = ((ymin + ymax) / 2) / img_height
            width = (xmax - xmin) / img_width
            height = (ymax - ymin) / img_height

            # Write to file
            f.write(f"{class_id} {center_x} {center_y} {width} {height}\n")

# Usage
class_mapping = {'car': 0, 'person': 1, 'bicycle': 2}
voc_to_yolo('annotations/image001.xml', 'labels/image001.txt', 1920, 1080, class_mapping)

Converting from COCO Format:

import json
import os

def coco_to_yolo(coco_json_path, output_dir, class_mapping):
    """
    Convert COCO format to YOLO format
    """
    with open(coco_json_path, 'r') as f:
        coco_data = json.load(f)

    # Create output directory
    os.makedirs(f'{output_dir}/labels', exist_ok=True)

    # Build image ID to filename mapping
    img_id_to_info = {img['id']: img for img in coco_data['images']}

    # Group annotations by image ID
    annotations_by_img = {}
    for ann in coco_data['annotations']:
        img_id = ann['image_id']
        if img_id not in annotations_by_img:
            annotations_by_img[img_id] = []
        annotations_by_img[img_id].append(ann)

    # Convert annotations for each image
    for img_id, anns in annotations_by_img.items():
        img_info = img_id_to_info[img_id]
        img_width = img_info['width']
        img_height = img_info['height']

        # Create YOLO format file
        # Use splitext so .png / .jpeg images are handled, not just .jpg
        stem = os.path.splitext(img_info['file_name'])[0]
        label_file = f"{output_dir}/labels/{stem}.txt"
        with open(label_file, 'w') as f:
            for ann in anns:
                category_id = ann['category_id']
                class_name = next(cat['name'] for cat in coco_data['categories'] if cat['id'] == category_id)
                class_id = class_mapping.get(class_name, -1)

                if class_id == -1:
                    continue  # Skip unmapped categories

                # COCO format: x, y, width, height (absolute coordinates)
                bbox = ann['bbox']
                x, y, w, h = bbox

                # Convert to YOLO format (normalized)
                center_x = (x + w / 2) / img_width
                center_y = (y + h / 2) / img_height
                norm_w = w / img_width
                norm_h = h / img_height

                f.write(f"{class_id} {center_x} {center_y} {norm_w} {norm_h}\n")

# Usage
class_mapping = {'car': 0, 'person': 1, 'bicycle': 2}
coco_to_yolo('annotations/instances_train2017.json', './yolo_dataset', class_mapping)

3.2 Validating Annotation Files

Validating annotation files is a critical step for ensuring data quality and avoiding errors during training.

Validation Script

Complete Validation Script:

import os
from PIL import Image

def validate_yolo_dataset(dataset_dir):
    """
    Validate a YOLO dataset
    """
    images_dir = os.path.join(dataset_dir, 'images')
    labels_dir = os.path.join(dataset_dir, 'labels')

    errors = []
    warnings = []

    # Get all image files
    image_files = [f for f in os.listdir(images_dir) if f.endswith(('.jpg', '.png'))]

    for img_file in image_files:
        img_path = os.path.join(images_dir, img_file)
        label_file = os.path.splitext(img_file)[0] + '.txt'
        label_path = os.path.join(labels_dir, label_file)

        # Check 1: Does annotation file exist?
        if not os.path.exists(label_path):
            errors.append(f"Missing annotation file: {label_file}")
            continue

        # Check 2: Can the image be opened?
        try:
            img = Image.open(img_path)
            img_width, img_height = img.size
        except Exception as e:
            errors.append(f"Cannot open image: {img_file} - {str(e)}")
            continue

        # Check 3: Read annotation file
        try:
            with open(label_path, 'r') as f:
                lines = f.readlines()
        except Exception as e:
            errors.append(f"Cannot read annotation file: {label_file} - {str(e)}")
            continue

        # Check 4: Validate each line's format
        for line_num, line in enumerate(lines, 1):
            line = line.strip()
            if not line:
                continue

            parts = line.split()

            # Format check: should have 5 numbers
            if len(parts) != 5:
                errors.append(f"{label_file}:{line_num} - Format error, expected 5 numbers, got {len(parts)}")
                continue

            try:
                class_id = int(parts[0])
                center_x = float(parts[1])
                center_y = float(parts[2])
                width = float(parts[3])
                height = float(parts[4])
            except ValueError as e:
                errors.append(f"{label_file}:{line_num} - Cannot parse numbers: {str(e)}")
                continue

            # Check 5: Is class ID valid?
            if class_id < 0:
                errors.append(f"{label_file}:{line_num} - Class ID cannot be negative: {class_id}")

            # Check 6: Are coordinates in 0-1 range?
            if not (0 <= center_x <= 1):
                errors.append(f"{label_file}:{line_num} - center_x out of range: {center_x}")
            if not (0 <= center_y <= 1):
                errors.append(f"{label_file}:{line_num} - center_y out of range: {center_y}")
            if not (0 < width <= 1):
                errors.append(f"{label_file}:{line_num} - width out of range: {width}")
            if not (0 < height <= 1):
                errors.append(f"{label_file}:{line_num} - height out of range: {height}")

            # Check 7: Does bounding box exceed image bounds?
            x_min = center_x - width / 2
            x_max = center_x + width / 2
            y_min = center_y - height / 2
            y_max = center_y + height / 2

            if x_min < 0 or x_max > 1 or y_min < 0 or y_max > 1:
                warnings.append(f"{label_file}:{line_num} - Bounding box exceeds image bounds")

            # Check 8: Is bounding box too small?
            if width < 0.01 or height < 0.01:
                warnings.append(f"{label_file}:{line_num} - Bounding box too small (possible annotation error)")

            # Check 9: Is bounding box too large?
            if width > 0.95 or height > 0.95:
                warnings.append(f"{label_file}:{line_num} - Bounding box too large (possible annotation error)")

    # Output results
    print("=" * 50)
    print("Validation Results")
    print("=" * 50)

    if errors:
        print(f"\nFound {len(errors)} errors:")
        for error in errors[:10]:  # Show first 10 only
            print(f"  - {error}")
        if len(errors) > 10:
            print(f"  ... and {len(errors) - 10} more errors")
    else:
        print("\nNo errors found")

    if warnings:
        print(f"\nFound {len(warnings)} warnings:")
        for warning in warnings[:10]:  # Show first 10 only
            print(f"  - {warning}")
        if len(warnings) > 10:
            print(f"  ... and {len(warnings) - 10} more warnings")
    else:
        print("\nNo warnings found")

    return len(errors) == 0

# Usage
is_valid = validate_yolo_dataset('./dataset')
if is_valid:
    print("\nDataset validation passed, ready to start training")
else:
    print("\nDataset validation failed, please fix errors before training")

Validation Checklist

File Integrity:

  • Every image has a corresponding annotation file
  • Every annotation file has a corresponding image
  • Filenames match (except for extensions)
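The two-way file-integrity check above (the main validation script only checks the image-to-label direction) can be sketched by comparing filename stems in both directories:

```python
import os

def find_orphans(images_dir, labels_dir):
    """Return (image stems missing a label, label stems missing an image)."""
    img_stems = {os.path.splitext(f)[0] for f in os.listdir(images_dir)
                 if f.lower().endswith(('.jpg', '.png'))}
    lbl_stems = {os.path.splitext(f)[0] for f in os.listdir(labels_dir)
                 if f.endswith('.txt')}
    return sorted(img_stems - lbl_stems), sorted(lbl_stems - img_stems)
```

Orphan label files are worth deleting before training; orphan images mean annotation was never done for them.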

Format Correctness:

  • Each annotation file line has 5 numbers
  • All numbers are valid floats
  • Class IDs are integers

Coordinate Validity:

  • All coordinate values are in the 0-1 range
  • Bounding boxes don't exceed image bounds
  • Bounding box sizes are reasonable (not too small or too large)

Data Consistency:

  • Class IDs are consecutive (0, 1, 2, ...)
  • No duplicate annotations
  • Annotations match image content
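The "class IDs are consecutive" check can be scripted as well. A small sketch, assuming labels sit in a flat directory of .txt files:

```python
import os

def check_class_ids(labels_dir, expected_nc):
    """Collect all class IDs used in a labels directory and verify they
    form the consecutive range 0 .. expected_nc - 1."""
    used = set()
    for name in os.listdir(labels_dir):
        if name.endswith('.txt'):
            with open(os.path.join(labels_dir, name)) as f:
                for line in f:
                    if line.strip():
                        used.add(int(line.split()[0]))
    expected = set(range(expected_nc))
    return {'unknown_ids': sorted(used - expected),   # IDs not in the config
            'unused_ids': sorted(expected - used)}    # configured but never annotated
```

Any `unknown_ids` will crash or silently corrupt training, and `unused_ids` usually mean the class list in dataset.yaml is out of sync with the annotations.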

3.3 Creating Dataset Configuration Files

The dataset configuration file is required for YOLO training, defining dataset paths, categories, and other information.

YOLOv8 Configuration File

Standard Format (dataset.yaml):

# Dataset path (relative to this file or absolute path)
path: /path/to/dataset  # Dataset root directory

# Training and validation set paths (relative to path)
train: images/train  # Training set image directory
val: images/val      # Validation set image directory
test: images/test    # Test set image directory (optional)

# Number of categories
nc: 5

# Category names (must correspond to class IDs)
names:
  0: car
  1: person
  2: bicycle
  3: motorcycle
  4: bus

YOLOv5 Configuration File

Standard Format (dataset.yaml):

# Dataset paths
train: /path/to/dataset/images/train
val: /path/to/dataset/images/val
test: /path/to/dataset/images/test  # Optional

# Number of categories
nc: 5

# Category names
names: ['car', 'person', 'bicycle', 'motorcycle', 'bus']

Configuration File Generation Script

Auto-Generation Script:

import os
import yaml

def create_dataset_yaml(dataset_dir, class_names, output_file='dataset.yaml', yolo_version='v8'):
    """
    Auto-generate dataset configuration file
    """
    # Check directory structure
    images_dir = os.path.join(dataset_dir, 'images')
    labels_dir = os.path.join(dataset_dir, 'labels')

    # Check for train/val/test subdirectories
    has_splits = os.path.exists(os.path.join(images_dir, 'train'))

    if yolo_version == 'v8':
        if has_splits:
            config = {
                'path': os.path.abspath(dataset_dir),
                'train': 'images/train',
                'val': 'images/val',
                'nc': len(class_names),
                'names': {i: name for i, name in enumerate(class_names)}
            }

            # If test set exists
            if os.path.exists(os.path.join(images_dir, 'test')):
                config['test'] = 'images/test'
        else:
            # If no splits, use images directory
            config = {
                'path': os.path.abspath(dataset_dir),
                'train': 'images',
                'val': 'images',  # Note: actual use requires splitting
                'nc': len(class_names),
                'names': {i: name for i, name in enumerate(class_names)}
            }
    else:  # YOLOv5
        if has_splits:
            config = {
                'train': os.path.join(os.path.abspath(dataset_dir), 'images', 'train'),
                'val': os.path.join(os.path.abspath(dataset_dir), 'images', 'val'),
                'nc': len(class_names),
                'names': class_names
            }

            if os.path.exists(os.path.join(images_dir, 'test')):
                config['test'] = os.path.join(os.path.abspath(dataset_dir), 'images', 'test')
        else:
            config = {
                'train': os.path.join(os.path.abspath(dataset_dir), 'images'),
                'val': os.path.join(os.path.abspath(dataset_dir), 'images'),
                'nc': len(class_names),
                'names': class_names
            }

    # Save configuration file
    with open(output_file, 'w', encoding='utf-8') as f:
        yaml.dump(config, f, allow_unicode=True, default_flow_style=False)

    print(f"Configuration file generated: {output_file}")
    print("\nConfiguration file contents:")
    print("=" * 50)
    with open(output_file, 'r', encoding='utf-8') as f:
        print(f.read())
    print("=" * 50)

# Usage example
class_names = ['car', 'person', 'bicycle', 'motorcycle', 'bus']
create_dataset_yaml('./dataset', class_names, 'dataset.yaml', yolo_version='v8')

Configuration File Validation

Validation Script:

import yaml
import os

def validate_dataset_yaml(yaml_file, dataset_dir):
    """
    Validate dataset configuration file
    """
    with open(yaml_file, 'r', encoding='utf-8') as f:
        config = yaml.safe_load(f)

    errors = []

    # Check required fields
    required_fields = ['nc', 'names']
    for field in required_fields:
        if field not in config:
            errors.append(f"Missing required field: {field}")

    # Check category count
    if 'nc' in config and 'names' in config:
        if isinstance(config['names'], dict):
            num_names = len(config['names'])
        else:
            num_names = len(config['names'])

        if config['nc'] != num_names:
            errors.append(f"Category count mismatch: nc={config['nc']}, names count={num_names}")

    # Check paths
    if 'path' in config:
        path = config['path']
        if not os.path.isabs(path):
            path = os.path.join(os.path.dirname(yaml_file), path)

        if not os.path.exists(path):
            errors.append(f"Dataset path does not exist: {path}")

    # Check training and validation set paths
    for split in ['train', 'val']:
        if split in config:
            split_path = config[split]
            if 'path' in config:
                full_path = os.path.join(config['path'], split_path)
            else:
                full_path = split_path

            if not os.path.exists(full_path):
                errors.append(f"{split} path does not exist: {full_path}")

    if errors:
        print("Configuration file validation failed:")
        for error in errors:
            print(f"  - {error}")
        return False
    else:
        print("Configuration file validation passed")
        return True

# Usage
validate_dataset_yaml('dataset.yaml', './dataset')

Configuration File Checklist

Basic Configuration:

  • Category count (nc) is correct
  • Category names (names) are complete
  • Class IDs start from 0 consecutively

Path Configuration:

  • Dataset path (path) is correct
  • Training set path (train) exists
  • Validation set path (val) exists
  • Test set path (test) exists (if used)

Format Correctness:

  • YAML format is correct
  • Encoding is UTF-8
  • Indentation is correct (using spaces, not tabs)

Step 4: Dataset Splitting

Dataset splitting is a critical pre-training step. Proper splitting ensures accurate model evaluation.

4.1 Splitting Strategy

Choosing Split Ratios

Standard Split Ratios:

| Dataset Size | Training Set | Validation Set | Test Set | Notes |
|---|---|---|---|---|
| Small (< 1000 images) | 70% | 15% | 15% | Ensure sufficient training data |
| Medium (1000-10000 images) | 75% | 12.5% | 12.5% | Balance training and evaluation |
| Large (> 10000 images) | 80% | 10% | 10% | Ample training data, sufficient validation |

Why Three Sets?

  1. Training Set (Train):

    • Used for model training
    • Model learns data features
    • Typically 70-80%
  2. Validation Set (Validation):

    • Used for hyperparameter tuning
    • Monitors training progress
    • Prevents overfitting
    • Typically 10-15%
  3. Test Set (Test):

    • Used for final evaluation
    • Not involved in training or tuning
    • Reflects true model performance
    • Typically 10-15%

Splitting Principles

1. Random Split (Basic Method)

Suitable Scenarios:

  • Similar data scenes
  • No time series relationships
  • No scene correlations

Method:

  • Randomly shuffle all data
  • Split by ratio
  • Ensure consistent class distribution

2. Stratified Split (Recommended)

Suitable Scenarios:

  • Imbalanced classes
  • Need to ensure consistent class ratios

Method:

  • Split each class separately
  • Each class split at the same ratio
  • Ensure consistent class distribution across train, val, and test sets

3. Scene-Based Split (Advanced Method)

Suitable Scenarios:

  • Data from different scenes
  • Need to test generalization ability
  • Avoid data leakage

Method:

  • Group by scene
  • Data from the same scene stays in the same set
  • Avoid scene overlap between training and test sets

Real Case:

An autonomous driving project had road data from 5 cities. Random splitting could result in both training and test sets containing data from the same city, making test results overly optimistic. The correct approach is to split by city: 3 cities for training, 1 for validation, 1 for testing.
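The city-based split in this case can be sketched in a few lines. Assumptions: `scene_of` is a caller-supplied function mapping a filename to its scene/city ID (e.g. a filename prefix), and whole scenes are assigned to one split so no scene leaks across splits:

```python
import random
from collections import defaultdict

def split_by_scene(filenames, scene_of, ratios=(0.7, 0.15, 0.15), seed=42):
    """Group files by scene and assign each whole scene to train/val/test."""
    groups = defaultdict(list)
    for name in filenames:
        groups[scene_of(name)].append(name)

    scenes = sorted(groups)
    random.Random(seed).shuffle(scenes)  # fixed seed for reproducibility

    n_train = int(len(scenes) * ratios[0])
    n_val = int(len(scenes) * ratios[1])

    splits = {'train': [], 'val': [], 'test': []}
    for i, scene in enumerate(scenes):
        key = 'train' if i < n_train else 'val' if i < n_train + n_val else 'test'
        splits[key].extend(groups[scene])
    return splits

# Example: filenames prefixed with their city, e.g. "cityA_0001.jpg"
splits = split_by_scene(['cityA_0001.jpg', 'cityB_0001.jpg', 'cityC_0001.jpg'],
                        scene_of=lambda name: name.split('_')[0])
```

Note that the ratios apply to the number of scenes, not images, so the image-level ratios will only be approximate when scenes differ in size.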

Class Balance Check

Check Script:

import os
from collections import Counter

def check_class_balance(dataset_dir, splits=['train', 'val', 'test']):
    """
    Check class distribution across splits
    """
    results = {}

    for split in splits:
        labels_dir = os.path.join(dataset_dir, 'labels', split)
        if not os.path.exists(labels_dir):
            continue

        class_counts = Counter()
        total_objects = 0

        for label_file in os.listdir(labels_dir):
            if label_file.endswith('.txt'):
                with open(os.path.join(labels_dir, label_file), 'r') as f:
                    for line in f:
                        if line.strip():
                            class_id = int(line.split()[0])
                            class_counts[class_id] += 1
                            total_objects += 1

        results[split] = {
            'class_counts': dict(class_counts),
            'total_objects': total_objects,
            'num_images': len([f for f in os.listdir(labels_dir) if f.endswith('.txt')])
        }

    # Print results
    print("=" * 60)
    print("Class Distribution Statistics")
    print("=" * 60)

    for split, data in results.items():
        print(f"\n{split.upper()} set:")
        print(f"  Image count: {data['num_images']}")
        print(f"  Total objects: {data['total_objects']}")
        print(f"  Class distribution:")

        for class_id in sorted(data['class_counts'].keys()):
            count = data['class_counts'][class_id]
            percentage = count / data['total_objects'] * 100
            print(f"    Class {class_id}: {count} ({percentage:.1f}%)")

    # Check balance
    print("\n" + "=" * 60)
    print("Balance Check")
    print("=" * 60)

    if 'train' in results:
        train_counts = results['train']['class_counts']
        max_count = max(train_counts.values())
        min_count = min(train_counts.values())
        imbalance_ratio = max_count / min_count if min_count > 0 else float('inf')

        print(f"Training set class imbalance ratio: {imbalance_ratio:.2f}")
        if imbalance_ratio > 10:
            print("Warning: Severe class imbalance, recommend balancing data")
        elif imbalance_ratio > 5:
            print("Note: Class imbalance exists, consider balancing")
        else:
            print("Class distribution is relatively balanced")

# Usage
check_class_balance('./dataset')

4.2 Splitting with Scripts

Basic Splitting Script

Simple Random Split:

import os
import shutil
import random

def split_dataset_simple(source_dir, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15, seed=42):
    """
    Simple random dataset split
    """
    # Set random seed for reproducibility
    random.seed(seed)

    images_dir = os.path.join(source_dir, 'images')
    labels_dir = os.path.join(source_dir, 'labels')

    # Get all images
    images = [f for f in os.listdir(images_dir) if f.endswith(('.jpg', '.png'))]
    random.shuffle(images)

    # Calculate split points
    total = len(images)
    train_end = int(total * train_ratio)
    val_end = train_end + int(total * val_ratio)

    # Split
    train_images = images[:train_end]
    val_images = images[train_end:val_end]
    test_images = images[val_end:]

    print(f"Total images: {total}")
    print(f"Training set: {len(train_images)} ({len(train_images)/total*100:.1f}%)")
    print(f"Validation set: {len(val_images)} ({len(val_images)/total*100:.1f}%)")
    print(f"Test set: {len(test_images)} ({len(test_images)/total*100:.1f}%)")

    # Copy files
    for split, img_list in [('train', train_images),
                            ('val', val_images),
                            ('test', test_images)]:
        split_images_dir = os.path.join(source_dir, 'images', split)
        split_labels_dir = os.path.join(source_dir, 'labels', split)

        os.makedirs(split_images_dir, exist_ok=True)
        os.makedirs(split_labels_dir, exist_ok=True)

        for img in img_list:
            # Copy image
            src_img = os.path.join(images_dir, img)
            dst_img = os.path.join(split_images_dir, img)
            shutil.copy(src_img, dst_img)

            # Copy annotation
            label_name = os.path.splitext(img)[0] + '.txt'
            src_label = os.path.join(labels_dir, label_name)
            dst_label = os.path.join(split_labels_dir, label_name)

            if os.path.exists(src_label):
                shutil.copy(src_label, dst_label)
            else:
                print(f"Warning: Annotation file missing: {label_name}")

    print("\nDataset split complete")

# Usage
split_dataset_simple('./dataset', train_ratio=0.7, val_ratio=0.15, test_ratio=0.15)

Stratified Split by Class:

import os
import shutil
import random
from collections import defaultdict

def split_dataset_stratified(source_dir, train_ratio=0.7, val_ratio=0.15, test_ratio=0.15, seed=42):
    """
    Stratified dataset split (by class)
    """
    random.seed(seed)

    images_dir = os.path.join(source_dir, 'images')
    labels_dir = os.path.join(source_dir, 'labels')

    # Group images by class
    images_by_class = defaultdict(list)

    for img_file in os.listdir(images_dir):
        if img_file.endswith(('.jpg', '.png')):
            label_file = os.path.splitext(img_file)[0] + '.txt'
            label_path = os.path.join(labels_dir, label_file)

            if os.path.exists(label_path):
                # Read annotation file, count objects per class
                with open(label_path, 'r') as f:
                    counts = {}
                    for line in f:
                        if line.strip():
                            class_id = int(line.split()[0])
                            counts[class_id] = counts.get(class_id, 0) + 1

                # If image contains multiple classes, use the most frequent one
                if counts:
                    main_class = max(counts, key=counts.get)
                    images_by_class[main_class].append(img_file)

    # Split each class separately
    train_images = []
    val_images = []
    test_images = []

    for class_id, images in images_by_class.items():
        random.shuffle(images)

        total = len(images)
        train_end = int(total * train_ratio)
        val_end = train_end + int(total * val_ratio)

        train_images.extend(images[:train_end])
        val_images.extend(images[train_end:val_end])
        test_images.extend(images[val_end:])

        print(f"Class {class_id}: total={total}, train={train_end}, val={val_end-train_end}, test={total-val_end}")

    # Shuffle final lists
    random.shuffle(train_images)
    random.shuffle(val_images)
    random.shuffle(test_images)

    print(f"\nFinal split results:")
    print(f"Training set: {len(train_images)}")
    print(f"Validation set: {len(val_images)}")
    print(f"Test set: {len(test_images)}")

    # Copy files
    for split, img_list in [('train', train_images),
                            ('val', val_images),
                            ('test', test_images)]:
        split_images_dir = os.path.join(source_dir, 'images', split)
        split_labels_dir = os.path.join(source_dir, 'labels', split)

        os.makedirs(split_images_dir, exist_ok=True)
        os.makedirs(split_labels_dir, exist_ok=True)

        for img in img_list:
            # Copy image
            shutil.copy(os.path.join(images_dir, img),
                       os.path.join(split_images_dir, img))

            # Copy annotation
            label_name = os.path.splitext(img)[0] + '.txt'
            src_label = os.path.join(labels_dir, label_name)
            dst_label = os.path.join(split_labels_dir, label_name)

            if os.path.exists(src_label):
                shutil.copy(src_label, dst_label)

    print("\nStratified split complete")

# Usage
split_dataset_stratified('./dataset', train_ratio=0.7, val_ratio=0.15, test_ratio=0.15)

Post-Split Validation

Validation Script:

import os

def verify_split(dataset_dir):
    """
    Verify dataset split results
    """
    splits = ['train', 'val', 'test']

    for split in splits:
        images_dir = os.path.join(dataset_dir, 'images', split)
        labels_dir = os.path.join(dataset_dir, 'labels', split)

        if not os.path.exists(images_dir) or not os.path.exists(labels_dir):
            print(f"{split} set directories do not exist")
            continue

        images = [f for f in os.listdir(images_dir) if f.endswith(('.jpg', '.png'))]
        labels = [f for f in os.listdir(labels_dir) if f.endswith('.txt')]

        # Check if images and annotations match
        missing_labels = []
        for img in images:
            label_name = os.path.splitext(img)[0] + '.txt'
            if label_name not in labels:
                missing_labels.append(label_name)

        if missing_labels:
            print(f"{split} set has {len(missing_labels)} images missing annotation files")
        else:
            print(f"{split} set: {len(images)} images, {len(labels)} annotation files, all matched")

# Usage
verify_split('./dataset')

Split Checklist

Pre-Split Preparation:

  • All images are annotated
  • Annotation files are validated
  • Data is cleaned

Split Process:

  • Random seed used for reproducibility
  • Stratified split by class (recommended)
  • Consistent class distribution maintained

Post-Split Validation:

  • Images and annotation files match
  • Class distribution checked per split
  • Split ratios match expectations

Directory Structure:

  • train/val/test subdirectories created
  • Images and annotation files correctly copied
  • Clear directory structure

Step 5: Model Training

Model training is the process of converting annotated data into a usable model, requiring proper parameter configuration and training process monitoring.

5.1 Installing the YOLO Environment

Why Choose YOLOv8?

  • Mature and actively maintained, with a strong speed/accuracy trade-off
  • Simple installation: a single pip command
  • Friendly Python API, easy to use
  • Comprehensive documentation, active community

Installation Steps:

1. Basic Installation:

# Install ultralytics (includes YOLOv8)
pip install ultralytics

# Verify installation
python -c "from ultralytics import YOLO; print('YOLOv8 installed successfully')"

2. GPU Support (Optional but Highly Recommended):

# Check if CUDA is available
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"

# If CUDA is not available, install CPU version
# pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

3. Dependency Check:

# Check key dependencies
pip list | grep -E "torch|ultralytics|opencv|pillow"

Environment Requirements:

  • Python 3.8+
  • PyTorch 1.8+
  • CUDA 11.0+ (for GPU training, optional)

YOLOv5 Installation (Alternative)

Installation Steps:

# Clone repository
git clone https://github.com/ultralytics/yolov5
cd yolov5

# Install dependencies
pip install -r requirements.txt

# Verify installation
python detect.py --help

Dependency Requirements:

  • Python 3.7+
  • PyTorch 1.7+
  • Other dependencies in requirements.txt

5.2 Training Configuration

YOLOv8 Training Configuration Details

Complete Training Script:

from ultralytics import YOLO
import torch

# Check device
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using device: {device}")

# Load pre-trained model
# Model selection:
# - yolov8n.pt: nano (smallest, fastest)
# - yolov8s.pt: small (small, fast)
# - yolov8m.pt: medium (balanced)
# - yolov8l.pt: large (high precision)
# - yolov8x.pt: xlarge (highest precision)
model = YOLO('yolov8n.pt')  # Choose based on needs

# Training configuration
results = model.train(
    # Dataset configuration
    data='dataset.yaml',      # Dataset config file path

    # Training parameters
    epochs=100,               # Training epochs (recommend: 100-300)
    imgsz=640,                # Input image size (640/416/1280)
    batch=16,                 # Batch size (adjust based on GPU memory)
    device=device,            # Device ('cuda'/'cpu'/'0,1' for multi-GPU)

    # Optimizer parameters
    lr0=0.01,                 # Initial learning rate
    lrf=0.01,                 # Final learning rate (lr0 * lrf)
    momentum=0.937,           # Momentum
    weight_decay=0.0005,     # Weight decay

    # Data augmentation
    hsv_h=0.015,             # Hue augmentation
    hsv_s=0.7,               # Saturation augmentation
    hsv_v=0.4,               # Value augmentation
    degrees=0.0,             # Rotation angle
    translate=0.1,           # Translation
    scale=0.5,               # Scale
    flipud=0.0,             # Vertical flip probability
    fliplr=0.5,             # Horizontal flip probability
    mosaic=1.0,             # Mosaic augmentation probability
    mixup=0.0,              # MixUp augmentation probability

    # Training settings
    patience=50,             # Early stopping patience (epochs without improvement)
    save=True,               # Save checkpoints
    save_period=10,          # Save every N epochs
    val=True,                # Validate during training
    plots=True,              # Generate training curve plots

    # Project settings
    project='runs/detect',    # Project directory
    name='my_model',         # Experiment name
    exist_ok=True,           # Allow overwriting existing experiments
    pretrained=True,         # Use pre-trained weights
    optimizer='SGD',         # Optimizer (SGD/Adam/AdamW)
    verbose=True,            # Verbose output
    seed=0,                  # Random seed
    deterministic=True,      # Deterministic training
    single_cls=False,        # Single class mode
    rect=False,              # Rectangular training
    cos_lr=False,            # Cosine learning rate schedule
    close_mosaic=10,         # Disable Mosaic for last N epochs
    resume=False,            # Resume training
    amp=True,                # Automatic mixed precision
    fraction=1.0,            # Fraction of dataset to use
    profile=False,           # Performance profiling
    freeze=None,             # Freeze layers (e.g., freeze=10 freezes first 10 layers)
)

# After training
print("Training complete!")
print(f"Best model saved at: {results.save_dir}")
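The train() call above reads dataset paths and class names from dataset.yaml. A minimal sketch that writes one matching the directory layout from this guide (the root path and the class names are placeholders, substitute your own):

```python
from pathlib import Path

# Minimal dataset.yaml for the images/{train,val} + labels/{train,val}
# layout used in this guide. 'path' and the class names are placeholders.
yaml_text = """\
path: dataset          # dataset root
train: images/train    # train images, relative to path
val: images/val        # val images, relative to path

names:
  0: cat
  1: dog
"""

Path('dataset.yaml').write_text(yaml_text)
print(Path('dataset.yaml').read_text())
```

Label files are found automatically by replacing "images" with "labels" in each image path, which is why the parallel directory structure matters.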

Key Parameter Details

1. Model Selection:

| Model   | Parameters | Speed   | Precision | Use Case                             |
|---------|------------|---------|-----------|--------------------------------------|
| yolov8n | 3.2M       | Fastest | Lower     | Real-time detection, edge devices    |
| yolov8s | 11.2M      | Fast    | Medium    | Balance speed and precision          |
| yolov8m | 25.9M      | Medium  | Higher    | Production environment (recommended) |
| yolov8l | 43.7M      | Slower  | High      | High precision requirements          |
| yolov8x | 68.2M      | Slowest | Highest   | Research, maximum precision          |

Selection Advice:

  • Beginners: yolov8n (quick validation)
  • Production: yolov8m (balanced)
  • High precision: yolov8l or yolov8x

2. Batch Size:

GPU Memory vs Batch Size:

| GPU Memory | Recommended Batch Size (640x640) |
|------------|----------------------------------|
| 4GB        | 4-8                              |
| 6GB        | 8-12                             |
| 8GB        | 12-16                            |
| 12GB       | 16-24                            |
| 16GB+      | 24-32                            |

Adjustment Method:

  • If out of memory, reduce batch or imgsz
  • If memory is sufficient, larger batch improves training stability
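The memory table above can be encoded as a rough starting-point heuristic. This is only a rule of thumb, the actual limit depends on the model size, image resolution, and other processes sharing the GPU:

```python
def suggest_batch_size(gpu_mem_gb):
    """Rough starting batch size from the GPU-memory table above
    (assumes 640x640 input); reduce it if you hit out-of-memory errors."""
    tiers = [(16, 32), (12, 24), (8, 16), (6, 12), (4, 8)]
    for min_mem, batch in tiers:
        if gpu_mem_gb >= min_mem:
            return batch
    return 4  # very small GPUs: start low and watch for OOM

print(suggest_batch_size(8))   # 16
print(suggest_batch_size(24))  # 32
```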

3. Learning Rate (lr0):

Learning Rate Selection:

  • Default: 0.01 (SGD optimizer)
  • Small datasets: 0.001-0.005
  • Large datasets: 0.01-0.02
  • Fine-tuning: 0.0001-0.001

Learning Rate Scheduling:

  • Cosine annealing: cos_lr=True, learning rate follows cosine curve
  • Linear decay: Default, learning rate decreases linearly
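The two schedules can be sketched numerically. This is a simplified approximation of what the trainer does (warm-up omitted), just to show how lr0 and lrf interact:

```python
import math

def lr_at(epoch, epochs=100, lr0=0.01, lrf=0.01, cosine=False):
    """Simplified sketch: learning rate decays from lr0 toward lr0 * lrf
    over `epochs` epochs, linearly or along a cosine curve."""
    t = epoch / epochs
    if cosine:
        factor = lrf + (1 - lrf) * 0.5 * (1 + math.cos(math.pi * t))
    else:
        factor = 1 - t * (1 - lrf)   # linear decay
    return lr0 * factor

print(lr_at(0), lr_at(100))  # 0.01 at the start, 0.0001 at the end
```

With the defaults (lr0=0.01, lrf=0.01), both schedules start at 0.01 and end at 0.0001; the cosine curve simply spends more time near the extremes.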

4. Training Epochs:

Epoch Recommendations:

  • Small datasets (< 1000 images): 200-300 epochs
  • Medium datasets (1000-10000 images): 100-200 epochs
  • Large datasets (> 10000 images): 50-100 epochs

Early Stopping:

  • patience=50: Stops if validation performance doesn't improve for 50 epochs
  • Prevents overfitting, saves training time

YOLOv5 Training Configuration

Training Script:

import torch

# Set paths and hyperparameters
data_yaml = 'dataset.yaml'
weights = 'yolov5s.pt'  # Pre-trained weights
epochs = 100
batch_size = 16
img_size = 640
device = '0' if torch.cuda.is_available() else 'cpu'

# YOLOv5 is trained via its command-line script, e.g.:
# python train.py --data dataset.yaml --weights yolov5s.pt --epochs 100 --batch-size 16 --img 640 --device 0

5.3 Training Monitoring

Key Metrics Explained

1. mAP (Mean Average Precision):

mAP50:

  • Average precision at IoU threshold=0.5
  • Measures overall model performance
  • Target: > 0.5 (50%)

mAP50-95:

  • Average precision across IoU thresholds from 0.5 to 0.95
  • Stricter evaluation standard
  • Target: > 0.3 (30%)
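mAP is computed at one or more IoU thresholds, and IoU itself is just intersection area over union area for two boxes. A minimal sketch with boxes in (x1, y1, x2, y2) pixel coordinates:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes in (x1, y1, x2, y2) format."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (zero area if the boxes don't overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

At mAP50, this prediction would not count as a match (0.143 < 0.5), which is why loose boxes hurt the score even when the class is right.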

2. Precision:

  • Proportion of true positives among predicted positives
  • Measures false positive rate
  • Target: > 0.8 (80%)

3. Recall:

  • Proportion of true positives correctly predicted
  • Measures miss rate
  • Target: > 0.8 (80%)
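Both metrics reduce to simple ratios over true positives (TP), false positives (FP), and false negatives (FN). A minimal sketch:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# 80 correct detections, 10 false alarms, 20 missed objects:
p, r = precision_recall(tp=80, fp=10, fn=20)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.89, recall=0.80
```

Raising the confidence threshold typically trades recall for precision, and lowering it does the opposite, which is why the two are reported together.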

4. Loss:

Training Loss (train/box_loss):

  • Bounding box loss on training set
  • Should continuously decrease

Validation Loss (val/box_loss):

  • Bounding box loss on validation set
  • Should decrease; if it increases, indicates overfitting

Training Process Monitoring

Real-Time Monitoring:

# Training automatically generates:
# - Training curve plots (results.png)
# - Confusion matrix (confusion_matrix.png)
# - Validation results (val_batch*.jpg)
# - Training logs (results.csv)

Viewing Training Logs:

import pandas as pd
import matplotlib.pyplot as plt

# Read training logs
df = pd.read_csv('runs/detect/my_model/results.csv')

# Plot training curves
plt.figure(figsize=(12, 4))

plt.subplot(1, 3, 1)
plt.plot(df['epoch'], df['train/box_loss'], label='Train Loss')
plt.plot(df['epoch'], df['val/box_loss'], label='Val Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.title('Loss Curve')

plt.subplot(1, 3, 2)
plt.plot(df['epoch'], df['metrics/mAP50(B)'], label='mAP50')
plt.xlabel('Epoch')
plt.ylabel('mAP50')
plt.legend()
plt.title('mAP50 Curve')

plt.subplot(1, 3, 3)
plt.plot(df['epoch'], df['metrics/precision(B)'], label='Precision')
plt.plot(df['epoch'], df['metrics/recall(B)'], label='Recall')
plt.xlabel('Epoch')
plt.ylabel('Score')
plt.legend()
plt.title('Precision & Recall')

plt.tight_layout()
plt.savefig('training_curves.png')
plt.show()

Training Tips and Best Practices

1. Learning Rate Adjustment Strategy:

Warm-up:

  • Use a smaller learning rate for the first few epochs
  • Helps stabilize training
  • YOLOv8 supports this by default

Learning Rate Decay:

  • Use cosine annealing: cos_lr=True
  • Or linear decay: default

2. Data Augmentation Strategy:

Basic Augmentation (enabled by default):

  • Horizontal flip: fliplr=0.5
  • Color augmentation: hsv_h/s/v
  • Mosaic: mosaic=1.0

Advanced Augmentation (optional):

  • MixUp: mixup=0.15 (for small datasets)
  • Rotation: degrees=10 (if target orientation doesn't matter)

3. Early Stopping:

Settings:

patience=50  # Stop if validation performance doesn't improve for 50 epochs

Benefits:

  • Prevents overfitting
  • Saves training time
  • Automatically selects the best model

4. Model Checkpoints:

Auto-Save:

  • last.pt is updated every epoch; best.pt is updated whenever validation performance improves
  • Saved at: runs/detect/my_model/weights/best.pt

Manual Save:

# Save at any point during training
model.save('my_checkpoint.pt')

Resume Training:

# Resume training from checkpoint
model = YOLO('runs/detect/my_model/weights/last.pt')
model.train(resume=True)

Training Problem Diagnosis

Problem 1: Loss Not Decreasing

Possible Causes:

  • Learning rate too high or too low
  • Poor data quality
  • Inappropriate model selection

Solutions:

  • Adjust learning rate (try 0.001-0.01)
  • Check data quality
  • Try a larger model

Problem 2: Overfitting (Training loss decreasing, validation loss increasing)

Possible Causes:

  • Insufficient data
  • Model too large
  • Insufficient data augmentation

Solutions:

  • Increase data volume
  • Use a smaller model
  • Increase data augmentation
  • Use dropout or regularization

Problem 3: Training Too Slow

Possible Causes:

  • Training on CPU
  • Batch size too small
  • Image size too large

Solutions:

  • Use GPU training
  • Increase batch size
  • Reduce image size (e.g., 640 -> 416)

Training Checklist

Pre-Training Preparation:

  • Dataset split (train/val/test)
  • Dataset config file (dataset.yaml) correct
  • Environment installed (YOLOv8/YOLOv5)
  • GPU available (if using GPU)

Training Configuration:

  • Appropriate model size selected
  • Batch size set based on GPU memory
  • Learning rate set reasonably
  • Sufficient training epochs

Training Monitoring:

  • Real-time training log review
  • Loss curve monitoring
  • mAP curve monitoring
  • Validation set performance check

Training Optimization:

  • Early stopping enabled
  • Checkpoints saved
  • Hyperparameters tuned
  • Training curves analyzed

Step 6: Model Evaluation and Optimization

Model evaluation is the critical step for validating model performance, and optimization is the ongoing process of improving it.

6.1 Evaluating the Model

Basic Evaluation

YOLOv8 Evaluation Script:

import os

from ultralytics import YOLO

# Load trained model
model = YOLO('runs/detect/my_model/weights/best.pt')

# Evaluate on validation set
metrics = model.val(data='dataset.yaml', split='val')

# Print key metrics
print("=" * 50)
print("Model Evaluation Results")
print("=" * 50)
print(f"mAP50: {metrics.box.map50:.4f}")
print(f"mAP50-95: {metrics.box.map:.4f}")
print(f"Precision: {metrics.box.mp:.4f}")
print(f"Recall: {metrics.box.mr:.4f}")
print("=" * 50)

# Evaluate on test set (if exists)
if os.path.exists('dataset/images/test'):
    test_metrics = model.val(data='dataset.yaml', split='test')
    print("\nTest set evaluation results:")
    print(f"mAP50: {test_metrics.box.map50:.4f}")
    print(f"mAP50-95: {test_metrics.box.map:.4f}")

Detailed Evaluation Metrics

1. Per-Class Evaluation:

# Get detailed metrics for each class
for i, class_name in enumerate(model.names.values()):
    print(f"\nClass {i} ({class_name}):")
    print(f"  Precision: {metrics.box.p[i]:.4f}")
    print(f"  Recall: {metrics.box.r[i]:.4f}")
    print(f"  mAP50: {metrics.box.ap50[i]:.4f}")
    print(f"  mAP50-95: {metrics.box.ap[i]:.4f}")

2. Confusion Matrix Analysis:

# View confusion matrix (auto-generated in results directory)
# File location: runs/detect/my_model/confusion_matrix.png
# Analysis:
# - Diagonal: Correct classifications
# - Off-diagonal: Misclassifications
# - Identify easily confused class pairs

3. Visualizing Detection Results:

# Visualize detection results on test images
results = model('dataset/images/test', save=True, conf=0.25)

# View detection results
for result in results:
    # Get detection boxes
    boxes = result.boxes
    # Get classes
    classes = boxes.cls
    # Get confidence scores
    confidences = boxes.conf

    print(f"Detected {len(boxes)} objects")
    for i in range(len(boxes)):
        class_name = model.names[int(classes[i])]
        conf = confidences[i]
        print(f"  {class_name}: {conf:.2f}")

Performance Benchmarks

Performance Evaluation Standards:

| Application Scenario       | mAP50 Target | mAP50-95 Target | Notes                  |
|----------------------------|--------------|-----------------|------------------------|
| Quick Prototype            | > 0.5        | > 0.3           | Validate ideas         |
| Production Environment     | > 0.7        | > 0.5           | Real-world application |
| High-Precision Application | > 0.9        | > 0.7           | Critical applications  |

Real Case:

An industrial quality inspection project:

  • Initial model: mAP50=0.65, couldn't meet production requirements
  • After optimization: mAP50=0.85, met production standards
  • Optimization methods: Improved data quality, increased data volume, tuned hyperparameters

6.2 Common Problems and Solutions

Problem Diagnosis Workflow

1. Low Accuracy (mAP < 0.5)

Diagnosis Steps:

# 1. Check data quality
# - Are annotations accurate?
# - Is data balanced?
# - Are scenes diverse?

# 2. Check model training
# - Is loss decreasing normally?
# - Is training sufficient?
# - Is learning rate appropriate?

# 3. Check model selection
# - Is the model too small?
# - Do you need a larger model?

Solutions:

  • Improve data quality: Re-check annotations, correct errors
  • Increase data volume: Collect more high-quality data
  • Use a larger model: Upgrade from yolov8n to yolov8m
  • Tune hyperparameters: Learning rate, batch size, etc.

2. Overfitting (Low training loss, high validation loss)

Diagnosis:

# Check training curves
# - train/box_loss continuously decreasing
# - val/box_loss first decreasing then increasing
# - High training mAP, low validation mAP

Solutions:

  • Increase data volume: Collect more data
  • Data augmentation: Enable more augmentation
  • Use a smaller model: Reduce model complexity
  • Regularization: Increase dropout or weight decay
  • Early stopping: Use early stopping mechanism

3. High Miss Rate (Low Recall)

Diagnosis:

# Check per-class recall
for i, class_name in enumerate(model.names.values()):
    recall = metrics.box.r[i]
    if recall < 0.7:
        print(f"Warning: {class_name} recall is low: {recall:.2f}")

Possible Causes:

  • Imbalanced data (some classes have few samples)
  • Small object detection difficulty
  • Threshold set too high

Solutions:

  • Balance data: Increase minority class samples
  • Lower confidence threshold: conf=0.15-0.25
  • Use higher resolution: imgsz=1280
  • Data augmentation: Target small object augmentation

4. High False Positive Rate (Low Precision)

Diagnosis:

# Check per-class precision
for i, class_name in enumerate(model.names.values()):
    precision = metrics.box.p[i]
    if precision < 0.7:
        print(f"Warning: {class_name} precision is low: {precision:.2f}")

Possible Causes:

  • Insufficient negative samples
  • High class similarity
  • Threshold set too low

Solutions:

  • Add negative samples: Include images without targets
  • Raise confidence threshold: conf=0.3-0.5
  • Refine categories: Distinguish similar classes
  • Post-processing optimization: Adjust NMS threshold

5. Training Too Slow or Not Converging

Diagnosis:

# Check training process
# - Is loss decreasing?
# - Is learning rate appropriate?
# - Is GPU utilization high?

Solutions:

  • Use GPU: Ensure GPU training
  • Adjust batch size: Based on GPU memory
  • Adjust learning rate: Try different learning rates
  • Check data: Ensure correct data format

Problem-Solution Reference Table

| Problem              | Symptoms                   | Possible Causes                      | Solutions                              |
|----------------------|----------------------------|--------------------------------------|----------------------------------------|
| Low accuracy         | mAP < 0.5                  | Poor data quality, insufficient data | Improve data quality, increase data    |
| Overfitting          | Good on train, poor on val | Insufficient data, model too large   | More data, smaller model, augmentation |
| High miss rate       | Recall < 0.7               | Imbalanced data, high threshold      | Balance data, lower threshold          |
| High false positives | Precision < 0.7            | Insufficient negatives, low threshold | Add negatives, raise threshold        |
| Slow training        | Long training time         | CPU training, small batch            | Use GPU, increase batch                |
| Not converging       | Loss not decreasing        | Wrong learning rate, data issues     | Adjust learning rate, check data       |

6.3 Model Optimization

Optimization Strategies

1. Data Optimization

Increase Data Volume:

  • Collect more high-quality data
  • Use data augmentation (rotation, flip, brightness, etc.)
  • Supplement from public datasets

Improve Data Quality:

  • Re-check annotations, correct errors
  • Standardize annotation criteria
  • Balance class data

Data Augmentation Script:

# YOLOv8's built-in data augmentation is applied automatically during
# training, no manual preprocessing needed; strengths are adjustable
# via training parameters:
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model.train(
    data='dataset.yaml',
    hsv_h=0.015,    # Hue augmentation
    hsv_s=0.7,      # Saturation augmentation
    hsv_v=0.4,      # Value augmentation
    degrees=10,     # Rotation angle
    translate=0.1,  # Translation
    scale=0.5,      # Scale
    mosaic=1.0,     # Mosaic augmentation
    mixup=0.15,     # MixUp augmentation
)

2. Hyperparameter Optimization

Learning Rate Optimization:

# Try different learning rates
learning_rates = [0.001, 0.005, 0.01, 0.02]

for lr in learning_rates:
    model = YOLO('yolov8n.pt')
    results = model.train(
        data='dataset.yaml',
        epochs=50,
        lr0=lr,
        name=f'lr_{lr}',
    )
    print(f"LR={lr}, mAP50={results.results_dict['metrics/mAP50(B)']:.4f}")

Batch Size Optimization:

# Adjust batch size based on GPU memory
# Larger batches are generally more stable but require more memory
batch_sizes = [8, 16, 32]

for batch in batch_sizes:
    model = YOLO('yolov8n.pt')
    results = model.train(
        data='dataset.yaml',
        epochs=50,
        batch=batch,
        name=f'batch_{batch}',
    )

3. Model Selection Optimization

Model Size Comparison:

# Test different model sizes
models = ['yolov8n.pt', 'yolov8s.pt', 'yolov8m.pt']

for model_name in models:
    model = YOLO(model_name)
    results = model.train(
        data='dataset.yaml',
        epochs=100,
        name=model_name.replace('.pt', ''),
    )
    print(f"{model_name}: mAP50={results.results_dict['metrics/mAP50(B)']:.4f}")

4. Post-Processing Optimization

Adjusting Confidence Threshold:

# Default threshold is 0.25, adjustable based on needs
# Higher threshold: fewer false positives, but may increase misses
# Lower threshold: fewer misses, but may increase false positives

# Adjust during inference
results = model('test_image.jpg', conf=0.3)  # Raise threshold
results = model('test_image.jpg', conf=0.15)  # Lower threshold

Adjusting NMS Threshold:

# NMS (Non-Maximum Suppression) removes duplicate detections
# iou parameter controls NMS IoU threshold
# Higher iou: stricter NMS, fewer duplicate detections
# Lower iou: more lenient NMS, may keep more detection boxes

results = model('test_image.jpg', iou=0.45)  # Default is 0.7
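To see how the iou threshold changes what survives, here is a minimal greedy NMS sketch. The library's implementation is more sophisticated (per-class, vectorized); this is only an illustration of the idea:

```python
def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it
    beyond iou_thresh, repeat. Boxes are (x1, y1, x2, y2). Returns kept indices."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, iou_thresh=0.45))  # [0, 2]: box 1 is suppressed
```

With a higher iou_thresh (e.g. 0.7), box 1 would survive as a duplicate detection, which is the trade-off the iou parameter controls.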

5. Model Ensemble

Multi-Model Voting:

from ultralytics import YOLO
import numpy as np

# Load multiple models
models = [
    YOLO('runs/detect/model1/weights/best.pt'),
    YOLO('runs/detect/model2/weights/best.pt'),
    YOLO('runs/detect/model3/weights/best.pt'),
]

# Predict on the same image
image = 'test_image.jpg'
predictions = [model(image, conf=0.25) for model in models]

# Voting or averaging (simplified example)
# Real applications require more sophisticated ensemble strategies
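One simple ensembling step, once boxes from different models have been matched to the same object (real pipelines cluster boxes by IoU first, e.g. weighted boxes fusion), is a confidence-weighted average. A sketch of just that averaging step, assuming the boxes are already matched:

```python
def merge_boxes(boxes, confidences):
    """Confidence-weighted average of boxes assumed to cover the same
    object (e.g. one box per model). Boxes are (x1, y1, x2, y2)."""
    total = sum(confidences)
    return tuple(
        sum(b[k] * c for b, c in zip(boxes, confidences)) / total
        for k in range(4)
    )

# Three models' boxes for the same object:
boxes = [(10, 10, 50, 50), (12, 11, 52, 49), (11, 9, 51, 51)]
confs = [0.9, 0.8, 0.7]
print(merge_boxes(boxes, confs))
```

Higher-confidence models pull the merged box toward their prediction, which tends to smooth out individual localization errors.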

Optimization Checklist

Data Optimization:

  • Sufficient data volume
  • High data quality
  • Balanced classes
  • Diverse scenes

Training Optimization:

  • Appropriate learning rate
  • Reasonable batch size
  • Sufficient training epochs
  • Data augmentation enabled

Model Optimization:

  • Appropriate model size
  • Pre-trained weights used
  • Different models tried

Post-Processing Optimization:

  • Appropriate confidence threshold
  • Appropriate NMS threshold
  • Model ensemble considered

Performance Evaluation:

  • mAP meets target
  • Precision and Recall balanced
  • Per-class performance balanced
  • Real-world application results satisfactory

Accelerate Dataset Creation with TjMakeBot

TjMakeBot's Advantages:

  1. AI Chat-Based Annotation

    • Natural language instructions, fast annotation
    • Supports batch processing
    • High accuracy
  2. Video-to-Frame Feature

    • Extract frames from video
    • Custom frame rate
    • Batch processing
  3. Multi-Format Support

    • YOLO format export
    • VOC, COCO format support
    • Convenient format conversion
  4. Free (Basic Features)

    • No usage limits
    • No feature restrictions
    • Online and ready to use

Start Using TjMakeBot to Create YOLO Datasets for Free ->

Conclusion

Creating a high-quality YOLO dataset is the foundation for model success. By choosing the right tools, following practical methods, and continuously optimizing, you can create high-quality datasets and train excellent models.

Remember: Data quality > Model architecture. Investing time in data yields significant returns.


Legal Disclaimer: The content of this article is for reference only and does not constitute any legal, commercial, or technical advice. When using any tools or methods, please comply with applicable laws and regulations, respect intellectual property rights, and obtain necessary authorizations. All company names, product names, and trademarks mentioned in this article are the property of their respective owners.

About the Author: The TjMakeBot team focuses on AI data annotation tool development, helping developers quickly create high-quality YOLO datasets.

Keywords: YOLO dataset, object detection, YOLO annotation, YOLOv8, YOLOv5, dataset creation, image annotation, TjMakeBot