Zero-Annotation Segmentation via YOLO-Guided SAM Prompt Automation

A fully automated plant segmentation pipeline that combines YOLO detection with a two-stage SAM prompt selection strategy, outperforming supervised baselines while requiring zero manual labels.

Manual segmentation labels are expensive to obtain in large field trials. This project develops an annotation-free segmentation pipeline that leverages YOLO detections to generate structured prompts for the Segment Anything Model (SAM), enabling scalable, high-quality plant segmentation without human annotations.

Overview of the annotation-free segmentation pipeline — Pipeline visualization where YOLO prompts SAM to generate dense plant masks.

Example annotation-free segmentation masks — Sample mask quality under heavy occlusion and mixed lighting.

Methodology

The pipeline turns bounding boxes into effective prompts through a two-stage selection process:

YOLO-based object localization. Use YOLO to detect plants and produce bounding boxes as initial regions of interest for segmentation.
Two-stage prompt selection. Within each box, automatically select point or box prompts for SAM in two stages, refining from coarse region proposals to more precise seeds.
SAM-driven segmentation. Run SAM with the automatically generated prompts to obtain high-quality plant masks without any manual labeling.
Post-processing for consistency. Apply simple geometric and morphological filters to enforce mask consistency across frames or plots when needed.

Results & Impact

The zero-annotation segmentation pipeline achieves about 9.2% IoU improvement over strong supervised baselines trained on manually labeled data, while removing the labeling bottleneck entirely.

This work illustrates how foundation models like SAM, when paired with task-specific detection and prompt engineering, can replace expensive human annotation in agricultural computer vision.