Lightweight Detection with Mamba and Frequency-Domain Channel Mixing

A specialized detection model for crop ripeness staging that integrates Mamba-based state space modeling with frequency-domain channel mixing to approach YOLOv12-L performance with fewer parameters and less computation.

This project targets efficient in-field detection of fruit and its ripeness stage under resource constraints. We design a lightweight detector that incorporates a Mamba block and a frequency-domain channel mixer, achieving strong accuracy with a smaller model and fewer FLOPs than the YOLOv12-L baseline.

Figure: Overall detector topology showing the Mamba-enhanced backbone.
Figure: Mamba block detail highlighting long-range dependency modeling.
Figure: Scanning mechanism. The left and middle panels show continuous scanning in Mamba YOLO, while the right panel illustrates the tree-topology-based scan.
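
The scanning mechanism described above can be illustrated with a small sketch. It rests on two assumptions that the write-up does not confirm: that "continuous scanning" means a serpentine (row-alternating) traversal in which consecutive steps are always spatially adjacent, and that a fixed-parameter diagonal state space recurrence is an acceptable stand-in for Mamba's input-dependent (selective) parameters. The names `serpentine_order`, `ssm_scan`, and `scan_feature_map` are hypothetical and are not taken from the project's code. The tree-topology scan is not covered here.

```python
import numpy as np

def serpentine_order(height, width):
    """Flat indices that walk an H x W grid row by row, reversing
    direction on every other row (assumed form of a continuous scan)."""
    idx = np.arange(height * width).reshape(height, width)
    idx[1::2] = idx[1::2, ::-1].copy()  # reverse the odd rows
    return idx.reshape(-1)

def ssm_scan(x, a, b, c):
    """Diagonal linear state space recurrence along a sequence.

    x: (L, D) input sequence; a, b, c: (D, N) per-channel parameters.
    h_t = a * h_{t-1} + b * x_t,   y_t = sum over N of (c * h_t)
    Simplification: a, b, c are fixed here, whereas Mamba makes them
    depend on the input.
    """
    length, dim = x.shape
    h = np.zeros_like(a, dtype=float)
    y = np.empty((length, dim))
    for t in range(length):
        h = a * h + b * x[t][:, None]
        y[t] = (c * h).sum(axis=1)
    return y

def scan_feature_map(feat, a, b, c):
    """Apply the recurrence to a (C, H, W) feature map in serpentine order."""
    channels, height, width = feat.shape
    order = serpentine_order(height, width)
    seq = feat.reshape(channels, height * width).T[order]  # (L, C) in scan order
    out = np.empty_like(seq, dtype=float)
    out[order] = ssm_scan(seq, a, b, c)  # put results back at grid positions
    return out.T.reshape(channels, height, width)
```

Because the hidden state `h` is carried across the whole sequence, each output position can depend on every earlier position in the scan, which is the long-range dependency property the Mamba block figure refers to.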

Methodology

The architecture balances accuracy and efficiency by combining a Mamba-based backbone with a frequency-domain channel mixer.
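
As a rough illustration of what frequency-domain channel mixing can look like, the sketch below transforms a feature map with a 2D real FFT, mixes channels with a separate weight matrix at each spatial frequency, and transforms back. This is one generic formulation assumed for illustration; the write-up does not specify the actual mixer design, and the function name `frequency_channel_mix` and the weight layout are hypothetical.

```python
import numpy as np

def frequency_channel_mix(feat, weight):
    """Mix channels in the frequency domain.

    feat:   (C, H, W) real-valued feature map.
    weight: (C_out, C, H, W // 2 + 1) mixing weights, one C_out x C
            matrix per retained spatial frequency (assumed layout).
    Returns a real (C_out, H, W) feature map.
    """
    height, width = feat.shape[-2:]
    spec = np.fft.rfft2(feat, axes=(-2, -1))           # (C, H, W//2+1), complex
    mixed = np.einsum("ochw,chw->ohw", weight, spec)   # per-frequency channel mix
    return np.fft.irfft2(mixed, s=(height, width), axes=(-2, -1))

# Usage: identity weights at every frequency leave the input unchanged.
rng = np.random.default_rng(0)
feat = rng.standard_normal((2, 4, 4))
identity = np.zeros((2, 2, 4, 3), dtype=complex)
identity[0, 0] = 1.0
identity[1, 1] = 1.0
restored = frequency_channel_mix(feat, identity)
```

Per-frequency weights are what distinguish this from an ordinary 1x1 convolution: if the same matrix were shared across all frequencies, linearity of the FFT would make the operation equivalent to mixing channels directly in the spatial domain.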

Results & Impact

The proposed model reaches about 93% of YOLOv12-L’s performance while using roughly 17% fewer parameters and 46% less computation, making it well suited to deployment on edge devices or field robots.
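
To make the trade-off concrete, the reported percentages can be normalized against the baseline. The helper below is a hypothetical convenience (`relative_budget` is not part of the project); it treats YOLOv12-L as 1.0 on every axis and uses only the three figures quoted above.

```python
def relative_budget(perf_ratio, param_reduction, flop_reduction):
    """Express a model relative to a baseline normalized to 1.0.

    perf_ratio:      fraction of baseline performance retained (e.g. 0.93)
    param_reduction: fractional reduction in parameters (e.g. 0.17)
    flop_reduction:  fractional reduction in computation (e.g. 0.46)
    """
    params = 1.0 - param_reduction
    flops = 1.0 - flop_reduction
    return {
        "params": params,
        "flops": flops,
        "perf_per_param": perf_ratio / params,
        "perf_per_flop": perf_ratio / flops,
    }

budget = relative_budget(perf_ratio=0.93, param_reduction=0.17, flop_reduction=0.46)
```

Under these figures the model keeps 0.83 of the baseline's parameters and 0.54 of its computation, so it delivers about 1.72 times the baseline's performance per unit of computation.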

This work demonstrates that combining modern sequence modeling (Mamba) with frequency-domain reasoning can yield compact yet powerful detectors for agricultural applications.