Lightweight Detection with Mamba and Frequency-Domain Channel Mixing

A specialized detection model for crop ripeness staging that integrates Mamba-based state space modeling with frequency-domain channel mixing to approach YOLOv12-L performance with a fraction of the parameters and computation.

This project targets efficient object detection for in-field fruit and ripeness detection under resource constraints. We design a lightweight detection model that incorporates a Mamba block and a frequency-domain channel mixer, achieving strong accuracy with significantly reduced model size and FLOPs compared with a large YOLOv12-L baseline.

Overall lightweight detection architecture — Overall detector topology showing the Mamba-enhanced backbone.

Details of the Mamba block — Mamba block detail highlighting long-range dependency modeling.

Field deployment scan visualization — Scanning Mechanism: The left and middle panels represent continuous scanning in Mamba YOLO, while the right panel illustrates the tree topology-based scan.

Methodology

The architecture focuses on balancing accuracy and efficiency:

Mamba-based state space block. Replace parts of the convolutional backbone or neck with a Mamba block to capture long-range dependencies in a parameter- and compute-efficient way.
Frequency-domain channel mixing. Apply frequency-domain transforms to feature maps and perform channel mixing to emphasize informative bands related to fruit texture and boundaries.
Task-specific training. Train the model for crop ripeness staging in real fields, with augmentations tailored to illumination changes, occlusion, and clutter.
Efficiency-aware evaluation. Measure both detection accuracy and resource usage against YOLOv12-L to quantify the trade-offs.

Results & Impact

The proposed model reaches about 93% of YOLOv12-L’s performance while using roughly 17% fewer parameters and 46% less computation, making it well-suited for deployment on edge devices or field robots.

This work demonstrates that combining modern sequence modeling (Mamba) with frequency-domain reasoning can yield compact yet powerful detectors for agricultural applications.