Real-Time 6-DoF Fruit Pose Estimation for Quadruped Robots
A real-time detection and 6-DoF pose estimation pipeline based on VGGT (Visual Geometry Grounded Transformer) that localizes fruits in 3D, enabling quadruped robots to perform precise manipulation in the field.
To enable autonomous harvesting and inspection, field robots need accurate 3D localization of fruits under cluttered and dynamic conditions. This project designs a perception stack for a quadruped platform that detects fruits and estimates their full 6-DoF poses in real time using a VGGT-based backbone.
Methodology
The pipeline is tailored for deployment on a moving quadruped robot:
- Detection front-end. Run a real-time detector on onboard cameras to identify candidate fruit instances and crop regions of interest for pose estimation.
- VGGT-based pose estimation. Use VGGT to infer 3D structure and camera geometry, and derive 6-DoF fruit poses in the robot’s coordinate frame.
- Temporal smoothing. Apply simple temporal filtering to stabilize pose estimates across frames as the quadruped moves through the field.
- Integration with manipulation. Expose fruit pose estimates to downstream planning and control modules to support grasping and interaction.
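As an illustrative sketch of the temporal-smoothing step above, exponential smoothing of the translation combined with quaternion slerp for the orientation is one simple option. The filter form, the gain `alpha`, and the `(w, x, y, z)` quaternion convention here are assumptions for illustration, not the project's actual implementation:

```python
import numpy as np

def slerp(q0, q1, alpha):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = float(np.dot(q0, q1))
    if dot < 0.0:          # flip sign to take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:       # nearly parallel: linear interpolation is stable
        q = q0 + alpha * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1 - alpha) * theta) * q0
            + np.sin(alpha * theta) * q1) / np.sin(theta)

class PoseFilter:
    """Exponentially smooths a 6-DoF pose (translation + unit quaternion)
    across frames, damping jitter as the robot moves."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # weight of the newest measurement
        self.t = None       # smoothed translation, shape (3,)
        self.q = None       # smoothed orientation quaternion (w, x, y, z)

    def update(self, t_new, q_new):
        t_new = np.asarray(t_new, dtype=float)
        q_new = np.asarray(q_new, dtype=float)
        if self.t is None:  # first observation initializes the state
            self.t = t_new
            self.q = q_new / np.linalg.norm(q_new)
        else:
            self.t = (1 - self.alpha) * self.t + self.alpha * t_new
            self.q = slerp(self.q, q_new, self.alpha)
        return self.t, self.q
```

In practice the gain trades latency against stability: a small `alpha` suppresses per-frame pose jitter but lags behind true fruit motion relative to the walking robot, so it would need tuning against the platform's gait-induced camera shake.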
Results & Impact
The system delivers accurate, real-time 3D fruit localization suitable for quadruped manipulation tasks in strawberry fields, bridging high-level 3D perception and low-level robotic control.
This work connects emerging transformer-based 3D perception with practical agricultural robotics applications.