Multi-View Transformer for Lightweight 3D Object Representation

A transformer-based architecture that fuses RGB and depth views to predict compact superquadric parameters, offering an efficient alternative to heavy 3D representations for object modeling and segmentation.

High-fidelity 3D representations such as Gaussian splatting provide excellent visual quality but can be computationally heavy to store and process at deployment scale. This project explores a lightweight alternative: representing agricultural objects with superquadrics predicted by a multi-view RGB–Depth transformer, enabling efficient downstream tasks like 3D segmentation and tracking.

Methodology

The pipeline transforms multi-view image streams into compact parametric 3D shapes: features are extracted from the RGB and depth views, fused by the transformer, and decoded into a small set of superquadric parameters describing each object's geometry.
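The fusion step can be sketched as single-head cross-attention in which RGB tokens attend to depth tokens, followed by pooling and a linear head that regresses superquadric parameters. This is an illustrative NumPy sketch with random stand-in weights; the token counts, dimensions, and the 11-parameter output split are assumptions, not the project's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32                  # token dimension (illustrative)
N_RGB, N_DEP = 16, 16   # tokens per view after patch embedding (illustrative)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens, Wq, Wk, Wv):
    # Single-head cross-attention: queries from one view, keys/values from the other.
    Q, K, V = q_tokens @ Wq, kv_tokens @ Wk, kv_tokens @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

# Hypothetical weights; a trained model would learn these.
Wq, Wk, Wv = (rng.standard_normal((D, D)) * D**-0.5 for _ in range(3))
W_head = rng.standard_normal((D, 11)) * D**-0.5  # 11 superquadric params (assumed split)

rgb_tokens = rng.standard_normal((N_RGB, D))    # stand-in for RGB patch embeddings
depth_tokens = rng.standard_normal((N_DEP, D))  # stand-in for depth patch embeddings

fused = cross_attention(rgb_tokens, depth_tokens, Wq, Wk, Wv)  # RGB attends to depth
pooled = fused.mean(axis=0)   # global average pooling over tokens
params = pooled @ W_head      # e.g. 3 scales + 2 shape exponents + 6-DoF pose
```

A real multi-view model would stack several such layers, add positional and view embeddings, and attend symmetrically in both directions; the sketch only shows the core fusion operation.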

Results & Impact

The transformer–superquadric framework provides a lightweight 3D representation that is easy to store, manipulate, and integrate into robotics or analytics pipelines, while still capturing essential object geometry.
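To make the compactness concrete: a superquadric's geometry is fully described by the standard inside-outside function over five shape parameters (three scales, two shape exponents), plus a 6-DoF pose, so each object costs on the order of a dozen numbers rather than thousands of Gaussians. A minimal sketch of the inside-outside function (the parameter names here are generic superquadric notation, not taken from this project):

```python
import math

def superquadric_f(x, y, z, a1, a2, a3, e1, e2):
    """Superquadric inside-outside function.

    F < 1: point is inside the shape; F == 1: on the surface; F > 1: outside.
    a1, a2, a3 are the axis scales; e1, e2 are the shape exponents
    (e1 = e2 = 1 gives an ellipsoid, small values give box-like shapes).
    """
    fx = abs(x / a1) ** (2.0 / e2)
    fy = abs(y / a2) ** (2.0 / e2)
    fz = abs(z / a3) ** (2.0 / e1)
    return (fx + fy) ** (e2 / e1) + fz

# Unit sphere (all scales 1, both exponents 1):
print(superquadric_f(1.0, 0.0, 0.0, 1, 1, 1, 1, 1))  # 1.0 — on the surface
print(superquadric_f(0.5, 0.0, 0.0, 1, 1, 1, 1, 1))  # 0.25 — inside
```

This analytic form is what makes superquadrics cheap to test against, sample from, and intersect in downstream segmentation or tracking code.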

This work points toward scalable 3D modeling solutions that balance fidelity and efficiency for real-world agricultural applications.