Multimodal Yield Forecasting with Vision–Weather–Growth Fusion

A multimodal time-series model that integrates vision, weather, phenological growth dynamics, and spatial layout encoding to produce accurate and robust yield forecasts for precision agriculture.

Yield is driven by a combination of plant status, environmental conditions, and spatial management. In this project, we build a multimodal forecasting model that combines image-derived features, weather data, growth-stage information, and spatial layout representations to predict yield in strawberry production.

[Figure] Multimodal forecasting pipeline: overview of how vision, weather, phenology, and spatial cues are fused.

Methodology

The architecture treats yield prediction as a multimodal sequence modeling problem: per-timestep image-derived visual features, weather observations, phenological growth-stage signals, and a spatial layout encoding are fused into a shared sequence representation, from which yield is regressed.
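As a minimal illustration of this fusion pattern, the sketch below projects each modality's per-timestep features into a shared space, concatenates them, pools over time, and regresses a scalar yield. All dimensions, weights, and the mean-pooling choice are hypothetical assumptions for illustration, not the project's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-timestep feature dimensions (illustrative only).
DIMS = {"vision": 8, "weather": 4, "growth": 3, "spatial": 2}
T = 10  # number of timesteps in one growing window

# One synthetic sample: modality name -> (T, dim) feature sequence.
sample = {m: rng.normal(size=(T, d)) for m, d in DIMS.items()}

def fuse_and_forecast(sample, w_proj, w_head):
    """Late fusion: project each modality, concatenate per timestep,
    pool over time, then regress a scalar yield."""
    projected = [sample[m] @ w_proj[m] for m in DIMS]   # each (T, H)
    fused = np.concatenate(projected, axis=1)           # (T, 4*H)
    pooled = fused.mean(axis=0)                         # simple temporal pooling
    return float(pooled @ w_head)                       # scalar yield forecast

H = 5  # shared projection width (illustrative)
w_proj = {m: rng.normal(size=(d, H)) for m, d in DIMS.items()}
w_head = rng.normal(size=(4 * H,))

print(fuse_and_forecast(sample, w_proj, w_head))
```

In a trained model the random projections would be replaced by learned encoders (e.g., a CNN backbone for vision and a recurrent or attention module in place of mean pooling), but the data flow is the same.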

Results & Impact

The multimodal model improves yield forecasting accuracy and stability compared with unimodal, vision-only baselines, especially under cross-season evaluation.

[Figure] Yield time-series fit: sample forecast versus ground truth, illustrating calibrated predictions.

This work underscores the importance of combining visual, environmental, temporal, and spatial cues when building predictive models for precision agriculture.