IR/RGB Contrastive Fusion for Robust Automatic Target Recognition

A multimodal automatic target recognition system that uses frozen DINOv2 backbones for IR and RGB, contrastive learning for embedding alignment, and Bayesian decision fusion for robust classification.

Automatic target recognition (ATR) benefits from combining infrared and RGB sensing, but fusing the modalities effectively is non-trivial. In this project, we build an ATR pipeline that learns IR and RGB embeddings via contrastive learning on top of frozen DINOv2 backbones, and then fuses classifier outputs with a Bayesian network for robust decisions.
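To make the alignment step concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) loss over paired IR/RGB embeddings. The source does not say which contrastive objective the project uses, so the InfoNCE form, the temperature of 0.07, and the names `l2_normalize`, `log_softmax` and `info_nce` are assumptions made for illustration. The short toy vectors stand in for features that would come from the frozen DINOv2 backbones.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def info_nce(ir_emb, rgb_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each input is the same scene.

    Assumption: matched IR/RGB pairs are positives and every other
    pairing in the batch is a negative.
    """
    ir = [l2_normalize(v) for v in ir_emb]
    rgb = [l2_normalize(v) for v in rgb_emb]
    n = len(ir)
    # logits[i][j]: temperature-scaled cosine similarity of IR i and RGB j
    logits = [
        [sum(a * b for a, b in zip(ir[i], rgb[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]
    # IR -> RGB direction: each row should peak on its diagonal entry
    loss_ir = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # RGB -> IR direction: same check on the columns
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_rgb = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (loss_ir + loss_rgb)

# Toy batch: two scenes with 2-D embeddings per modality
ir_batch = [[1.0, 0.0], [0.0, 1.0]]
rgb_batch = [[0.9, 0.1], [0.1, 0.9]]
aligned = info_nce(ir_batch, rgb_batch)
mismatched = info_nce(ir_batch, rgb_batch[::-1])
```

In this toy batch, `aligned` comes out lower than `mismatched`, which is the behavior that training on such a loss would reward: matched IR/RGB pairs are pulled together and unmatched pairs are pushed apart.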

Figure: IR/RGB contrastive fusion pipeline overview. Contrastive training aligns the IR and RGB embeddings before Bayesian decision fusion.

Methodology

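The design separates three stages: representation learning (contrastively aligned IR and RGB embeddings on top of frozen DINOv2 backbones), classification on the learned embeddings, and Bayesian fusion of the classifier outputs into a final decision.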
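The decision-fusion stage can be illustrated with a deliberately simplified sketch. The project describes a Bayesian network. The code below instead assumes the simplest such structure, a class node with two conditionally independent modality observations, for which the fused posterior is proportional to p(c | IR) * p(c | RGB) / p(c). The function name `fuse_posteriors`, the uniform default prior, and the example probabilities are hypothetical and not taken from the project.

```python
def fuse_posteriors(p_ir, p_rgb, prior=None, eps=1e-12):
    """Fuse per-modality class posteriors into one posterior.

    Assumption: IR and RGB observations are conditionally independent
    given the class, so
        p(c | ir, rgb)  is proportional to  p(c | ir) * p(c | rgb) / p(c).
    """
    k = len(p_ir)
    if prior is None:
        prior = [1.0 / k] * k  # uniform class prior (assumption)
    # eps guards against zero probabilities from an overconfident classifier
    unnorm = [
        max(a, eps) * max(b, eps) / max(p, eps)
        for a, b, p in zip(p_ir, p_rgb, prior)
    ]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# RGB is uninformative (for example, a dark scene): IR decides the outcome
night = fuse_posteriors([0.6, 0.3, 0.1], [1 / 3, 1 / 3, 1 / 3])

# Both modalities agree: the fused belief is sharper than either alone
agree = fuse_posteriors([0.6, 0.4], [0.7, 0.3])
```

With a uniform prior, an uninformative modality leaves the other modality's posterior essentially unchanged, while two agreeing modalities reinforce each other. This is one way fusion can stay robust when a single sensor is unreliable. A full Bayesian network could additionally model sensor reliability or dependence between the modalities.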

Results & Impact

The contrastive fusion approach yields measurable gains in classification performance and robustness, particularly when one modality alone is unreliable.

This project showcases how frozen foundation models, contrastive learning, and probabilistic fusion can be combined into a practical multimodal ATR system.

Read the detailed report: Multimodal ATR Technical Report (PDF).