

Poster in Workshop: 6th Robot Learning Workshop: Pretraining, Fine-Tuning, and Generalization with Large Scale Models

Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models

Tsun-Hsuan Johnson Wang · Alaa Maalouf · Wei Xiao · Yutong Ban · Alexander Amini · Guy Rosman · Sertac Karaman · Daniela Rus

Keywords: [ End-to-end Autonomous Driving; Generalization; Foundation Models ]


Abstract:

As autonomous driving technology matures, end-to-end methodologies have emerged as a leading strategy, promising seamless integration from perception to control via deep learning. However, existing systems grapple with challenges such as unexpected open-set environments and the complexity of black-box models. At the same time, the evolution of deep learning has introduced larger, multi-modal foundation models that offer joint visual and textual understanding. In this paper, we harness these multi-modal foundation models to enhance the robustness and adaptability of autonomous driving systems. We introduce a method to extract nuanced spatial features from transformers and incorporate latent space simulation for improved training and policy debugging. We use pixel/patch-aligned feature descriptors to expand foundation model capabilities and create an end-to-end multi-modal driving model, demonstrating unparalleled results in diverse tests. Our solution combines language with visual perception and achieves significantly greater robustness in out-of-distribution situations. Check our video at https://drive.google.com/file/d/1B8N7mUVsECkGfrEFRKJpgsOD8LXbcG9y/view?usp=sharing.
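
To make the idea of pixel/patch-aligned feature descriptors concrete, below is a minimal sketch of extracting per-patch token embeddings from a CLIP-style vision transformer and feeding them to a small policy head. This is not the authors' implementation: the Hugging Face `transformers` CLIP backbone, the `DrivingPolicyHead` module, and the single steering-style output are assumptions made for illustration only.

```python
# Minimal sketch (not the paper's method): pull patch-aligned descriptors from a
# CLIP vision transformer and regress a control value with a hypothetical head.
import numpy as np
import torch
import torch.nn as nn
from transformers import CLIPVisionModel, CLIPImageProcessor


class DrivingPolicyHead(nn.Module):
    """Hypothetical head mapping a grid of patch descriptors to one control value."""

    def __init__(self, feat_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # e.g., a steering-like command (illustrative)
        )

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, feat_dim) -> mean-pool, then regress.
        pooled = patch_feats.mean(dim=1)
        return self.mlp(pooled)


def extract_patch_descriptors(images, vision_model, processor):
    """Return per-patch token embeddings (batch, num_patches, dim); CLS token dropped."""
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        outputs = vision_model(pixel_values=inputs["pixel_values"])
    tokens = outputs.last_hidden_state  # (batch, 1 + num_patches, dim)
    return tokens[:, 1:, :]             # keep only the spatially aligned patch tokens


if __name__ == "__main__":
    vision_model = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")
    policy = DrivingPolicyHead(feat_dim=vision_model.config.hidden_size)

    # Stand-in for a single camera frame (HWC, uint8).
    dummy_frame = (np.random.rand(224, 224, 3) * 255).astype(np.uint8)
    feats = extract_patch_descriptors([dummy_frame], vision_model, processor)
    control = policy(feats)
    print(feats.shape, control.shape)  # e.g., (1, 49, 768) and (1, 1)
```

In this sketch the 7x7 grid of patch tokens plays the role of the spatially aligned descriptors; the paper's actual descriptor extraction, language conditioning, and latent space simulation are described in the full text rather than here.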
