Poster in Workshop: Adaptive Foundation Models: Evolving AI for Personalized and Efficient Learning
OmniPredict: GPT-4o Enhanced Multi-modal Pedestrian Crossing Intention Prediction
Je-Seok Ham · Jia Huang · Peng Jiang · Jinyoung Moon · Yongjin Kwon · Srikanth Saripalli · Changick Kim
Pedestrian crossing intention prediction is crucial for safe and responsible navigation in urban autonomous driving systems. Traditional vision-based models struggle to generalize to unseen driving scenarios because they depend heavily on their training data. Multimodal Large Language Models (MLLMs) offer a novel approach to these challenges through their advanced reasoning capabilities. In this paper, we introduce OmniPredict, the first study to evaluate GPT-4o(mni), a cutting-edge MLLM, for predicting pedestrian crossing intentions. On the JAAD dataset, our model achieved 67% prediction accuracy in a zero-shot setting, outperforming existing state-of-the-art MLLM methods by 17.5% without additional data or retraining. By integrating diverse contextual modalities and finely tuned prompts, our approach enhances prediction reliability and reduces uncertainty. These results show that our method improves prediction performance and thereby contributes to safer driving environments.