Pedestrian Road-Crossing Prediction and Safety Enhancement Using Fine-tuned VideoLLaMA2 


Vol. 14, No. 1, pp. 32-40, Jan. 2025
https://doi.org/10.3745/TKIPS.2025.14.1.32


Abstract

This study proposes a pedestrian road-crossing prediction method that uses VideoLLaMA2, a state-of-the-art Multimodal Large Language Model (MLLM), to improve the safety of pedestrians, the most vulnerable road users in traffic accidents. With approximately 50 million traffic accidents occurring annually, many of them involving pedestrians, systems that predict pedestrian road-crossing behavior are critically needed. To overcome the limitations of traditional machine learning and deep learning approaches, which rely heavily on their training datasets, this study constructs a question-answering (QA) dataset based on the JAAD dataset and fine-tunes VideoLLaMA2 on it. VideoLLaMA2 uses a Spatial-Temporal Convolution (STC) Connector to improve the accuracy and interpretability of pedestrian behavior predictions. Compared with the non-fine-tuned model, the fine-tuned model improves accuracy by 2% (reaching an ACC of 60%) and achieves an F1 score of 0.63. Ablation studies further show the importance of input features such as Scene Context, Bounding Box coordinates, and Local Context, and predictive performance correlates positively with the pedestrian's size within the frame. Qualitative results show that the model can predict pedestrian behavior in an interpretable manner, while some error cases point to the need for higher-quality data. This study contributes to the reliability and safety of autonomous vehicles and traffic safety systems, and suggests future work on validating the model across diverse driving environments and improving its real-time responsiveness.
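The abstract reports accuracy (ACC), an F1 score, and a positive correlation between a pedestrian's size in the frame and predictive performance. As an illustration only, the minimal Python sketch below shows how such quantities are commonly computed for a binary crossing / not-crossing prediction. It is not the authors' evaluation code: the label encoding (1 = crossing), the (x1, y1, x2, y2) pixel bounding-box format, the use of box area as "pedestrian size", and the Pearson (point-biserial) correlation between size and per-sample correctness are all assumptions, and the data in the usage block are made up.

```python
"""Hypothetical evaluation sketch for binary pedestrian-crossing prediction.

Assumed encoding: label 1 = pedestrian crosses, 0 = does not cross.
"""
from statistics import mean


def accuracy(y_true, y_pred):
    # Fraction of samples whose predicted label matches the ground truth.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    # F1 = 2PR / (P + R), treating "crossing" as the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined), so F1 = 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def bbox_area(box):
    # Assumed box format: (x1, y1, x2, y2) in pixels; area is a proxy for size.
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def pearson(xs, ys):
    # Pearson correlation; with 0/1 ys this is the point-biserial correlation.
    # Raises ZeroDivisionError if either sequence is constant.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Toy, made-up labels and boxes (not JAAD data).
    y_true = [1, 0, 1, 1, 0, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    boxes = [(0, 0, 40, 100), (10, 10, 50, 110), (0, 0, 10, 20),
             (5, 5, 45, 105), (0, 0, 12, 30), (0, 0, 30, 80)]
    correct = [int(t == p) for t, p in zip(y_true, y_pred)]
    areas = [bbox_area(b) for b in boxes]
    print(f"ACC={accuracy(y_true, y_pred):.3f}  F1={f1_score(y_true, y_pred):.3f}")
    print(f"size/correctness correlation r={pearson(areas, correct):.3f}")
```

Box area is only one possible proxy for pedestrian size; box height would be an equally reasonable choice, and the sign of the correlation is what indicates whether larger pedestrians tend to be predicted correctly.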


Cite this article

[IEEE Style]

S. H. Kim, J.-S. Ham, and J. Moon, "Pedestrian Road-Crossing Prediction and Safety Enhancement Using Fine-tuned VideoLLaMA2," The Transactions of the Korea Information Processing Society, vol. 14, no. 1, pp. 32-40, Jan. 2025. DOI: https://doi.org/10.3745/TKIPS.2025.14.1.32.

[ACM Style]

Sung Hun Kim, Je-Seok Ham, and Jinyoung Moon. 2025. Pedestrian Road-Crossing Prediction and Safety Enhancement Using Fine-tuned VideoLLaMA2. The Transactions of the Korea Information Processing Society 14, 1 (2025), 32-40. DOI: https://doi.org/10.3745/TKIPS.2025.14.1.32.