Multimodal Hand Gesture Classification for the Human-Car Interaction

The recent spread of low-cost, high-quality RGB-D and infrared sensors has supported the development of Natural User Interfaces (NUIs), in which the interaction is carried out without physical devices such as keyboards and mice. In this paper, we propose a NUI based on dynamic hand gestures acquired with RGB, depth and infrared sensors. The system is developed for the challenging automotive context, aiming to reduce driver distraction during driving. Specifically, the proposed framework is based on a multimodal combination of Convolutional Neural Networks whose inputs are depth and infrared images, achieving a good level of light invariance, a key requirement for vision-based in-car systems. We test our system on a recent multimodal dataset collected in a realistic automotive setting, with the sensors placed at an innovative point of view, i.e., in the tunnel console looking upwards. The dataset consists of a large number of labelled frames containing 12 dynamic gestures performed by multiple subjects, making it suitable for deep learning-based approaches. In addition, we test the system on a different well-known public dataset created for driver-car interaction. Experimental results on both datasets show the efficacy and the real-time performance of the proposed method.
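To make the multimodal idea concrete, the sketch below shows one common way to combine depth and infrared streams with late fusion: each modality is processed by its own small CNN and the resulting features are concatenated before a 12-way gesture classifier. This is only an illustrative PyTorch sketch under assumed layer sizes, input resolution, and fusion strategy; it is not the architecture evaluated in the paper.

```python
# Minimal late-fusion sketch for depth + infrared gesture classification.
# All layer sizes, the 112x112 input resolution, and the fusion strategy
# are illustrative assumptions, not the authors' actual model.
import torch
import torch.nn as nn


class StreamCNN(nn.Module):
    """Small feature extractor for a single-channel modality (depth or IR)."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),   # global pooling -> (B, 32, 1, 1)
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))


class MultimodalGestureNet(nn.Module):
    """Late fusion: concatenate per-modality features, then classify."""

    def __init__(self, num_classes: int = 12, feat_dim: int = 128):
        super().__init__()
        self.depth_stream = StreamCNN(feat_dim)
        self.ir_stream = StreamCNN(feat_dim)
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, depth, ir):
        fused = torch.cat([self.depth_stream(depth), self.ir_stream(ir)], dim=1)
        return self.classifier(fused)


# Example usage: a batch of 4 single-channel frames per modality.
model = MultimodalGestureNet()
depth = torch.randn(4, 1, 112, 112)
ir = torch.randn(4, 1, 112, 112)
logits = model(depth, ir)   # shape: (4, 12)
```

Because the gestures are dynamic, a full system would also aggregate features over time (e.g., with a recurrent or 3D-convolutional component); the frame-level sketch above only illustrates the multimodal fusion step.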
