Paper information - Recurrent Neural Networks for driver activity anticipation via sensory-fusion architecture

Anticipating the future actions of a human is a widely studied problem in robotics that requires spatio-temporal reasoning. In this work we propose a deep learning approach for anticipation in sensory-rich robotics applications. We introduce a sensory-fusion architecture which jointly learns to anticipate and fuse information from multiple sensory streams. Our architecture consists of Recurrent Neural Networks (RNNs) that use Long Short-Term Memory (LSTM) units to capture long temporal dependencies. We train our architecture in a sequence-to-sequence prediction manner, and it explicitly learns to predict the future given only a partial temporal context. We further introduce a novel loss layer for anticipation which prevents over-fitting and encourages early anticipation. We use our architecture to anticipate driving maneuvers several seconds before they happen on a natural driving data set of 1180 miles. The context for maneuver anticipation comes from multiple sensors installed on the vehicle. Our approach shows significant improvement over the state-of-the-art in maneuver anticipation by increasing the precision from 77.4% to 90.5% and recall from 71.2% to 87.4%.
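The abstract does not spell out the form of the anticipation loss, so the following is only a minimal sketch of one way a loss could "encourage early anticipation": a per-time-step negative log-likelihood whose weight grows exponentially toward the end of the sequence. The function name `anticipation_loss`, the exponential weighting, and the toy probability sequences are all assumptions made for illustration, not details taken from the text above.

```python
import math


def anticipation_loss(probs_per_step, true_class):
    """Time-weighted negative log-likelihood over one sequence.

    probs_per_step: list of per-time-step class-probability lists.
    true_class: index of the maneuver that occurs at the end of the sequence.

    Assumption for illustration: the weight exp(-(T - t)) is small for early
    steps (little context, mistakes are cheap) and reaches 1.0 at the final
    step (full context, mistakes are expensive).
    """
    T = len(probs_per_step)
    total = 0.0
    for t, probs in enumerate(probs_per_step, start=1):
        weight = math.exp(-(T - t))
        total += -weight * math.log(probs[true_class])
    return total


# Hypothetical two-class predictions over three time steps.
# Both models end up equally confident; one gets there a step sooner.
early = [[0.5, 0.5], [0.9, 0.1], [0.9, 0.1]]
late = [[0.5, 0.5], [0.5, 0.5], [0.9, 0.1]]

loss_early = anticipation_loss(early, true_class=0)
loss_late = anticipation_loss(late, true_class=0)
```

Under this weighting, the model that commits to the correct class one step sooner receives the lower loss, while an uncertain prediction at the very first step costs little. That is the intuition behind penalizing late mistakes more heavily than early ones.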
