Brain4Cars: Car That Knows Before You Do via Sensory-Fusion Deep Learning Architecture

Advanced Driver Assistance Systems (ADAS) have made driving safer over the last decade. They prepare vehicles for unsafe road conditions and alert drivers when they perform a dangerous maneuver. However, many accidents are unavoidable because by the time drivers are alerted, it is already too late. Anticipating maneuvers makes it possible to alert drivers before they perform the maneuver and gives ADAS more time to avoid or prepare for the danger. In this work we propose a sensor-rich vehicular platform and learning algorithms for maneuver anticipation. For this purpose we equip a car with cameras, a Global Positioning System (GPS), and a computing device to capture the driving context from both inside and outside the car. To anticipate maneuvers, we propose a sensory-fusion deep learning architecture that jointly learns to anticipate and to fuse multiple sensory streams. Our architecture consists of Recurrent Neural Networks (RNNs) that use Long Short-Term Memory (LSTM) units to capture long temporal dependencies. We propose a novel training procedure that allows the network to predict the future given only a partial temporal context. We introduce a diverse data set with 1180 miles of natural freeway and city driving, and show that we can anticipate maneuvers 3.5 seconds before they occur, in real time, with a precision and recall of 90.5% and 87.4%, respectively.
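
As an illustration only, the following sketch shows one way a sensory-fusion recurrent model of the kind described above could be arranged: one LSTM per sensory stream (inside and outside the car), a fusion step, and a softmax over maneuvers at every time step, so that a prediction is available from a partial temporal context. The class names, feature sizes, number of maneuver classes, fusion by concatenation, and the random untrained weights are all assumptions made for this sketch. They are not taken from the paper, and the training procedure is not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class LSTMCell:
    """A standard LSTM cell with random, untrained weights (hypothetical sizes)."""

    GATES = ("input", "forget", "output", "candidate")

    def __init__(self, n_in, n_hid, rng):
        self.n_hid = n_hid
        # One weight matrix and one bias vector per gate, acting on [x; h_prev].
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_hid)]
                      for _ in range(n_hid)] for g in self.GATES}
        self.b = {g: [0.0] * n_hid for g in self.GATES}

    def step(self, x, h, c):
        z = list(x) + list(h)  # concatenate current input and previous hidden state
        pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])]
               for g in self.GATES}
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        cand = [math.tanh(v) for v in pre["candidate"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


class FusionAnticipator:
    """Two LSTM streams fused by concatenation (an assumed fusion scheme)."""

    def __init__(self, n_inside, n_outside, n_hid, n_classes, seed=0):
        rng = random.Random(seed)
        self.n_hid = n_hid
        self.inside = LSTMCell(n_inside, n_hid, rng)
        self.outside = LSTMCell(n_outside, n_hid, rng)
        self.W_out = [[rng.uniform(-0.1, 0.1) for _ in range(2 * n_hid)]
                      for _ in range(n_classes)]

    def predict_sequence(self, inside_seq, outside_seq):
        """Return one maneuver distribution per time step, using only past inputs."""
        h_in, c_in = [0.0] * self.n_hid, [0.0] * self.n_hid
        h_out, c_out = [0.0] * self.n_hid, [0.0] * self.n_hid
        outputs = []
        for x_in, x_out in zip(inside_seq, outside_seq):
            h_in, c_in = self.inside.step(x_in, h_in, c_in)
            h_out, c_out = self.outside.step(x_out, h_out, c_out)
            fused = h_in + h_out  # list concatenation of the two hidden states
            outputs.append(softmax(matvec(self.W_out, fused)))
        return outputs


# Toy usage with made-up feature sizes and random features.
_rng = random.Random(1)
inside_features = [[_rng.random() for _ in range(4)] for _ in range(6)]
outside_features = [[_rng.random() for _ in range(3)] for _ in range(6)]
model = FusionAnticipator(n_inside=4, n_outside=3, n_hid=8, n_classes=3)
per_step_probs = model.predict_sequence(inside_features, outside_features)
```

Because the model emits a distribution at every step from past inputs only, the prediction at step t is the same whether or not later frames are available, which is the property a partial-context predictor needs.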
