Bio-LSTM: A Biomechanically Inspired Recurrent Neural Network for 3-D Pedestrian Pose and Gait Prediction

In applications such as autonomous driving, it is important to understand, infer, and anticipate the intention and future behavior of pedestrians. This ability allows vehicles to avoid collisions and improve ride safety and quality. This letter proposes a biomechanically inspired recurrent neural network that predicts the location and three-dimensional (3-D) articulated body pose of pedestrians in a global coordinate frame, given inaccurate estimates of 3-D poses and locations from prior frames. The proposed network predicts poses and global locations for multiple pedestrians simultaneously, for pedestrians up to 45 m from the cameras (urban intersection scale). The outputs of the proposed network are full-body 3-D meshes represented as skinned multi-person linear (SMPL) model parameters. The proposed approach relies on a novel objective function that incorporates the periodicity of human walking (gait), the mirror symmetry of the human body, and the change of ground reaction forces in a human gait cycle. This letter presents prediction results on the PedX dataset, a large-scale, in-the-wild dataset collected at real urban intersections with heavy pedestrian traffic. Results show that the proposed network can successfully learn the characteristics of pedestrian gait and produce accurate and consistent 3-D pose predictions.
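To make the idea of a gait-aware objective concrete, the following is a minimal sketch of how periodicity and mirror-symmetry penalties could be added to an ordinary prediction error. It is not the objective function used in the letter: the function name, the `(T, J)` joint-angle layout, the weights, and the assumption of a known gait period in frames are all hypothetical, and the ground-reaction-force term is omitted entirely.

```python
import numpy as np

def gait_regularized_loss(pred, target, left_idx, right_idx, period,
                          w_period=0.1, w_mirror=0.1):
    """Illustrative loss; NOT the letter's actual objective.

    pred, target : arrays of shape (T, J), one row of joint angles per frame
                   (hypothetical layout).
    left_idx, right_idx : index lists of corresponding left/right joints.
    period : assumed gait cycle length in frames (must be >= 2).
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if period < 2:
        raise ValueError("period must be at least 2 frames")
    T = pred.shape[0]

    # Data term: mean squared error against the reference poses.
    data = np.mean((pred - target) ** 2)

    # Periodicity term: the pose should repeat after one full gait cycle.
    periodic = 0.0
    if T > period:
        periodic = np.mean((pred[period:] - pred[:T - period]) ** 2)

    # Mirror-symmetry term: left-side joints should match right-side
    # joints shifted by half a gait cycle.
    half = period // 2
    mirror = 0.0
    if T > half:
        diff = pred[half:][:, left_idx] - pred[:T - half][:, right_idx]
        mirror = np.mean(diff ** 2)

    return data + w_period * periodic + w_mirror * mirror
```

In this sketch, a sequence whose left and right joints follow the same periodic curve half a cycle apart incurs no penalty, while in-phase left and right motion is penalized by the mirror term.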
