Tree Memory Networks for Modelling Long-term Temporal Dependencies

Abstract In the domain of sequence modelling, Recurrent Neural Networks (RNNs) have achieved impressive results in a variety of application areas, including visual question answering, part-of-speech tagging and machine translation. However, this success in modelling short-term dependencies has not carried over to application areas such as trajectory prediction, which require capturing both short-term and long-term relationships. In this paper, we propose a Tree Memory Network (TMN) for jointly modelling long-term relationships between multiple sequences and short-term relationships within a sequence, in sequence-to-sequence mapping problems. The proposed network architecture is composed of an input module, a controller and a memory module. In contrast to related literature, which models the memory as a sequence of historical states, we model the memory as a recursive tree structure. The hierarchy of this structure captures temporal dependencies across both short-term and long-term time periods more effectively. We demonstrate the effectiveness and flexibility of the proposed TMN on two practical problems: aircraft trajectory modelling and pedestrian trajectory modelling in a surveillance setting. In both cases the proposed approach outperforms the current state-of-the-art. Furthermore, we perform an in-depth analysis of how the content of the memory module evolves over time, and provide visual evidence of how the proposed TMN maps both short-term and long-term relationships efficiently via its hierarchical structure.
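To make the contrast between a flat sequence of historical states and a recursive tree memory concrete, the following is a minimal illustrative sketch and not the TMN itself. It assumes a bottom-up binary tree in which adjacent nodes are merged pairwise; the names `build_tree_memory` and `average` are hypothetical, and the element-wise average is only a placeholder for whatever learned composition the network uses, which the abstract does not specify.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def average(left: Vector, right: Vector) -> Vector:
    """Placeholder composition (assumption): element-wise mean of two nodes."""
    return [(a + b) / 2.0 for a, b in zip(left, right)]


def build_tree_memory(
    states: Sequence[Vector],
    combine: Callable[[Vector, Vector], Vector] = average,
) -> List[List[Vector]]:
    """Build a bottom-up binary tree over a sequence of historical states.

    Returns the tree as a list of levels: levels[0] holds the leaves
    (the individual states) and levels[-1] holds the single root.
    """
    if not states:
        raise ValueError("at least one state is required")
    level = [list(s) for s in states]
    levels = [level]
    while len(level) > 1:
        parents = []
        # Merge adjacent pairs of nodes into one parent each.
        for i in range(0, len(level) - 1, 2):
            parents.append(combine(level[i], level[i + 1]))
        # An unpaired last node is carried up unchanged.
        if len(level) % 2 == 1:
            parents.append(level[-1])
        levels.append(parents)
        level = parents
    return levels


# Eight toy "historical states", each a 2-dimensional vector.
history = [[float(t), float(t)] for t in range(8)]
levels = build_tree_memory(history)
level_sizes = [len(level) for level in levels]
root = levels[-1][0]
```

In this sketch the lower levels summarise short spans of the history and the root summarises the whole of it, which is the intuition behind using a hierarchy to cover both short-term and long-term dependencies.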
