Continuous Timescale Long-Short Term Memory Neural Network for Human Intent Understanding

Understanding human intention by observing a series of human actions is a challenging task. To accomplish it, we must analyze long sequences of actions related to intentions and extract context from their dynamic features. The multiple timescales recurrent neural network (MTRNN), a candidate solution, is a useful tool for recording and regenerating continuous signals in dynamic tasks. However, the conventional MTRNN suffers from the vanishing gradient problem, which makes it unsuitable for understanding long sequences. To address this problem, we propose a new model named Continuous Timescale Long Short-Term Memory (CTLSTM), which incorporates the multiple timescales concept into the Long Short-Term Memory (LSTM) recurrent neural network (RNN), an architecture that mitigates the vanishing gradient problem. We design an additional recurrent connection on the LSTM cell outputs that introduces a time delay, allowing the model to capture slow context. Our experiments show that the proposed model exhibits better context modeling and captures dynamic features on multiple large-scale classification tasks. The results illustrate that the multiple timescales concept enhances our model's ability to handle the long sequences associated with human intentions, making it more suitable for complex tasks such as intention recognition.
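The core mechanism described above can be sketched as a standard LSTM step whose hidden output is leaky-integrated with a time constant τ, in the spirit of MTRNN's continuous-time units. This is a minimal illustrative sketch, not the paper's implementation: all weight names, the random initialization, and the exact placement of the τ filter are assumptions made for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class CTLSTMCell:
    """Illustrative continuous-timescale LSTM cell (sketch, not the
    authors' code): a standard LSTM step whose output is low-pass
    filtered with time constant tau,
        h_t = (1 - 1/tau) * h_{t-1} + (1/tau) * o_t * tanh(c_t).
    Larger tau makes the hidden state change more slowly, giving
    "slow context" units; tau = 1 reduces to a plain LSTM step.
    """

    def __init__(self, input_size, hidden_size, tau=2.0, seed=0):
        rng = np.random.default_rng(seed)
        z = input_size + hidden_size
        # Stacked weights for the input, forget, cell, and output gates.
        self.W = rng.normal(0.0, 0.1, (4 * hidden_size, z))
        self.b = np.zeros(4 * hidden_size)
        self.tau = tau
        self.hidden_size = hidden_size

    def step(self, x, h_prev, c_prev):
        z = np.concatenate([x, h_prev])
        gates = self.W @ z + self.b
        i, f, g, o = np.split(gates, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c_prev + i * np.tanh(g)
        h_inst = o * np.tanh(c)  # instantaneous LSTM output
        # Leaky integration over the cell output introduces a time delay,
        # so units with large tau retain slowly varying context.
        h = (1.0 - 1.0 / self.tau) * h_prev + (1.0 / self.tau) * h_inst
        return h, c
```

In a multiple-timescales configuration, one would stack such cells with small τ in fast (input-side) layers and large τ in slow (context) layers, mirroring the fast/slow hierarchy of the MTRNN.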
