Lifelong learning of human actions with deep neural network self-organization

Lifelong learning is fundamental in autonomous robotics for the acquisition and fine-tuning of knowledge through experience. However, conventional deep neural models for action recognition from videos do not account for lifelong learning; instead, they learn from a batch of training data with a predefined number of action classes and samples. Thus, there is a need to develop learning systems that can incrementally process available perceptual cues and adapt their responses over time. We propose a self-organizing neural architecture for incrementally learning to classify human actions from video sequences. The architecture comprises growing self-organizing networks equipped with recurrent neurons for processing time-varying patterns. We use a set of hierarchically arranged recurrent networks for the unsupervised learning of action representations with increasingly large spatiotemporal receptive fields. Lifelong learning is achieved through prediction-driven neural dynamics in which the growth and adaptation of the recurrent networks are driven by their capability to reconstruct temporally ordered input sequences. Experimental results on a classification task using two action benchmark datasets show that our model is competitive with state-of-the-art methods for batch learning, even when a significant number of sample labels are missing or corrupted during training sessions. Additional experiments show that our model can adapt to non-stationary input while avoiding catastrophic interference.
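The idea of a growing network with recurrent neurons can be illustrated with a minimal sketch. It is not the architecture proposed here: it assumes a single layer, a merge-style temporal context, and a simple activity threshold for growth, and it omits the hierarchy, neuron habituation, and topological edges. All names and parameter values (`GrowingRecurrentNetwork`, `alpha`, `beta`, `activity_threshold`, `train`) are hypothetical and chosen only for illustration.

```python
import math


class GrowingRecurrentNetwork:
    """Toy growing network with a temporal context (illustrative assumption)."""

    def __init__(self, dim, alpha=0.5, beta=0.5, activity_threshold=0.8,
                 learning_rate=0.1, max_neurons=50):
        self.alpha = alpha                    # weight of input vs. context match
        self.beta = beta                      # mix of winner weight and context
        self.activity_threshold = activity_threshold
        self.learning_rate = learning_rate
        self.max_neurons = max_neurons
        self.weights = []                     # one weight vector per neuron
        self.contexts = []                    # one context vector per neuron
        self.global_context = [0.0] * dim     # summary of the previous winner

    def _distance(self, x, i):
        # Combined mismatch of the current input and the temporal context.
        dw = sum((a - b) ** 2 for a, b in zip(x, self.weights[i]))
        dc = sum((a - b) ** 2
                 for a, b in zip(self.global_context, self.contexts[i]))
        return self.alpha * dw + (1.0 - self.alpha) * dc

    def step(self, x):
        """Process one input; return (winner index, winner mismatch)."""
        x = list(x)
        if not self.weights:
            # The first input becomes the first neuron.
            self.weights.append(x[:])
            self.contexts.append(self.global_context[:])
            bmu, error = 0, 0.0
        else:
            bmu = min(range(len(self.weights)),
                      key=lambda i: self._distance(x, i))
            error = self._distance(x, bmu)
            activity = math.exp(-math.sqrt(error))
            can_grow = len(self.weights) < self.max_neurons
            if activity < self.activity_threshold and can_grow:
                # Poor match: insert a neuron between the winner and the input.
                self.weights.append(
                    [(w + xi) / 2.0 for w, xi in zip(self.weights[bmu], x)])
                self.contexts.append(
                    [(c + g) / 2.0
                     for c, g in zip(self.contexts[bmu], self.global_context)])
                bmu = len(self.weights) - 1
            else:
                # Good match (or capacity reached): adapt the winner.
                lr = self.learning_rate
                self.weights[bmu] = [
                    w + lr * (xi - w) for w, xi in zip(self.weights[bmu], x)]
                self.contexts[bmu] = [
                    c + lr * (g - c)
                    for c, g in zip(self.contexts[bmu], self.global_context)]
        # The winner's weight and context form the context for the next input.
        self.global_context = [
            self.beta * w + (1.0 - self.beta) * c
            for w, c in zip(self.weights[bmu], self.contexts[bmu])]
        return bmu, error


def train(net, sequence, epochs=10):
    """Present the sequence repeatedly; return the mean mismatch per epoch."""
    history = []
    for _ in range(epochs):
        errors = [net.step(x)[1] for x in sequence]
        history.append(sum(errors) / len(errors))
    return history


if __name__ == "__main__":
    demo = GrowingRecurrentNetwork(dim=2, max_neurons=10)
    train(demo, [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], epochs=20)
    print("neurons after training:", len(demo.weights))
```

In this sketch, growth is triggered by a poor match between the network's best neuron and the current input in its temporal context, which is one simple way to let the mismatch on ordered sequences drive both growth and adaptation.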
