Continuous Online Sequence Learning with an Unsupervised Neural Network Model

The ability to recognize and predict temporal sequences of sensory inputs is vital for survival in natural environments. Based on many known properties of cortical neurons, hierarchical temporal memory (HTM) sequence memory has recently been proposed as a theoretical framework for sequence learning in the cortex. In this letter, we analyze properties of HTM sequence memory and apply it to sequence learning and prediction problems with streaming data. We show that the model can continuously learn a large number of variable-order temporal sequences using an unsupervised Hebbian-like learning rule. The sparse temporal codes formed by the model can robustly handle branching temporal sequences by maintaining multiple predictions until there is sufficient disambiguating evidence. We compare the HTM sequence memory with other sequence learning algorithms, including statistical methods (autoregressive integrated moving average), feedforward neural networks (time-delay neural network and online sequential extreme learning machine), and recurrent neural networks (long short-term memory and echo state networks), on sequence prediction problems with both artificial and real-world data. The HTM model achieves accuracy comparable to these state-of-the-art algorithms. The model also exhibits properties that are critical for sequence learning, including continuous online learning, the ability to handle multiple predictions and branching sequences with high-order statistics, robustness to sensor noise and fault tolerance, and good performance without task-specific hyperparameter tuning. Therefore, the HTM sequence memory not only advances our understanding of how the brain may solve the sequence learning problem but is also applicable to real-world sequence learning problems from continuous data streams.
