Actions as contexts

In artificial intelligence, many tasks in speech recognition, video analysis, and language processing involve temporal processing, where the outputs depend not only on the spatial content of the current sensory input frame but also on the relevant context in the attended past. How brains use temporal contexts remains elusive. Many computational methods, such as hidden Markov models and recurrent neural networks, require the human programmer to handcraft contexts as symbolic contexts. It has been proved that our Developmental Networks (DN) are capable of learning any emergent Turing Machine (TM), but their states have so far been supervised by human teachers as patterns, which demands much effort from the human trainer. In this paper, we study how agent actions serve as natural sources of contexts. In humans, muscle actions correspond to the firings of muscle neurons. They are dense in time and correlated with the cognitive skills of the individual. Some actions are meant to handle time warping, while others are not (e.g., those for counting time duration). We model actions as dense action patterns. We experimented with DN on the recognition of audio sequences as an example modality, but the principles are modality independent. Our experimental results show how taking dense, frame-wise actions as contexts helps DN generate temporal contexts. This work is a necessary step toward our goal of enabling machines to autonomously generate contexts as actions through life-long development.
