Extending the bioinspired hierarchical temporal memory paradigm for sign language recognition

Sign language recognition (SLR) based on the spatial positions and arrangements of the hands over time is a challenging multivariable time-series recognition problem with several potential applications. Here we explore, for SLR purposes, a hierarchically connected network of nodes based on a Bayesian-like paradigm known as hierarchical temporal memory (HTM), which models neocortical principles of organization and information coding. HTM is a broad paradigm for pattern recognition, control, attention and forward prediction that exploits the hierarchy in time and space present in the physical world during both learning and inference. In this work we focus on the pattern recognition capabilities of HTM. We extend the traditional HTM paradigm with an original top node in order to improve HTM performance in problems where instances unfold over time. The extended top node stores and compares sequences of spatio-temporally coded inputs to handle the temporal evolution of instances in sign language. Sequences are compared using the Needleman-Wunsch sequence alignment algorithm, which is based on dynamic programming. We compare the performance of the extended HTM with that of traditional HTMs and of machine learning algorithms routinely used in the SLR literature. The extended HTM outperforms traditional HTM for SLR, reaching 91% recognition accuracy on a data set of 95 categories of Australian sign language. When sufficient training instances are available, the extended HTM matches or outperforms state-of-the-art SLR methods such as hidden Markov models or Metafeatures T-Classes, without using a language model or pre-processing the sensor data. The extended HTM also uses relatively small feature vectors compared with methods in the literature. Our method learns the spatio-temporal data structures and transitions that occur in the data without relying on manually predefined features to search for, and it works well in real time.
These results suggest that the extended HTM approach is a valid bioinspired alternative to existing SLR engines and that it can be successfully applied to other machine learning tasks whose input instances also unfold over time.
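The abstract states that the extended top node compares stored sequences using Needleman-Wunsch alignment, which is a dynamic-programming method. The following is a minimal Python sketch of that idea. It assumes that each sign instance reaches the top node as a sequence of integer codes emitted by the lower HTM levels. It also assumes that an unknown instance receives the label of the stored training sequence with the highest global alignment score. The scoring values (match +1, mismatch -1, gap -1) and the names `needleman_wunsch_score` and `SequenceTopNode` are hypothetical illustrations, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def needleman_wunsch_score(a: Sequence[int], b: Sequence[int],
                           match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global (Needleman-Wunsch) alignment score of code sequences a and b.

    The scoring parameters are illustrative assumptions, not values from the paper.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap     # a[i-1] aligned to a gap in b
            left = score[i][j - 1] + gap   # b[j-1] aligned to a gap in a
            score[i][j] = max(diag, up, left)
    return score[n][m]


class SequenceTopNode:
    """Hypothetical top node: memorizes labelled code sequences from the lower
    HTM levels and labels a new sequence with the category of the stored
    sequence that aligns best with it."""

    def __init__(self) -> None:
        self.memory: List[Tuple[List[int], str]] = []

    def learn(self, codes: Sequence[int], label: str) -> None:
        # Store the whole training sequence together with its sign category
        self.memory.append((list(codes), label))

    def infer(self, codes: Sequence[int]) -> str:
        if not self.memory:
            raise ValueError("no training sequences stored")
        # Pick the stored sequence with the highest global alignment score
        _, best_label = max(
            self.memory, key=lambda item: needleman_wunsch_score(codes, item[0]))
        return best_label


if __name__ == "__main__":
    node = SequenceTopNode()
    node.learn([3, 3, 7, 7, 2], "hello")
    node.learn([5, 1, 1, 4], "thanks")
    print(node.infer([3, 7, 7, 2]))  # prints "hello"
```

In this sketch the query `[3, 7, 7, 2]` scores 3 against the stored "hello" sequence (four matches and one gap) and -4 against "thanks", so it is labelled "hello". Each table cell reuses the three neighbouring subproblem scores, so aligning sequences of lengths n and m costs O(nm) time. Gaps let two performances of the same sign that differ mainly in duration still align well.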
