A method for noise-robust context-aware pattern discovery and recognition from categorical sequences

An efficient method for weakly supervised pattern discovery and recognition from discrete categorical sequences is introduced. The method utilizes two parallel sources of data: categorical sequences carrying some temporal or spatial information and a set of labeled, but not exactly aligned, contextual events related to the sequences. From these inputs the method builds associative models able to describe systematically co-occurring structures in the input streams. The learned models, based on transitional probabilities of events observed at several different time lags, inherently segment and classify novel sequences into contextual categories. Learning and recognition processes are purely incremental and computationally cheap, making the approach suitable for on-line learning tasks. The capabilities of the algorithm are demonstrated in a keyword learning task from continuous infant-directed speech and a continuous speech recognition task operating at varying noise levels.
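The idea of associating lagged transition statistics with weakly aligned context labels can be illustrated with a minimal sketch. This is not the authors' algorithm: the class name `LaggedTransitionModel`, the choice of lags, the normalization across contexts, and the averaging over lags are all assumptions made for illustration. It only shows how counts of symbol pairs at several time lags can be accumulated incrementally per context and then used to score a novel sequence.

```python
from collections import defaultdict


class LaggedTransitionModel:
    """Toy associative model (hypothetical, for illustration only).

    For each lag k it counts how often the symbol pair
    (sequence[t - k], sequence[t]) co-occurs with each context label.
    """

    def __init__(self, lags=(1, 2, 3)):
        self.lags = tuple(lags)
        # counts[k][(a, b)][context] -> number of co-occurrences
        self.counts = {k: defaultdict(lambda: defaultdict(int)) for k in self.lags}

    def update(self, sequence, contexts):
        """Incremental learning step: one sequence plus its unaligned labels."""
        for k in self.lags:
            for t in range(k, len(sequence)):
                pair = (sequence[t - k], sequence[t])
                for c in contexts:
                    self.counts[k][pair][c] += 1

    def activation(self, sequence):
        """Per-position context scores, averaged over the configured lags."""
        scores = []
        for t in range(len(sequence)):
            total = defaultdict(float)
            for k in self.lags:
                if t < k:
                    continue
                by_context = self.counts[k].get((sequence[t - k], sequence[t]))
                if not by_context:
                    continue  # pair never seen at this lag
                norm = sum(by_context.values())
                for c, n in by_context.items():
                    # Share of this pair's occurrences that belong to context c.
                    total[c] += (n / norm) / len(self.lags)
            scores.append(dict(total))
        return scores

    def classify(self, sequence):
        """Label a whole sequence with the context of highest summed score."""
        totals = defaultdict(float)
        for position_scores in self.activation(sequence):
            for c, v in position_scores.items():
                totals[c] += v
        return max(totals, key=totals.get) if totals else None


model = LaggedTransitionModel(lags=(1, 2))
model.update(list("abcabc"), {"X"})
model.update(list("dedede"), {"Y"})
label = model.classify(list("abca"))
```

Because `update` only increments counters, each new sequence can be absorbed without revisiting earlier data, which mirrors the incremental character described above. The per-position scores returned by `activation` are what would allow a rough segmentation of a sequence into contextual categories in this simplified setting.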
