Pattern discovery in continuous speech using Block Diagonal Infinite HMM

We propose applying a recently introduced inference method, the Block Diagonal Infinite Hidden Markov Model (BDiHMM), to the problem of learning the topology of a Hidden Markov Model (HMM) from continuous speech in an unsupervised way. We test the method on the TiDigits continuous digit database and analyse the emerging patterns, which correspond to the blocks of states inferred by the model. We show how the complexity of these patterns increases with the number of observations and the number of speakers. We also show that the patterns correspond to sub-word units that form stable and discriminative representations of the words in the speech material.
