SELECTION FOR NOISE ROBUST EXEMPLAR MATCHING

Exemplar-based acoustic modeling relies on labeled training segments that are compared with unseen test utterances using a dissimilarity measure. Using a larger number of accurately labeled exemplars provides better generalization and thus improved recognition performance, at the cost of increased computation and memory requirements. We have recently developed a noise robust exemplar matching-based automatic speech recognition system that recognizes noisy speech using a large number of undercomplete dictionaries, each containing speech exemplars of the same length and label. In this work, we investigate several speech exemplar selection techniques proposed for undercomplete speech dictionaries. The goal is to find a trade-off between recognition accuracy and acoustic model size, measured as the number of speech exemplars used for recognition. The selection criterion has to be chosen carefully, because these dictionaries contain far less redundancy than overcomplete dictionaries with many exemplars. To evaluate each selection criterion, we compare the recognition accuracies obtained with the complete and the pruned dictionaries on the small vocabulary track of the 2nd CHiME Challenge and on the AURORA-2 database.
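The matching step and the role of pruning can be illustrated with a minimal sketch. It assumes flattened nonnegative spectral segments, the generalized Kullback-Leibler divergence as the dissimilarity measure, and multiplicative updates for the nonnegative exemplar weights. Each label has its own undercomplete dictionary (fewer exemplars than feature dimensions), and the test segment receives the label of the dictionary that approximates it best. The function names are hypothetical. `prune_by_activation` is one illustrative selection rule and is not necessarily one of the criteria evaluated in this work. The sketch also omits the noise exemplars that a noise robust system would add.

```python
import numpy as np

EPS = 1e-12  # guards against log(0) and division by zero


def kl_divergence(y, y_hat):
    """Generalized Kullback-Leibler divergence between nonnegative vectors."""
    y_hat = np.maximum(y_hat, EPS)
    return float(np.sum(y * np.log(np.maximum(y, EPS) / y_hat) - y + y_hat))


def sparse_code(D, y, n_iter=200, sparsity=0.0):
    """Nonnegative weights x with y ~= D @ x (multiplicative KL updates, optional L1 penalty)."""
    x = np.ones(D.shape[1])
    denom = D.sum(axis=0) + sparsity
    for _ in range(n_iter):
        y_hat = np.maximum(D @ x, EPS)
        x *= (D.T @ (y / y_hat)) / denom
    return x


def recognize(dictionaries, y):
    """Return the label whose dictionary approximates y with the lowest divergence."""
    scores = {}
    for label, D in dictionaries.items():
        x = sparse_code(D, y)
        scores[label] = kl_divergence(y, D @ x)
    return min(scores, key=scores.get)


def prune_by_activation(D, segments, keep):
    """Hypothetical criterion: keep the `keep` exemplars with the largest summed weight."""
    usage = np.zeros(D.shape[1])
    for y in segments:
        usage += sparse_code(D, y)
    kept = np.sort(np.argsort(usage)[::-1][:keep])
    return D[:, kept]
```

Because each dictionary is undercomplete, every removed column shrinks the set of segments it can approximate well. For this reason, the choice of what to keep matters more here than for overcomplete dictionaries.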
