Integrating meta-information into exemplar-based speech recognition with segmental conditional random fields

Exemplar-based recognition systems do not abstract large amounts of data into compact models. Instead, they store the observed data enriched with annotations and infer on the fly by finding the exemplars that best resemble the input speech. One advantage of exemplar-based systems is that, in addition to deriving the current phone or word, one can easily derive a wealth of meta-information about the chunk of audio under investigation. In this work we harvest meta-information from the set of best-matching exemplars that is thought to be relevant for recognition, such as word boundary predictions and speaker entropy. Integrating this meta-information into the recognition framework using segmental conditional random fields reduced the word error rate (WER) of the exemplar-based system on the WSJ Nov92 20k task from 8.2% to 7.6%. Adding the HMM score and multiple HMM phone detectors as features further reduced the error rate to 6.6%.
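
As an illustration of one such meta-feature, the sketch below computes a speaker entropy over the speaker labels attached to the best-matching exemplars of a segment. The abstract does not define the feature, so the use of Shannon entropy in bits, the function name `speaker_entropy`, and the example labels are all assumptions made for illustration rather than the authors' formulation.

```python
import math
from collections import Counter


def speaker_entropy(speaker_ids):
    """Shannon entropy (bits) of speaker labels among matched exemplars.

    Assumption: each best-matching exemplar carries a speaker label, and
    the feature is the entropy of the empirical label distribution.
    """
    if not speaker_ids:
        raise ValueError("need at least one exemplar")
    counts = Counter(speaker_ids)
    total = len(speaker_ids)
    # sum of p * log2(1/p) over the observed speakers
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical labels for the exemplars matched to one segment.
same_speaker = ["spk1", "spk1", "spk1", "spk1"]
mixed_speakers = ["spk1", "spk2", "spk3", "spk4"]

print(speaker_entropy(same_speaker))    # low: all matches share one speaker
print(speaker_entropy(mixed_speakers))  # high: matches spread over speakers
```

Under this reading, a low value means the matches come mostly from one speaker, and a high value means they are spread across many speakers.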
