Vocabulary independent speech recognition using particles

A method is presented for performing speech recognition that does not depend on a fixed word vocabulary. Particles, each of which represents a concatenated phone sequence, serve as the recognition units of the speech recognizer, which permits word-vocabulary-independent decoding. Each string of particles that represents a word in the one-best hypothesis from the particle speech recognizer is expanded into a list of phonetically similar word candidates using a phone confusion matrix. The resulting word graph is then re-decoded using a word language model to produce the final word hypothesis. Preliminary results on the DARPA HUB4 97 and 98 evaluation sets, using word bigram re-decoding of the particle hypothesis, show a word error rate (WER) between 2.2% and 2.9% higher than that of a word bigram speech recognizer of comparable complexity. The method has potential applications in spoken document retrieval, for recovering out-of-vocabulary words, and in client-server based speech recognition.
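
The expansion step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that "phonetically similar" is scored by a weighted edit distance whose substitution costs come from a phone confusion matrix, and the lexicon, the confusion probabilities, and the names `LEXICON`, `CONFUSION`, `align_cost`, and `expand` are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical toy lexicon (word -> phone sequence); not taken from the paper.
LEXICON = {
    "cat": ("k", "ae", "t"),
    "cap": ("k", "ae", "p"),
    "cut": ("k", "ah", "t"),
    "dog": ("d", "ao", "g"),
}

# Hypothetical confusion probabilities P(recognized phone | lexicon phone),
# keyed as (recognized, lexicon). A real system would estimate these from data.
CONFUSION = {
    ("t", "p"): 0.2,
    ("p", "t"): 0.2,
    ("ae", "ah"): 0.3,
    ("ah", "ae"): 0.3,
}
P_MATCH = 0.9   # assumed probability of recognizing a phone correctly
P_OTHER = 0.01  # assumed floor for confusions not listed above
P_INDEL = 0.01  # assumed probability of an inserted or deleted phone


def sub_cost(recognized, intended):
    """Negative log probability of seeing `recognized` for `intended`."""
    if recognized == intended:
        return -math.log(P_MATCH)
    return -math.log(CONFUSION.get((recognized, intended), P_OTHER))


def align_cost(hyp, ref):
    """Weighted edit distance between two phone sequences."""
    indel = -math.log(P_INDEL)
    prev = [j * indel for j in range(len(ref) + 1)]
    for i in range(1, len(hyp) + 1):
        cur = [i * indel]
        for j in range(1, len(ref) + 1):
            cur.append(min(
                prev[j - 1] + sub_cost(hyp[i - 1], ref[j - 1]),  # substitution
                prev[j] + indel,                                 # extra phone in hyp
                cur[j - 1] + indel,                              # missing phone in hyp
            ))
        prev = cur
    return prev[-1]


def expand(particles, lexicon=LEXICON, n_best=3):
    """Expand one word's particle string into ranked word candidates."""
    # A particle is a concatenated phone sequence, so flatten to phones first.
    phones = tuple(p for particle in particles for p in particle)
    scored = sorted((align_cost(phones, pron), word)
                    for word, pron in lexicon.items())
    return [word for _, word in scored[:n_best]]


# Particle string ("k ae", "t") for a single word in the one-best hypothesis.
candidates = expand([("k", "ae"), ("t",)])
print(candidates)
```

In the method described above, the candidate lists for consecutive words would form the word graph that is then re-decoded with a word language model; this sketch covers only the candidate generation and omits that second pass.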