Using k-Nearest Neighbor and Speaker Ranking for Phoneme Prediction

Speech recognition systems are based on either a parametric or a non-parametric approach. Parametric systems such as HMMs have been the dominant technology for speech recognition over the past decade. Despite many advancements and enhancements in the design of these systems, key problems such as long-term temporal dependence have not yet been solved. Recently, due to the availability of large amounts of data and cheap computing resources (processing power and memory), non-parametric approaches to speech recognition and classification tasks have become popular and feasible. The key advantage of a non-parametric approach is that all the information in the training data is retained: because the data is not approximated by a specific statistical model, more speaker-specific information is preserved. In this paper we propose a k-nearest neighbor (k-NN) phoneme prediction scheme that uses a speaker ranking vector. The speaker ranking vector is calculated by measuring the similarity of a given test speaker to the instance space using k-NN. The results were compared with the nearest neighbor and k-NN majority voting approaches, and our proposed scheme gives better prediction accuracy than both. This approach can help a speech recognizer adapt on the fly to a given talker and customize the training data on the basis of a similarity measure. In this preliminary research we use a small amount of data to train our phoneme prediction classifier engine; performance can be further increased by enlarging the training data used to compute the speaker ranking.
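The idea can be illustrated with a minimal sketch. The paper does not specify the exact distance metric or weighting formula, so the details below are assumptions: Euclidean distance over feature frames, a ranking vector built from neighbor counts per training speaker, and neighbor votes weighted by that ranking instead of plain majority voting. Function and variable names (`speaker_ranking`, `predict_phoneme`, etc.) are hypothetical.

```python
import numpy as np

def speaker_ranking(test_frames, train_frames, train_speakers, k=5):
    """Rank each training speaker by how often their frames appear
    among the k nearest neighbors of the test speaker's frames."""
    speakers = sorted(set(train_speakers))
    counts = {s: 0 for s in speakers}
    for x in test_frames:
        dists = np.linalg.norm(train_frames - x, axis=1)
        for idx in np.argsort(dists)[:k]:
            counts[train_speakers[idx]] += 1
    total = sum(counts.values())
    # Normalize so the ranking vector sums to 1.
    return {s: counts[s] / total for s in speakers}

def predict_phoneme(x, train_frames, train_labels, train_speakers,
                    ranking, k=5):
    """Predict a phoneme label for frame x: each of the k nearest
    neighbors votes with a weight equal to its speaker's ranking
    score, rather than one vote each (majority voting)."""
    dists = np.linalg.norm(train_frames - x, axis=1)
    votes = {}
    for idx in np.argsort(dists)[:k]:
        label = train_labels[idx]
        votes[label] = votes.get(label, 0.0) + ranking[train_speakers[idx]]
    return max(votes, key=votes.get)
```

Setting all ranking weights equal recovers standard k-NN majority voting, which is the baseline the paper compares against.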
