Classification margin for improved class-based speech recognition performance

This paper investigates class-based speech recognition, and more precisely how the selection of training samples for each class affects the final speech recognition performance. Increasing the number of recognition classes should lead to more specific models, and thus to better recognition performance, provided that the trained model parameters are reliable. However, as the number of classes increases, the amount of training data available for each class shrinks, which may lead to unreliable parameter estimates. The experiments described in the paper show that applying a classification margin tolerance helps associate more training data with each class and improves the overall speech recognition performance.
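
The idea of a classification margin tolerance can be illustrated with a minimal sketch. The abstract does not give the exact rule, so the following is an assumption made for illustration only: each training utterance has one score per class (for example a log-likelihood), and the utterance is associated with every class whose score lies within `margin` of the best score. The function names, the score values, and the margin value are all hypothetical.

```python
from collections import defaultdict


def classes_within_margin(scores, margin):
    """Return indices of all classes whose score is within `margin` of the best.

    Assumption: higher scores are better (e.g. log-likelihoods).
    With margin == 0 this reduces to the usual hard (best-class) assignment.
    """
    best = max(scores)
    return [k for k, s in enumerate(scores) if best - s <= margin]


def build_training_sets(sample_scores, margin):
    """Map each class index to the list of sample ids associated with it."""
    training_sets = defaultdict(list)
    for sample_id, scores in sample_scores.items():
        for k in classes_within_margin(scores, margin):
            training_sets[k].append(sample_id)
    return dict(training_sets)


# Hypothetical per-class scores for three utterances and three classes.
scores = {
    "utt1": [-10.0, -10.5, -14.0],
    "utt2": [-12.0, -9.0, -9.2],
    "utt3": [-8.0, -13.0, -15.0],
}

hard = build_training_sets(scores, margin=0.0)  # one class per utterance
soft = build_training_sets(scores, margin=1.0)  # near-ties feed several classes
```

In this toy data, the hard assignment leaves the third class with no training utterances at all, whereas the margin lets utterances that fall close to a class boundary contribute to more than one class, so every class receives data. How the margin is actually defined and tuned in the paper may differ from this sketch.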
