Constrained MLE-based speaker adaptation with L1 regularization

Maximum a posteriori (MAP) adaptation is a popular and powerful method for obtaining a speaker-specific acoustic model. However, MAP adaptation requires as much storage for each speaker-adaptive (SA) model as for the speaker-independent (SI) model. Modern speech recognition systems have a huge number of parameters and serve millions of users, so storing a full SA model per user is costly. To reduce the storage required for SA models, we propose a constrained maximum likelihood estimation (MLE)-based speaker adaptation method with L1 regularization. Compared with the conventional sparse MAP adaptation method, the proposed method adjusts SA models more efficiently with almost no loss of phone recognition performance.
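As a rough illustration of why an L1 penalty reduces SA storage, the sketch below adapts only the Gaussian means and stores each speaker as sparse offsets from the SI means. It is not the formulation used in the paper: it assumes diagonal covariances held fixed, an L1 penalty on the mean offsets, and precomputed per-component statistics. The function name, argument names, and the penalty weight `lam` are hypothetical.

```python
import numpy as np

def sparse_mean_offsets(si_means, variances, occupancy, data_means, lam):
    """L1-penalized ML estimate of per-component mean offsets (illustrative).

    For each component and dimension this minimizes
        occupancy / (2 * variance) * (data_mean - si_mean - d)**2 + lam * |d|
    whose closed-form solution is soft-thresholding of the raw offset.
    Shapes: si_means, variances, data_means are (K, D); occupancy is (K,).
    """
    raw = data_means - si_means
    # Components with little adaptation data get a larger threshold.
    thresh = lam * variances / np.maximum(occupancy, 1e-8)[:, None]
    return np.sign(raw) * np.maximum(np.abs(raw) - thresh, 0.0)

# Toy statistics (made up for illustration): 2 components, 2 dimensions.
si_means = np.zeros((2, 2))
variances = np.ones((2, 2))
occupancy = np.array([10.0, 1.0])
data_means = np.array([[0.5, 0.05], [0.5, -2.0]])

offsets = sparse_mean_offsets(si_means, variances, occupancy, data_means, lam=1.0)
sa_means = si_means + offsets  # only the nonzero offsets need to be stored
```

Offsets that fall below the threshold are set exactly to zero, so a speaker can be stored as a short list of (index, value) pairs rather than a full copy of the model.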
