Improving the Slovak LVCSR Performance by Cluster-Sensitive Acoustic Model Retraining

In this paper, we present a cluster-dependent adaptation approach for HMM-based acoustic models. The approach uses clustering techniques to partition the original training utterances into a predefined number of clusters. The speech data of each cluster are then used to adapt an initially pre-trained acoustic model to that cluster by re-estimation with the standard Baum-Welch procedure. The resulting model, adapted to homogeneous data, may markedly improve on the baseline recognition rate, while the model complexity may be reduced. In the recognition step, the test samples are scored by each adapted model and the most accurate one is chosen. The proposed approach is thoroughly evaluated in a Slovak triphone-based large vocabulary continuous speech recognition (LVCSR) system. The results show that cluster-sensitive retraining leads to significant improvements over the baseline reference system trained according to the conventional training procedure.
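The cluster-then-select idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes each utterance is summarized by one fixed-length feature vector, uses plain k-means as the clustering step, and substitutes a diagonal Gaussian per cluster for the Baum-Welch re-estimated HMM. All function names and the toy data are hypothetical.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def kmeans(points, k, iterations=20):
    """Hard k-means with a deterministic start (evenly spaced points)."""
    step = len(points) // k
    centroids = [list(points[i * step]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every utterance vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members (skip empty clusters).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


def fit_diag_gaussian(points, var_floor=1e-3):
    """Stand-in for per-cluster re-estimation: ML mean and floored variance."""
    mu = mean_vector(points)
    var = [
        max(sum((p[d] - mu[d]) ** 2 for p in points) / len(points), var_floor)
        for d in range(len(mu))
    ]
    return mu, var


def log_likelihood(model, x):
    mu, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mu, var)
    )


def train_cluster_models(points, k):
    """Cluster the training vectors, then fit one model per non-empty cluster."""
    labels = kmeans(points, k)
    models = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            models.append(fit_diag_gaussian(members))
    return models


def select_model(models, x):
    """Score x with every cluster model and return the best-scoring index."""
    scores = [log_likelihood(m, x) for m in models]
    return max(range(len(models)), key=scores.__getitem__)


# Toy data: two well-separated groups of 2-D "utterance" vectors.
toy_utterances = [
    (0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),
]
toy_models = train_cluster_models(toy_utterances, k=2)
best = select_model(toy_models, (5.1, 5.0))
```

In a real LVCSR setting, the per-cluster model would be a full set of triphone HMMs and the score would come from decoding, so this sketch only mirrors the control flow: partition the training data, fit one model per partition, and pick the best-scoring model at test time.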
