Phone Set Generation Based on Acoustic and Contextual Analysis for Multilingual Speech Recognition

This study presents a novel approach to generating phone units for multilingual speech recognition. Acoustic and contextual analysis is performed to characterize multilingual phonetic units for phone set generation. A confusion matrix that combines the acoustic and contextual similarities between every pair of phonetic units is constructed and used to cluster the units. Acoustic likelihood is adopted to estimate the acoustic similarity of phone models, and the hyperspace analog to language (HAL) model is adopted to estimate their contextual similarity. Experiments show that the generated phone set is compact and robust, and that it reflects both acoustic and contextual information for multilingual speech recognition.
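The clustering idea described above can be illustrated with a minimal sketch. The abstract does not state how the two similarities are combined or which clustering rule is used, so everything here is an assumption for illustration: the linear interpolation weight `weight`, the average-linkage merge rule, the stopping `threshold`, the function names, and the toy similarity values. In the study itself the acoustic similarities come from acoustic likelihoods and the contextual similarities from a HAL model; here they are simply given as small hand-written matrices.

```python
def combine_similarity(acoustic, contextual, weight=0.5):
    """Interpolate two symmetric similarity matrices (assumed scaled to [0, 1]).

    The linear interpolation is an assumption of this sketch, not the
    combination rule of the original study.
    """
    n = len(acoustic)
    return [
        [weight * acoustic[i][j] + (1.0 - weight) * contextual[i][j]
         for j in range(n)]
        for i in range(n)
    ]


def cluster_units(similarity, labels, threshold):
    """Greedy agglomerative clustering with average linkage (assumed rule).

    Repeatedly merges the most similar pair of clusters and stops when the
    best remaining pair falls below `threshold`.
    """
    clusters = [[i] for i in range(len(labels))]

    def avg_sim(a, b):
        # Mean pairwise similarity between the members of two clusters.
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, (x, y) = max(
            (avg_sim(a, b), (x, y))
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    return [sorted(labels[i] for i in c) for c in clusters]


# Hypothetical phonetic units from two languages, with toy similarity values.
labels = ["a_EN", "a_ZH", "s_EN", "s_ZH"]
acoustic = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
contextual = [
    [1.0, 0.7, 0.3, 0.2],
    [0.7, 1.0, 0.2, 0.3],
    [0.3, 0.2, 1.0, 0.6],
    [0.2, 0.3, 0.6, 1.0],
]

combined = combine_similarity(acoustic, contextual, weight=0.5)
phone_set = cluster_units(combined, labels, threshold=0.5)
print(phone_set)
```

With these toy values, units that are similar both acoustically and contextually across the two languages end up sharing one phone unit, which is how a merged set can be smaller than the union of the per-language sets. Raising `threshold` keeps more units separate; lowering it yields a more compact set.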
