Shared speech attribute augmentation for English-Tibetan cross-language phone recognition

Finding a universal set of speech attributes shared among a large number of languages is a challenging research topic in detection-based, bottom-up cross-language speech recognition. Some recent work uses articulatory features as such a universal set. Because these attributes are defined by humans as semantic articulatory descriptions of phones, they capture the articulation information of all languages only incompletely and are not distinctive enough for accurate phoneme recognition in cross-language transfer. In this paper, we address the problem of obtaining a more complete articulatory feature representation by using a sparse coding method. We learn augmented articulatory attributes that sparsely represent more of the speech articulation information shared between the source and target languages. In our experiments on English-Tibetan cross-language phone recognition, the augmented attributes achieved better accuracy than the semantic attributes.
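
The abstract does not say which sparse coding algorithm or dictionary is used, so the following is only a minimal sketch of the general idea: a feature vector is approximated by a few atoms of a dictionary, and the resulting sparse coefficients could serve as extra attributes alongside the hand-defined ones. The toy dictionary, the function name `matching_pursuit`, and the choice of plain matching pursuit as the solver are all assumptions made for illustration, not the authors' method.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(atom):
    """Scale an atom to unit Euclidean norm."""
    norm = math.sqrt(dot(atom, atom))
    return [x / norm for x in atom]


def matching_pursuit(signal, dictionary, n_nonzero):
    """Greedy sparse coding (plain matching pursuit, an assumed stand-in).

    `dictionary` is a list of unit-norm atoms. At most `n_nonzero`
    greedy steps are taken; returns the coefficient vector and the
    final residual.
    """
    residual = list(signal)
    code = [0.0] * len(dictionary)
    for _ in range(n_nonzero):
        # Correlate the current residual with every atom.
        correlations = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(correlations[k]))
        # Accumulate the coefficient and remove that atom's contribution.
        code[best] += correlations[best]
        residual = [
            r - correlations[best] * a
            for r, a in zip(residual, dictionary[best])
        ]
    return code, residual


# Hypothetical toy dictionary: three axis-aligned atoms plus one mixed atom.
dictionary = [
    normalize([1.0, 0.0, 0.0]),
    normalize([0.0, 1.0, 0.0]),
    normalize([0.0, 0.0, 1.0]),
    normalize([1.0, 1.0, 0.0]),
]

# Hypothetical feature vector built from atoms 0 and 2.
signal = [2.0, 0.0, 0.5]

code, residual = matching_pursuit(signal, dictionary, n_nonzero=2)
nonzero_atoms = [k for k, c in enumerate(code) if abs(c) > 1e-12]
print("sparse code:", code)
print("active atoms:", nonzero_atoms)
```

In this sketch the dictionary is fixed by hand. In a setting like the one the abstract describes, the dictionary would instead be learned from data, and the sparse code would be computed per speech frame or segment.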
