Acoustic Features for Hidden Conditional Random Fields-Based Thai Tone Classification

In Thai, tone information is essential for speech recognition systems. Previous studies show that many acoustic cues are associated with the shapes of tones. Nevertheless, most Thai tone classification studies have relied mainly on F0 values and their derivatives, without considering other acoustic features. This article investigates other acoustic features for Thai tone classification. In the experiments, energy values and spectral information, represented by three spectral-based features (the LPC-based, PLP-based, and MFCC-based features), are applied to HCRF-based Thai tone classification, which was reported as the best approach for Thai tone classification. Energy values provide an error rate reduction of 22.40% in the isolated-word scenario, but only slight improvements in the continuous-speech scenario. In contrast, spectral-based features contribute greatly to Thai tone classification in the continuous-speech scenario, but slightly degrade performance in the isolated-word scenario. The best result in the continuous-speech scenario is obtained with the PLP-based feature, which yields an error rate reduction of 13.90%. The findings of this article are therefore that energy values are the main contributor to improved Thai tone classification in the isolated-word scenario, and that spectral-based features, especially the PLP-based feature, are the main contributor in the continuous-speech scenario.
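The error rate reductions quoted above are presumably relative reductions against the F0-only baseline, although the abstract does not state this explicitly. Under that assumption, a minimal sketch of the calculation is given below. The function name and the example error rates are hypothetical and are not taken from the article.

```python
def relative_error_rate_reduction(baseline_error: float, new_error: float) -> float:
    """Return the relative error rate reduction, in percent.

    Both arguments are error rates in the same unit (for example, percent
    of misclassified tones). This assumes the usual definition:
    100 * (baseline - new) / baseline.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical numbers for illustration only (not results from the article):
# a baseline error rate of 10.0% that drops to 8.0% after adding a feature.
reduction = relative_error_rate_reduction(10.0, 8.0)
print(f"Relative error rate reduction: {reduction:.2f}%")
```

A negative return value corresponds to a degradation, as reported for the spectral-based features in the isolated-word scenario.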
