A Hidden Conditional Random Field-Based Approach for Thai Tone Classification

In Thai, tonal information is crucial for identifying the lexical meaning of a word, so accurate tone classification can improve the performance of Thai speech recognition systems. In this article, we report our study of Thai tone classification. Most previous studies of Thai tone classification relied on statistical machine learning approaches, especially the Artificial Neural Network (ANN)-based approach and the Hidden Markov Model (HMM)-based approach. Although both approaches gave reasonable performance, they have limitations that stem from their underlying mathematical models. We therefore introduce a novel approach to Thai tone classification based on Hidden Conditional Random Fields (HCRFs). We also investigated tone configurations, comprising tone features, frequency scaling, and normalization techniques, in order to fine-tune classification performance. Experiments were conducted in both an isolated word scenario and a continuous speech scenario. In both scenarios, the HCRF-based approach with the feature F_dF_aF, ERB-rate scaling, and z-score normalization yielded the highest performance and outperformed the ANN-based baseline, which had previously been reported as the best approach for Thai tone classification. Compared with the best baseline result, the best HCRF-based configuration reduced the error rate by 10.58% in the isolated word scenario and by 12.02% in the continuous speech scenario.
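As a rough illustration of the frequency scaling and normalization steps named above, the following Python sketch converts an F0 contour to the ERB-rate scale and applies z-score normalization. It is a minimal sketch under stated assumptions, not the configuration used in the study: the ERB-rate formula is the common Glasberg and Moore approximation, the F0 values are invented, the helper names are hypothetical, and reading F_dF_aF as F0 plus its first and second differences is only one plausible interpretation.

```python
import math
from statistics import mean, pstdev


def hz_to_erb_rate(f_hz):
    """Convert frequency in Hz to ERB-rate (Glasberg and Moore approximation).

    Assumption: the study may have used a different ERB-rate formula.
    """
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)


def z_score(values):
    """Normalize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A flat contour carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def first_difference(values):
    """Frame-to-frame difference, padded with 0.0 so the length is unchanged."""
    return [0.0] + [b - a for a, b in zip(values, values[1:])]


# Invented F0 contour in Hz, for illustration only.
f0_hz = [120.0, 125.0, 133.0, 140.0, 138.0]

erb = [hz_to_erb_rate(f) for f in f0_hz]
f0_norm = z_score(erb)

# Assumed reading of F_dF_aF: F0, its delta, and its acceleration.
delta = first_difference(f0_norm)
accel = first_difference(delta)
features = list(zip(f0_norm, delta, accel))
```

Each element of `features` is one frame-level vector that a sequence classifier such as an HCRF could consume. In practice the normalization statistics might be computed per speaker or per utterance rather than per contour; the abstract does not say which.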
