Tone classification of syllable-segmented Thai speech based on multilayer perceptron

Thai is a monosyllabic, tonal language that uses tone to convey lexical information about the meaning of a syllable. Thai has five distinctive tones, and each tone is well represented by a single F0 contour pattern. In general, a Thai syllable with a different tone has a different lexical meaning. Thus, to completely recognize a spoken Thai syllable, a speech recognition system must not only recognize the base syllable but also correctly identify its tone. Hence, tone classification is an essential part of a Thai speech recognition system. In this study, a tone classifier for syllable-segmented Thai speech, which incorporates the effects of tonal coarticulation, stress, and intonation, was developed. Automatic syllable segmentation, which segments the training and test utterances into syllable units, was also developed. Acoustic features, including fundamental frequency (F0), duration, and energy extracted from the syllable being processed and its neighboring syllables, were used as the main discriminating features. A multilayer perceptron (MLP) trained by backpropagation was employed to classify these features. The proposed system was evaluated on 920 test utterances spoken by five male and three female native Thai speakers, who also uttered the training speech. The proposed system achieved an average accuracy of 91.36%.
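To make the classification stage concrete, the following is a minimal sketch of an MLP trained by backpropagation to assign one of five tone labels to a per-syllable feature vector. It is not the system described in this study. The feature layout (five F0 samples plus duration and energy for the previous, current, and next syllable, 21 values in total), the hidden-layer size, the learning rate, the ordering of the tone names, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

# The five Thai tones; the ordering here is an arbitrary assumption.
TONES = ["mid", "low", "falling", "high", "rising"]

def init_mlp(n_in, n_hidden, n_out, rng):
    """Small random weights and zero biases for a one-hidden-layer MLP."""
    return {
        "W1": rng.normal(0.0, 0.1, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }

def forward(params, X):
    """Return hidden activations and softmax class probabilities."""
    H = np.tanh(X @ params["W1"] + params["b1"])
    logits = H @ params["W2"] + params["b2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    P = P / P.sum(axis=1, keepdims=True)
    return H, P

def train_step(params, X, y, lr):
    """One full-batch backpropagation update; returns the loss before the update."""
    n = X.shape[0]
    H, P = forward(params, X)
    loss = -np.log(P[np.arange(n), y] + 1e-12).mean()  # cross-entropy
    # Gradient of the mean cross-entropy with respect to the logits.
    d_logits = P.copy()
    d_logits[np.arange(n), y] -= 1.0
    d_logits /= n
    grads = {
        "W2": H.T @ d_logits,
        "b2": d_logits.sum(axis=0),
    }
    # Propagate the error back through the tanh hidden layer.
    d_pre = (d_logits @ params["W2"].T) * (1.0 - H ** 2)
    grads["W1"] = X.T @ d_pre
    grads["b1"] = d_pre.sum(axis=0)
    for name, grad in grads.items():
        params[name] -= lr * grad
    return loss

def train(X, y, n_hidden=16, epochs=300, lr=0.05, seed=0):
    """Train the MLP and return its parameters and the per-epoch losses."""
    rng = np.random.default_rng(seed)
    params = init_mlp(X.shape[1], n_hidden, len(TONES), rng)
    losses = [train_step(params, X, y, lr) for _ in range(epochs)]
    return params, losses

def predict(params, X):
    """Index into TONES of the most probable tone for each row of X."""
    return forward(params, X)[1].argmax(axis=1)

def make_synthetic(n_per_class, n_features, rng):
    """Synthetic stand-in for real features: one noisy cluster per tone."""
    prototypes = rng.normal(0.0, 1.0, (len(TONES), n_features))
    X = np.concatenate(
        [p + 0.3 * rng.normal(size=(n_per_class, n_features)) for p in prototypes]
    )
    y = np.repeat(np.arange(len(TONES)), n_per_class)
    return X, y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 21 = 3 syllables x (5 F0 samples + duration + energy); an assumed layout.
    X, y = make_synthetic(n_per_class=40, n_features=21, rng=rng)
    params, losses = train(X, y)
    accuracy = (predict(params, X) == y).mean()
    print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, "
          f"training accuracy {accuracy:.2%}")
```

With real data, each row of `X` would hold normalized F0, duration, and energy measurements for a syllable and its neighbors, and accuracy would be measured on held-out utterances rather than on the training set as in this toy run.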
