Automatic recognition of Indonesian declarative questions and statements using polynomial coefficients of the pitch contours

We propose an automatic utterance type recognizer that distinguishes declarative questions from statements in Indonesian speech. Because utterances of these two types contain the same words in the same order and differ only in intonation, classifying them requires an intonation recognizer in addition to a speech recognizer. In this paper, we first identify the part of the utterance that matters most for distinguishing the two types by means of perceptual experiments. We then propose an utterance type recognizer that uses that part, with polynomial expansion as the feature extractor and a neural network as the classifier. We evaluated the method on an Indonesian speech database containing 29 pairs of sentences of the two types, each uttered by 35 speakers. The experiments showed that the final word and the final two syllables are equally effective for discriminating the two utterance types. The proposed recognizer achieved its best accuracy, 89.1%, when the order of the polynomial expansion was three and the neural network was a linear perceptron.
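The pipeline described above (polynomial coefficients of the pitch contour fed to a linear perceptron) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the normalized time axis, the classic perceptron learning rule, and the synthetic rising and falling contours are all assumptions made for illustration, and no real speech data is involved.

```python
import numpy as np

def contour_features(f0, order=3):
    """Fit a polynomial to an F0 contour; return coefficients c0..c_order.

    Assumption: time is rescaled to [-1, 1] so that utterance parts of
    different durations yield comparable coefficients.
    """
    f0 = np.asarray(f0, dtype=float)
    t = np.linspace(-1.0, 1.0, len(f0))
    return np.polynomial.polynomial.polyfit(t, f0, order)

def add_bias(X):
    """Append a constant 1 to each feature vector."""
    return np.hstack([X, np.ones((len(X), 1))])

def train_perceptron(X, y, epochs=50, lr=0.1):
    """Train a single-layer linear perceptron (labels: 1 = question, 0 = statement)."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            pred = 1 if xi @ w > 0 else 0
            w += lr * (yi - pred) * xi  # update only on misclassification
    return w

def predict(w, X):
    return (add_bias(X) @ w > 0).astype(int)

def make_toy_contours(rng, n, frames=20):
    """Hypothetical data: rising contours for questions, falling for statements.

    Values are centered on zero, as if expressed relative to a speaker mean.
    """
    t = np.linspace(-1.0, 1.0, frames)
    questions = [2.0 * t + 0.5 * t**3 + rng.normal(0, 0.1, frames) for _ in range(n)]
    statements = [-2.0 * t + rng.normal(0, 0.1, frames) for _ in range(n)]
    X = np.array([contour_features(c) for c in questions + statements])
    y = np.array([1] * n + [0] * n)
    return X, y

X_train, y_train = make_toy_contours(np.random.default_rng(0), 20)
X_test, y_test = make_toy_contours(np.random.default_rng(1), 10)
weights = train_perceptron(X_train, y_train)
accuracy = float(np.mean(predict(weights, X_test) == y_test))
print("toy accuracy:", accuracy)
```

With a third-order fit, each contour is reduced to four coefficients, so the classifier input has a fixed size regardless of how long the final word or final two syllables last. The toy accuracy printed here says nothing about performance on real Indonesian speech.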
