Intonation Based Sentence Modality Classifier for Czech Using Artificial Neural Network

This paper presents the idea and first results of a sentence modality classifier for Czech based purely on intonational information. This contrasts with other studies, which usually use additional features (including lexical ones) for this type of classification. Because sentence melody (intonation) is the most important feature, all experiments were performed on an annotated sample from a Czech audiobook library recorded by leading Czech actors. A non-linear model implemented as an artificial neural network (ANN) was chosen for the classification. Two types of ANN are considered for temporal pattern classification: the classical multi-layer perceptron (MLP) and Elman's network. Results are presented for the MLP. Pre-processing of temporal intonational patterns for use as ANN inputs is discussed. The results show that questions are very often misclassified as statements and that sentences ending in exclamation marks are not detectable in the current data set.
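The abstract does not say how the intonation contours are turned into ANN inputs. The Python sketch below shows one common option, under stated assumptions:

- The fundamental-frequency (F0) contour is taken on a log scale over voiced frames only.
- It is linearly resampled to a fixed length, so that it fits the fixed input layer of an MLP. The 20 points used here are a hypothetical choice.
- It is z-score normalized per utterance to factor out the speaker's pitch level and range.

For contrast, the sketch also shows how an Elman network could consume the unresampled contour frame by frame through its context units. All function names, layer sizes, and the three-class label set are illustrative assumptions. The weights are random and untrained, so the printed probabilities carry no meaning.

```python
import math
import random

CLASSES = ("statement", "question", "exclamation")  # assumed three-way label set


def resample(contour, n_points):
    """Linearly interpolate a variable-length contour to exactly n_points values."""
    if not contour:
        raise ValueError("contour must not be empty")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    if len(contour) == 1:
        return [float(contour[0])] * n_points
    step = (len(contour) - 1) / (n_points - 1)
    out = []
    for i in range(n_points):
        pos = i * step
        lo = min(int(pos), len(contour) - 1)
        hi = min(lo + 1, len(contour) - 1)
        frac = pos - lo
        out.append(contour[lo] * (1.0 - frac) + contour[hi] * frac)
    return out


def normalize(values):
    """Z-score the values so the speaker's pitch level and range are factored out."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / std for v in values]


def preprocess(f0_hz, n_points=20):
    """Hypothetical pipeline: log-F0 of voiced frames -> fixed length -> z-score."""
    log_f0 = [math.log(f) for f in f0_hz if f > 0]  # drop unvoiced (F0 = 0) frames
    return normalize(resample(log_f0, n_points))


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, biases):
    """Affine layer: one output per row of the weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def mlp_forward(x, w1, b1, w2, b2):
    """MLP: one tanh hidden layer followed by a softmax over the modality classes."""
    hidden = [math.tanh(h) for h in dense(x, w1, b1)]
    return softmax(dense(hidden, w2, b2))


def elman_forward(sequence, w_in, w_ctx, b_h, w2, b2):
    """Elman network: the previous hidden state is fed back as context units,
    so a scalar-per-frame contour is consumed without resampling."""
    context = [0.0] * len(b_h)
    for x_t in sequence:
        context = [math.tanh(w_in[j] * x_t
                             + sum(w * c for w, c in zip(w_ctx[j], context))
                             + b_h[j])
                   for j in range(len(b_h))]
    return softmax(dense(context, w2, b2))  # classify from the final hidden state


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    contour = [180.0 + 2.0 * t for t in range(37)]  # toy rising contour in Hz
    # Untrained random weights: the probabilities only illustrate the data flow.
    p_mlp = mlp_forward(preprocess(contour), random_matrix(8, 20, rng), [0.0] * 8,
                        random_matrix(3, 8, rng), [0.0] * 3)
    w_in = [rng.uniform(-0.5, 0.5) for _ in range(8)]
    p_elman = elman_forward(normalize([math.log(f) for f in contour]), w_in,
                            random_matrix(8, 8, rng), [0.0] * 8,
                            random_matrix(3, 8, rng), [0.0] * 3)
    print(dict(zip(CLASSES, p_mlp)), dict(zip(CLASSES, p_elman)))
```

The sketch makes one trade-off visible. The MLP needs a fixed-length vector, so the temporal shape of the melody survives only through the resampling step. The Elman network instead accumulates it in its recurrent state.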
