From English pitch accent detection to Mandarin stress detection, where is the difference?

Although English pitch accent detection has been studied extensively, relatively few works have explored Mandarin stress detection. Moreover, these two counterpart tasks have not previously been compared and analyzed against each other. In this paper, we discuss Mandarin stress detection and compare it with English pitch accent detection. The paper makes two contributions. First, we use a classifier combination method to detect Mandarin stress and English pitch accent from acoustic, lexical and syntactic evidence. The proposed method outperforms the baseline system on both the Mandarin prosodic annotation corpus (ASCCD) and the English prosodic annotation corpus, the Boston University Radio News Corpus (BURNC). We also verify the proposed method on other prosodic annotation and continuous speech corpora. Second, we present a feature analysis: duration, pitch, energy and intensity features are compared for Mandarin stress detection and English pitch accent detection. Based on the analysis of the prosodic annotation corpora, we also verify some linguistic conclusions.
