Prediction of Korean Prosodic Phrase Boundary by Efficient Feature Selection in Machine Learning

Accurate prediction of prosodic phrase boundaries strongly influences the performance of speech recognition and speech synthesis systems. We propose a statistical approach that uses efficient learning features to predict Korean prosodic phrase boundaries naturally. These new features reflect the factors that affect prosodic phrase boundary generation better than existing learning features do. Notably, learning features extracted according to a hand-crafted prosodic phrase prediction rule yield higher accuracy. We evaluated how efficiently the new learning features predict prosodic phrase boundaries using Conditional Random Fields (CRFs). The results were 84.63% accuracy for three boundary levels and 80.14% accuracy for six boundary levels.
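To make the general setup concrete, the following is a minimal sketch of how per-token learning features for a sequence labeler such as a CRF might be assembled and how token-level accuracy might be computed. It is not the feature set or the rule proposed in this work: the feature names, the romanized example tokens, the POS tags, and the three label names (`NB`, `mB`, `MB`) are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical token type: (eojeol surface form, POS tag of its final morpheme).
Token = Tuple[str, str]


def token_features(sent: List[Token], i: int) -> Dict[str, object]:
    """Build an illustrative feature dict for the token at position i."""
    word, pos = sent[i]
    feats: Dict[str, object] = {
        "pos": pos,                        # POS tag of the current token
        "length": len(word),               # surface length in characters
        "position": i,                     # distance from sentence start
        "dist_to_end": len(sent) - 1 - i,  # distance to sentence end
        "has_comma": word.endswith(","),   # punctuation often signals a break
    }
    if i > 0:
        feats["prev_pos"] = sent[i - 1][1]
    else:
        feats["BOS"] = True                # beginning of sentence
    if i < len(sent) - 1:
        feats["next_pos"] = sent[i + 1][1]
    else:
        feats["EOS"] = True                # end of sentence
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, object]]:
    """Feature dicts for every token, in the shape CRF toolkits commonly accept."""
    return [token_features(sent, i) for i in range(len(sent))]


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted boundary label matches the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        return 0.0
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# Hypothetical three-token sentence with assumed tags and boundary labels.
sent = [("na-neun", "JX"), ("hakgyo-e,", "JKB"), ("ganda", "EF")]
feats = sentence_features(sent)
gold = ["NB", "mB", "MB"]   # assumed labels: no break, minor break, major break
pred = ["NB", "NB", "MB"]   # a made-up prediction, used only to exercise accuracy()
score = accuracy(gold, pred)
```

In a sketch like this, a rule-derived feature would simply be one more key in the dictionary, so a model trained with and without that key can be compared on the same accuracy measure.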
