Characterizing fundamental frequency in Mandarin: a functional principal component approach utilizing mixed effect models.

A model for fundamental frequency (F0, commonly referred to as pitch) based on a functional principal component (FPC) analysis framework is presented. The model is applied to Mandarin Chinese, a Sino-Tibetan language that is rich in pitch-related information because the relative pitch curve is specified for most syllables in the lexicon. The approach quantifies the influence of each identified component in relation to the original tonal content, without making any assumptions about the shape of the tonal components. The original five-speaker corpus is preprocessed with a locally weighted least squares smoother to produce F0 curves. These smoothed curves are then used to compute FPC scores and their corresponding eigenfunctions. The scores are analyzed in a series of penalized mixed effect models, from which meaningful categorical prototypes are built. The prototypes appear to confirm known tonal characteristics of the language, and they also suggest the presence of a previously undocumented sinusoidal tonal component.
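The smoothing and FPC steps summarized above can be illustrated with a short NumPy sketch. It is not the authors' implementation. The Gaussian kernel, the bandwidth, the common evaluation grid, the function names (`local_linear_smooth`, `fpca`, `reconstruct`) and the synthetic curves are all hypothetical choices made for illustration. The penalized mixed effect modelling of the scores is not shown.

```python
import numpy as np


def local_linear_smooth(t_obs, y_obs, t_grid, bandwidth):
    """Locally weighted least squares (local linear) smoother.

    Assumption: a Gaussian kernel; the paper's kernel and bandwidth may differ.
    """
    fitted = np.empty(t_grid.size)
    for i, t0 in enumerate(t_grid):
        w = np.exp(-0.5 * ((t_obs - t0) / bandwidth) ** 2)  # kernel weights around t0
        X = np.column_stack([np.ones_like(t_obs), t_obs - t0])  # local linear design
        XtW = X.T * w  # same as X.T @ diag(w)
        beta = np.linalg.solve(XtW @ X, XtW @ y_obs)
        fitted[i] = beta[0]  # the intercept is the fitted value at t0
    return fitted


def fpca(curves, t_grid, n_components):
    """FPCA for curves sampled on a common, equally spaced grid (rows = curves)."""
    dt = t_grid[1] - t_grid[0]
    mean = curves.mean(axis=0)
    centered = curves - mean
    cov = centered.T @ centered / (curves.shape[0] - 1)  # sample covariance surface
    # Discretized covariance operator: integrals become sums weighted by dt.
    eigvals, eigvecs = np.linalg.eigh(cov * dt)
    order = np.argsort(eigvals)[::-1][:n_components]
    phi = eigvecs[:, order] / np.sqrt(dt)  # eigenfunctions scaled so that sum(phi**2) * dt = 1
    scores = centered @ phi * dt  # FPC scores: integral of (x - mean) * phi
    explained = eigvals[order] / eigvals.sum()  # fraction of total variance
    return mean, phi, eigvals[order], scores, explained


def reconstruct(mean, phi, scores):
    """Truncated expansion: mean + sum_k score_k * phi_k."""
    return mean + scores @ phi.T


# Hypothetical synthetic data (not the paper's corpus): a falling mean contour
# plus two random modes, observed at 30 irregular, noisy time points per curve.
rng = np.random.default_rng(0)
t_grid = np.linspace(0.0, 1.0, 101)
n_curves = 40
a = rng.normal(0.0, 20.0, n_curves)  # amplitudes of a slope-like mode (Hz)
b = rng.normal(0.0, 5.0, n_curves)   # amplitudes of a sinusoidal mode (Hz)
curves = np.empty((n_curves, t_grid.size))
for k in range(n_curves):
    t_obs = np.sort(rng.uniform(0.0, 1.0, 30))
    f0 = 200.0 - 40.0 * t_obs + a[k] * t_obs + b[k] * np.sin(2 * np.pi * t_obs)
    y_obs = f0 + rng.normal(0.0, 1.0, t_obs.size)
    curves[k] = local_linear_smooth(t_obs, y_obs, t_grid, bandwidth=0.08)

mean, phi, eigvals, scores, explained = fpca(curves, t_grid, n_components=3)
approx = reconstruct(mean, phi, scores)
print(np.round(explained, 3))
```

The sign of each eigenfunction is arbitrary, so the components should be compared up to a sign flip. In a setup like the one described above, a categorical prototype could be formed by passing model-estimated scores to `reconstruct` in place of the observed scores. How the scores are estimated in the paper is not specified here, so that link is an assumption.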
