Time normalization of voice signals using functional data analysis.

The harmonics-to-noise ratio (HNR) has been used to quantify the waveform irregularity of voice signals [Yumoto et al., J. Acoust. Soc. Am. 71, 1544-1550 (1982)]. This measure assumes that the signal consists of two components: a harmonic component, which is the common pattern that repeats from cycle to cycle, and an additive noise component, which produces the cycle-to-cycle irregularity. It has been shown [Y. Qi, J. Acoust. Soc. Am. 92, 2569-2576 (1992)] that a valid computation of the HNR requires a nonlinear time normalization of the cycle wavelets to remove phase differences between them. This paper applies functional data analysis to perform an optimal nonlinear normalization and compute the HNR of voice signals. Results obtained for the same signals using zero-padding, linear normalization, and dynamic programming algorithms are presented for comparison. Functional data analysis offers three advantages over these approaches: it preserves meaningful features of signal shape, produces differentiable results, and allows flexibility in selecting the optimization criteria for the wavelet alignment. An extension of the technique to the time normalization of simultaneously recorded voice signals (such as acoustic, electroglottographic (EGG), and airflow signals) is also shown. The general purpose of this article is to illustrate the potential of functional data analysis as an analytical tool for studying aspects of the voice production process.
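The two-component model described above can be made concrete with a small sketch. It is not the functional data analysis method of this paper: it uses linear time normalization, one of the comparison methods named in the abstract, and the HNR definition as it is commonly stated (energy of the cycle-averaged wavelet, scaled by the number of cycles, over the energy of the deviations from that average). The function names `linear_normalize` and `hnr_db` are hypothetical, and the input is assumed to be a list of already-segmented cycle wavelets.

```python
import numpy as np


def linear_normalize(cycle, length):
    """Resample one cycle wavelet to `length` samples by linear interpolation.

    Assumption: each cycle is mapped onto a common unit time axis, so only a
    uniform stretch or compression is applied (no nonlinear warping).
    """
    cycle = np.asarray(cycle, dtype=float)
    src = np.linspace(0.0, 1.0, num=len(cycle))
    dst = np.linspace(0.0, 1.0, num=length)
    return np.interp(dst, src, cycle)


def hnr_db(cycles, length=None):
    """Estimate the harmonics-to-noise ratio in dB from segmented cycles.

    Harmonic component: the average wavelet across cycles.
    Noise component: each cycle's deviation from that average.
    """
    if length is None:
        # Assumption: normalize every cycle to the longest cycle's length.
        length = max(len(c) for c in cycles)
    aligned = np.vstack([linear_normalize(c, length) for c in cycles])

    harmonic = aligned.mean(axis=0)
    harmonic_energy = aligned.shape[0] * np.sum(harmonic ** 2)
    noise_energy = np.sum((aligned - harmonic) ** 2)
    return 10.0 * np.log10(harmonic_energy / noise_energy)
```

Because linear normalization only stretches each cycle uniformly, any remaining phase differences between wavelets are counted as noise, which is the limitation that motivates a nonlinear normalization.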
