Improved formant frequency estimation from high-pitched vowels by downgrading the contribution of the glottal source with weighted linear prediction

Because the performance of conventional linear prediction (LP) in formant estimation deteriorates for high-pitched voices, several all-pole modeling methods that are robust to the fundamental frequency (F0) have been developed. This study compares five such previously known methods and proposes a new technique, Weighted Linear Prediction with Attenuated Main Excitation (WLP-AME). WLP-AME uses weighted linear prediction, in which the squared prediction error is multiplied by a weighting function that downgrades the contribution of the glottal source in the model optimization. Consequently, the resulting all-pole model is affected more by the vocal tract characteristics, which leads to more accurate formant estimates. Using synthetic vowels created with a physical modeling approach, the study shows that WLP-AME yields more accurate formant frequency estimates for high-pitched vowels than the previously known methods.
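The weighting idea can be illustrated with a minimal Python/NumPy sketch. It assumes that the glottal closure instants (GCIs) are already known and uses a toy impulse-train excitation instead of a physically modeled glottal source. The rectangular dip in `ame_weights` is a hypothetical simplification, because the abstract does not specify the exact shape of the AME weighting function. All function names and parameter values (`d`, `width`, the formant-picking thresholds) are illustrative assumptions.

```python
import numpy as np


def synth_vowel(formants, bandwidths, f0, fs, dur=0.05):
    """Toy vowel (NOT the study's physical model): an impulse train, one
    impulse per glottal period, filtered by an all-pole resonator cascade."""
    n = int(round(dur * fs))
    gci = np.arange(0, n, int(round(fs / f0)))   # assumed-known excitation instants
    e = np.zeros(n)
    e[gci] = 1.0
    a_true = np.array([1.0])
    for f, b in zip(formants, bandwidths):
        r = np.exp(-np.pi * b / fs)              # pole radius from bandwidth
        theta = 2 * np.pi * f / fs               # pole angle from frequency
        a_true = np.convolve(a_true, [1.0, -2 * r * np.cos(theta), r * r])
    x = np.zeros(n)
    for i in range(n):                           # x[i] = e[i] - sum_k A_k x[i-k]
        past = x[max(0, i - len(a_true) + 1):i][::-1]
        x[i] = e[i] - np.dot(a_true[1:1 + len(past)], past)
    return x, gci


def ame_weights(n, gci, d=0.01, width=8):
    """Hypothetical, simplified AME-style weighting: 1 everywhere except a
    rectangular dip to `d` over `width` samples starting at each GCI."""
    w = np.ones(n)
    for g in gci:
        w[g:g + width] = d
    return w


def weighted_lp(x, p, w):
    """Weighted LP (covariance form): minimize
    sum_{n=p}^{N-1} w[n] * (x[n] - sum_{k=1}^{p} a_k x[n-k])**2.
    Returns the inverse filter A(z) = [1, -a_1, ..., -a_p].
    Unlike autocorrelation LP, the result is not guaranteed to be stable."""
    N = len(x)
    X = np.column_stack([x[p - k:N - k] for k in range(1, p + 1)])  # x[n-k]
    y, wn = x[p:], w[p:]
    R = X.T @ (wn[:, None] * X)   # weighted covariance matrix
    c = X.T @ (wn * y)            # weighted cross-correlation vector
    a = np.linalg.solve(R, c)
    return np.concatenate(([1.0], -a))


def formants_from_lpc(A, fs, min_freq=90.0, max_bw=400.0):
    """Formant candidates from the roots of A(z); the thresholds are
    illustrative choices, not values taken from the study."""
    roots = np.roots(A)
    roots = roots[np.imag(roots) > 0]            # one root per conjugate pair
    freqs = np.angle(roots) * fs / (2 * np.pi)
    bws = -np.log(np.abs(roots)) * fs / np.pi
    keep = (freqs > min_freq) & (bws < max_bw)
    return np.sort(freqs[keep])


if __name__ == "__main__":
    fs, f0 = 8000, 400                           # high-pitched toy voice
    x, gci = synth_vowel([600, 1400], [80, 100], f0, fs)
    lp = weighted_lp(x, 4, np.ones(len(x)))      # all weights 1: conventional covariance LP
    wlp = weighted_lp(x, 4, ame_weights(len(x), gci))
    print("LP formants:     ", formants_from_lpc(lp, fs))
    print("WLP-AME formants:", formants_from_lpc(wlp, fs))
```

With `d=0.0` and an impulse-train excitation, the weighted error of the true filter is exactly zero, so the sketch recovers the synthesis formants. A real glottal source is not an impulse train, so this toy setup says nothing about accuracy on real speech or on the study's synthetic vowels.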