Quasi-closed phase forward-backward linear prediction analysis of speech for accurate formant detection and estimation.

Recently, a quasi-closed phase (QCP) analysis of speech signals was proposed for accurate glottal inverse filtering. However, QCP analysis, which belongs to the family of temporally weighted linear prediction (WLP) methods, uses the conventional forward type of sample prediction. This may not be the best choice, especially when computing WLP models with a hard-limiting weighting function, because sample-selective minimization of the prediction error in WLP reduces the effective number of samples available within a given window frame. To counter this problem, a modified quasi-closed phase forward-backward (QCP-FB) analysis is proposed, wherein each sample is predicted from its past as well as its future samples, thereby utilizing the available samples more effectively. Formant detection and estimation experiments on synthetic vowels generated using a physical modeling approach, as well as on natural speech utterances, show that the proposed QCP-FB method yields statistically significant improvements over the conventional linear prediction and QCP methods.
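The idea of combining forward and backward prediction under a temporal weighting can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `weighted_fb_lp`, the choice to apply the weight at the index of the predicted sample, and the periodic zero-weight pattern used in the usage lines are all assumptions made for illustration. The sketch stacks the forward equations (each sample predicted from its past) and the backward equations (each sample predicted from its future) into one weighted least-squares problem with shared coefficients.

```python
import numpy as np

def weighted_fb_lp(x, w, order):
    """Weighted forward-backward linear prediction (illustrative sketch).

    Minimizes sum_n w[n] * (e_f[n]**2 + e_b[n]**2), where
      e_f[n] = x[n] - sum_k a[k] * x[n-k]   (forward error)
      e_b[n] = x[n] - sum_k a[k] * x[n+k]   (backward error)
    Assumption: the weight is taken at the index of the predicted sample.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    n = len(x)
    lags = np.arange(1, order + 1)

    # Forward equations: predict x[t] from x[t-1], ..., x[t-order].
    fwd_t = np.arange(order, n)
    fwd_rows = x[fwd_t[:, None] - lags[None, :]]

    # Backward equations: predict x[t] from x[t+1], ..., x[t+order].
    bwd_t = np.arange(0, n - order)
    bwd_rows = x[bwd_t[:, None] + lags[None, :]]

    rows = np.vstack([fwd_rows, bwd_rows])
    targets = np.concatenate([x[fwd_t], x[bwd_t]])
    sqrt_w = np.sqrt(np.concatenate([w[fwd_t], w[bwd_t]]))

    # Weighted least squares: scale each equation by sqrt of its weight.
    a, *_ = np.linalg.lstsq(rows * sqrt_w[:, None], targets * sqrt_w, rcond=None)
    return a

# Usage on a noiseless sinusoid, which obeys an exact order-2 recursion.
fs = 8000.0          # hypothetical sampling rate in Hz
f0 = 500.0           # hypothetical resonance frequency in Hz
t = np.arange(200)
x = np.cos(2 * np.pi * f0 / fs * t + 0.3)

# Hypothetical hard-limiting weight: discard 8 samples out of every 40.
w = np.ones_like(x)
w[(t % 40) < 8] = 0.0

a = weighted_fb_lp(x, w, order=2)
# Poles of the all-pole model 1 / (1 - a[0] z^-1 - a[1] z^-2).
poles = np.roots(np.concatenate(([1.0], -a)))
est_freqs = np.abs(np.angle(poles)) * fs / (2 * np.pi)
```

With a hard-limiting weight, every zero-weighted sample removes one equation from a forward-only fit. Stacking the backward equations roughly doubles the number of equations that survive within the same frame, which is the motivation the abstract gives for the forward-backward formulation.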
