Sparse representation and epoch estimation of voiced speech

Most approaches to linear prediction of speech do not account for the quasi-periodic glottal flow. This paper instead incorporates a model of the glottal flow derivative (GFD) directly into the linear prediction problem. A linear model for the prediction error is obtained by constructing a dictionary of time-shifted GFD pulses, which are themselves obtained by applying glottal inverse filtering (GIF) to recorded speech. Minimizing the difference between the linear prediction residual and a sparse combination of the dictionary pulses yields a joint estimate of the linear predictor and a sparse representation of the prediction error that reveals the instants of vocal tract excitation (epochs). The method is applied to voiced segments extracted from the CMU Arctic dataset, which also includes electroglottograms. The results show that the proposed method is effective in estimating the parameters of interest and that GIF-based pulses model the GFD pulses occurring in real speech more accurately than pulses computed from mathematical models.
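
The joint estimation described above can be illustrated with a minimal sketch. The abstract does not state which solver the paper uses, so everything here is an assumption made for illustration: the sparse step is a plain orthogonal matching pursuit, the predictor step is ordinary least squares, the two are simply alternated, and the pulse shape, the function names (`shifted_pulse_dictionary`, `lagged_matrix`, `omp`, `joint_lp_sparse`) and the synthetic signal are all hypothetical.

```python
import numpy as np

def shifted_pulse_dictionary(pulse, n):
    """n x (n - L + 1) matrix; column k is `pulse` starting at sample k."""
    L = len(pulse)
    D = np.zeros((n, n - L + 1))
    for k in range(n - L + 1):
        D[k:k + L, k] = pulse
    return D

def lagged_matrix(s, p):
    """Row i holds s[i-1], ..., s[i-p], with zeros before the segment start."""
    n = len(s)
    S = np.zeros((n, p))
    for k in range(1, p + 1):
        S[k:, k - 1] = s[:n - k]
    return S

def omp(D, r, k):
    """Greedy k-sparse fit of r by columns of D (all columns share one norm)."""
    x = np.zeros(D.shape[1])
    used = np.zeros(D.shape[1], dtype=bool)
    support = []
    res = r.copy()
    for _ in range(k):
        corr = np.abs(D.T @ res)
        corr[used] = -1.0                      # never pick a shift twice
        j = int(np.argmax(corr))
        used[j] = True
        support.append(j)
        # Refit all selected pulses jointly, then update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], r, rcond=None)
        res = r - D[:, support] @ coef
        x[:] = 0.0
        x[support] = coef
    return x

def joint_lp_sparse(s, pulse, p, n_pulses, n_iter=10):
    """Alternate a least-squares predictor update with a sparse residual fit.

    Assumed objective: || s - S a - D x ||_2^2 with at most n_pulses
    nonzeros in x. This alternation is a heuristic, not the paper's solver.
    """
    S = lagged_matrix(s, p)
    D = shifted_pulse_dictionary(pulse, len(s))
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a, *_ = np.linalg.lstsq(S, s - D @ x, rcond=None)   # predictor update
        x = omp(D, s - S @ a, n_pulses)                     # sparse update
    return a, x

# Synthetic demo with a made-up 10-sample pulse (not a GIF-derived pulse).
pulse = np.array([0.1, 0.3, 0.6, 1.0, 0.4, -1.5, -0.8, -0.3, -0.1, -0.05])
n = 200
D = shifted_pulse_dictionary(pulse, n)
x_true = np.zeros(D.shape[1])
x_true[[20, 90, 150]] = [1.0, 0.7, 1.3]       # planted pulse onsets
e = D @ x_true                                 # noise-free excitation

# Sparse step alone: the planted pulses do not overlap, so the greedy
# search recovers their shifts and amplitudes.
x_hat = omp(D, e, 3)
print("onsets from excitation:", np.flatnonzero(x_hat))

# Joint step: pass the excitation through a stable all-pole filter first.
a_true = np.array([1.2, -0.6])
s = np.zeros(n)
for i in range(n):
    s[i] = e[i] + sum(a_true[k] * s[i - 1 - k]
                      for k in range(2) if i - 1 - k >= 0)
a_hat, x_joint = joint_lp_sparse(s, pulse, p=2, n_pulses=3)
print("predictor estimate:", a_hat)
print("onsets from speech-like signal:", np.flatnonzero(x_joint))
```

The nonzero indices of `x` mark where each dictionary pulse starts; converting an onset to an epoch would additionally require the offset of the pulse's main excitation within the pulse, which depends on the pulse shape actually used. A convex alternative to the greedy step, again as an assumption rather than a statement about the paper, would replace the sparsity constraint with an L1 penalty on `x`.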
