Epoch Extraction Based on Integrated Linear Prediction Residual Using Plosion Index

An epoch is defined as the instant of significant excitation within a pitch period of voiced speech. Epoch extraction continues to attract the interest of researchers because of its significance in speech analysis. Existing high-performance epoch extraction algorithms require either dynamic programming techniques or a priori knowledge of the average pitch period. An algorithm without such requirements is proposed, based on the integrated linear prediction residual (ILPR), which resembles the voice source signal. The half-wave rectified and negated ILPR (or the Hilbert transform of the ILPR) is used as the pre-processed signal. A new non-linear temporal measure named the plosion index (PI) has been proposed for detecting ‘transients’ in the speech signal. An extension of the PI, called the dynamic plosion index (DPI), is applied to the pre-processed signal to estimate the epochs. The proposed DPI algorithm is validated using six large databases that provide simultaneous EGG recordings. Creaky and singing voice samples are also analyzed. The algorithm has been tested for robustness in the presence of additive white and babble noise and on simulated telephone-quality speech. The performance of the DPI algorithm is found to be comparable to or better than that of five state-of-the-art techniques for the experiments considered.
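
The pre-processing step described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the ILPR is taken to be the output of inverse filtering the original (non-pre-emphasized) speech with LP coefficients estimated from a pre-emphasized copy; "half wave rectified and negated" is read as keeping the negative-going part of the ILPR and flipping its sign; and the function names, LP order (12), and pre-emphasis factor (0.97) are hypothetical defaults. The DPI stage itself is not sketched.

```python
import numpy as np

def lpc_autocorr(frame, order):
    """LP coefficients [1, a1, ..., ap] via the autocorrelation method."""
    w = frame * np.hanning(len(frame))
    # Autocorrelation at lags 0..order (zero lag sits at index len(w) - 1).
    r = np.correlate(w, w, mode="full")[len(w) - 1 : len(w) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += (1e-9 * r[0] + 1e-12) * np.eye(order)  # light regularisation
    a = np.linalg.solve(R, -r[1 : order + 1])
    return np.concatenate(([1.0], a))

def ilpr(frame, order=12, preemph=0.97):
    """Assumed ILPR: inverse-filter the raw frame with LP coefficients
    estimated from a pre-emphasized copy of the same frame."""
    pre = np.append(frame[0], frame[1:] - preemph * frame[:-1])
    a = lpc_autocorr(pre, order)
    # FIR inverse filter A(z), truncated to the frame length.
    return np.convolve(frame, a)[: len(frame)]

def preprocess(residual):
    """Assumed reading of 'half wave rectified and negated': negate,
    then zero out everything that is still negative."""
    return np.maximum(-residual, 0.0)
```

For a frame `x` of voiced speech held in a NumPy array, `preprocess(ilpr(x))` yields a non-negative signal in which, under the stated assumptions, negative excursions of the ILPR appear as positive peaks.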
