Detection of Glottal Closure Instants From Speech Signals: A Quantitative Review

The pseudo-periodicity of voiced speech can be exploited in several speech processing applications. This requires however that the precise locations of the glottal closure instants (GCIs) are available. The focus of this paper is the evaluation of automatic methods for the detection of GCIs directly from the speech waveform. Five state-of-the-art GCI detection algorithms are compared using six different databases with contemporaneous electroglottographic recordings as ground truth, and containing many hours of speech by multiple speakers. The five techniques compared are the Hilbert Envelope-based detection (HE), the Zero Frequency Resonator-based method (ZFR), the Dynamic Programming Phase Slope Algorithm (DYPSA), the Speech Event Detection using the Residual Excitation And a Mean-based Signal (SEDREAMS) and the Yet Another GCI Algorithm (YAGA). The efficacy of these methods is first evaluated on clean speech, both in terms of reliabililty and accuracy. Their robustness to additive noise and to reverberation is also assessed. A further contribution of the paper is the evaluation of their performance on a concrete application of speech processing: the causal-anticausal decomposition of speech. It is shown that for clean speech, SEDREAMS and YAGA are the best performing techniques, both in terms of identification rate and accuracy. ZFR and SEDREAMS also show a superior robustness to additive noise and reverberation.

[1]  Noureddine Ellouze,et al.  Glottal opening instant detection from speech signal , 2004, 2004 12th European Signal Processing Conference.

[2]  Jont B. Allen,et al.  Image method for efficiently simulating small‐room acoustics , 1976 .

[3]  Michael S. Brandstein,et al.  Microphone Arrays - Signal Processing Techniques and Applications , 2001, Microphone Arrays.

[4]  P.A. Naylor,et al.  Spatiotemporal Averagingmethod for Enhancement of Reverberant Speech , 2007, 2007 15th International Conference on Digital Signal Processing.

[5]  J. Liljencrants,et al.  Dept. for Speech, Music and Hearing Quarterly Progress and Status Report a Four-parameter Model of Glottal Flow , 2022 .

[6]  Stéphane Mallat,et al.  Characterization of Signals from Multiscale Edges , 2011, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Thierry Dutoit,et al.  MIXED-PHASE SPEECH MODELING AND FORMANT ESTIMATION , USING DIFFERENTIAL PHASE SPECTRUMS , 2003 .

[8]  J. Makhoul,et al.  Linear prediction: A tutorial review , 1975, Proceedings of the IEEE.

[9]  Bayya Yegnanarayana,et al.  A robust method for determining instants of major excitations in voiced speech , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[10]  Thierry Dutoit,et al.  A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis , 2019, INTERSPEECH.

[11]  H. Strube Determination of the instant of glottal closure from the speech wave. , 1974, The Journal of the Acoustical Society of America.

[12]  H. Sabine Room Acoustics , 1953, The SAGE Encyclopedia of Human Communication Sciences and Disorders.

[13]  Christophe d'Alessandro,et al.  Zeros of Z-transform representation with application to source-filter separation in speech , 2005, IEEE Signal Processing Letters.

[14]  Douglas A. Reynolds,et al.  Modeling of the glottal flow derivative waveform with application to speaker identification , 1999, IEEE Trans. Speech Audio Process..

[15]  Douglas D. O'Shaughnessy,et al.  Automatic and reliable estimation of glottal closure instant and period , 1989, IEEE Trans. Acoust. Speech Signal Process..

[16]  M.G. Bellanger,et al.  Digital processing of speech signals , 1980, Proceedings of the IEEE.

[17]  Christophe d'Alessandro,et al.  The voice source as a causal/anticausal linear filter , 2003 .

[18]  Eric Moulines,et al.  Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones , 1989, Speech Commun..

[19]  Brian M. Sadler,et al.  Analysis of Multiscale Products for Step Detection and Estimation , 1999, IEEE Trans. Inf. Theory.

[20]  Mike Brookes,et al.  A Quantitative Assessment of Group Delay Methods for Identifying Glottal Closures in Voiced Speech , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[21]  Patrick A. Naylor,et al.  Data-driven voice soruce waveform modelling , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[22]  Christophe d'Alessandro,et al.  Robust glottal closure detection using the wavelet transform , 1999, EUROSPEECH.

[23]  Patrick A. Naylor,et al.  Estimation of Glottal Closing and Opening Instants in Voiced Speech Using the YAGA Algorithm , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[24]  Bayya Yegnanarayana,et al.  Enhancement of reverberant speech using LP residual signal , 2000, IEEE Trans. Speech Audio Process..

[25]  P. Ladefoged,et al.  Fundamental problems in phonetics , 1977 .

[26]  Bayya Yegnanarayana,et al.  Epoch Extraction From Speech Signals , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[27]  Mike Brookes,et al.  Estimation of Glottal Closure Instants in Voiced Speech Using the DYPSA Algorithm , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[28]  R. H. Bolt,et al.  Theory of Speech masking by reverberation , 1949 .

[29]  A. Gray,et al.  Least squares glottal inverse filtering from the acoustic speech waveform , 1979 .

[30]  S. R. Mahadeva Prasanna,et al.  Determination of Instants of Significant Excitation in Speech Using Hilbert Envelope and Group Delay Function , 2007, IEEE Signal Processing Letters.

[31]  B. Yegnanarayana,et al.  Epoch extraction from linear prediction residual for identification of closed glottis interval , 1979 .

[32]  Yves Kamp,et al.  A Frobenius norm approach to glottal closure detection from the speech signal , 1994, IEEE Trans. Speech Audio Process..

[33]  M. Pavel,et al.  Variability of Glottal Pulse Estimation Using Cepstral Method , 2006, Proceedings of the 7th Nordic Signal Processing Symposium - NORSIG 2006.

[34]  Yannis Stylianou,et al.  Applying the harmonic plus noise model in concatenative speech synthesis , 2001, IEEE Trans. Speech Audio Process..

[35]  Patrick A. Naylor,et al.  Multichannel DYPSA for estimation of glottal closure instants in reverberant speech , 2007, 2007 15th European Signal Processing Conference.

[36]  Thierry Dutoit,et al.  Glottal closure and opening instant detection from speech signals , 2019, INTERSPEECH.

[37]  Keiichi Funaki,et al.  Recursive ARMAX speech analysis based on a glottal source model with phase compensation , 1999, Signal Process..

[38]  Nick Campbell,et al.  Determining polarity of speech signals based on gradient of spurious glottal waveforms , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[39]  N. Ellouze,et al.  Multiscale Product of Electroglottogram Signal for Glottal Closure and Opening Instant Detection , 2006, The Proceedings of the Multiconference on "Computational Engineering in Systems Applications".

[40]  Thierry Dutoit,et al.  Complex cepstrum-based decomposition of speech for glottal source estimation , 2009, INTERSPEECH.