Classification-Based Detection of Glottal Closure Instants from Speech Signals

In this paper, a classification-based method for the automatic detection of glottal closure instants (GCIs) from the speech signal is proposed. Peaks in the speech waveform are taken as candidates for GCI placement. A classification framework is used to train a model that decides whether or not a given peak corresponds to a GCI. The method achieves a detection accuracy of 97.27% in terms of F1 score. In addition, despite using the speech signal only, the proposed method performs comparably to a method that utilizes the glottal signal. The method is also compared with three existing GCI detection algorithms on publicly available databases.
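The pipeline described above (pick candidate peaks, classify each one, score the result with F1) can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of local maxima as peaks, the amplitude threshold standing in for the trained classifier, the toy signal, and the exact-match scoring are all assumptions made for illustration, and the function names are hypothetical.

```python
def candidate_peaks(signal):
    """Return indices of local maxima, used here as GCI candidates (assumption)."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]


def classify_peaks(signal, candidates, threshold):
    """Stand-in for a trained classifier: keep peaks at or above an amplitude threshold."""
    return [i for i in candidates if signal[i] >= threshold]


def f1_score(predicted, reference):
    """F1 from exact index matches (assumption: no timing tolerance is applied)."""
    true_positives = len(set(predicted) & set(reference))
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy "voiced" signal: three identical periods with one dominant peak each.
period = [0.0, 0.2, 0.1, 0.0, 1.0, 0.3, 0.4, 0.1]
signal = period * 3
reference_gcis = [4, 12, 20]  # hypothetical ground-truth positions

candidates = candidate_peaks(signal)
strict = classify_peaks(signal, candidates, threshold=0.5)
loose = classify_peaks(signal, candidates, threshold=0.35)

f1_strict = f1_score(strict, reference_gcis)
f1_loose = f1_score(loose, reference_gcis)
print(candidates, strict, f1_strict, round(f1_loose, 4))
```

With the stricter threshold only the dominant peak of each period is kept, so every reference position is matched. The looser threshold also accepts a secondary peak in each period, which lowers precision and therefore F1. In a realistic setting the threshold rule would be replaced by a classifier trained on features of each peak.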
