Paper information - Classification-Based Detection of Glottal Closure Instants from Speech Signals

This paper proposes a classification-based method for the automatic detection of glottal closure instants (GCIs) from the speech signal. Peaks in the speech waveform are taken as candidate GCI locations, and a trained classification model decides whether each peak corresponds to a GCI. The detection accuracy, measured by F1 score, is 97.27%. In addition, despite using only the speech signal, the proposed method performs comparably to a method that uses the glottal signal. The method is also compared with three existing GCI detection algorithms on publicly available databases.
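
The peak-candidate idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic damped-oscillation signal, the single amplitude feature, the nearest-centroid classifier, and the 2-sample matching tolerance used for the F1 score are all assumptions made for illustration, and every function name below is hypothetical.

```python
import math

def synth_voiced(n_samples, pulses, decay=15.0, ring=20.0):
    """Toy voiced signal (assumption): a damped cosine starts at each pulse."""
    signal = [0.0] * n_samples
    for p in pulses:
        for n in range(p, n_samples):
            t = n - p
            signal[n] += math.exp(-t / decay) * math.cos(2 * math.pi * t / ring)
    return signal

def find_peak_candidates(signal):
    """Indices of local maxima; these are the candidate GCI placements."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]]

def train_centroids(features, labels):
    """Nearest-centroid stand-in for the trained classification model."""
    pos = [f for f, y in zip(features, labels) if y]
    neg = [f for f, y in zip(features, labels) if not y]
    return sum(pos) / len(pos), sum(neg) / len(neg)

def is_gci(feature, centroids):
    """Classify a candidate by the closer of the two class centroids."""
    pos_c, neg_c = centroids
    return abs(feature - pos_c) < abs(feature - neg_c)

def f1_score(detected, reference, tol=2):
    """F1 with each reference GCI matched at most once within +/- tol samples."""
    unmatched = list(reference)
    tp = 0
    for d in detected:
        hit = next((r for r in unmatched if abs(r - d) <= tol), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    fp = len(detected) - tp
    fn = len(unmatched)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Reference GCIs are the pulse onsets of the synthetic signal.
true_gcis = list(range(40, 1000, 80))
signal = synth_voiced(1000, true_gcis)
candidates = find_peak_candidates(signal)

# Train on candidates from the first half, evaluate on the second half.
split = 500
train = [i for i in candidates if i < split]
test = [i for i in candidates if i >= split]
centroids = train_centroids([signal[i] for i in train],
                            [i in true_gcis for i in train])

detected = [i for i in test if is_gci(signal[i], centroids)]
test_gcis = [g for g in true_gcis if g >= split]
f1 = f1_score(detected, test_gcis)
print(f"candidates: {len(test)}, detected GCIs: {len(detected)}, F1: {f1:.2f}")
```

On real speech a single amplitude feature would not be enough. A practical system would describe each peak with richer features and use a stronger classifier, but the overall flow of candidate extraction, classification, and F1 evaluation stays the same.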

Daniel Tihelka | Jindrich Matousek
