Estimation of Glottal Closure Instants in Voiced Speech Using the DYPSA Algorithm

We present the Dynamic Programming Projected Phase-Slope Algorithm (DYPSA) for automatic estimation of glottal closure instants (GCIs) in voiced speech. Accurate GCI estimation is useful in a wide range of speech processing tasks, including speech analysis, synthesis and coding. DYPSA is automatic and operates on the speech signal alone, without the need for an electroglottograph (EGG) signal. The algorithm employs the phase-slope function and a novel phase-slope projection technique to estimate GCI candidates from the speech signal. The most likely candidates are then selected using dynamic programming to minimize a cost function that we define. We review and evaluate three existing methods of GCI estimation and compare the new DYPSA algorithm to them. Results are presented for the APLAWD and SAM databases, for which 95.7% and 93.1% of GCIs, respectively, are correctly identified.
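The candidate-selection step can be illustrated with a minimal sketch. It assumes that GCI candidates (as sample indices) and a per-candidate cost are already available, and it uses a simplified stand-in cost: a per-candidate cost in which negative values mark plausible candidates, plus a pitch-deviation penalty between consecutive selections. The function name `select_gcis`, its parameters, and this cost are hypothetical and are not the cost function defined in the paper.

```python
def select_gcis(candidates, costs, expected_period, min_period, max_period):
    """Pick a subset of candidate instants by dynamic programming.

    candidates: sorted sample indices of GCI candidates.
    costs: per-candidate cost (assumed: negative = plausible, positive = doubtful).
    expected_period, min_period, max_period: pitch period bounds in samples
    (assumed known here; a real system would have to estimate them).
    """
    n = len(candidates)
    if n == 0:
        return []
    inf = float("inf")
    best = [inf] * n   # lowest total cost of any path ending at candidate j
    prev = [-1] * n    # back-pointer to the previous selected candidate

    for j in range(n):
        # A path may start at any candidate near the beginning of the segment.
        if candidates[j] - candidates[0] <= max_period:
            best[j] = costs[j]
        for i in range(j):
            gap = candidates[j] - candidates[i]
            if best[i] == inf or gap < min_period or gap > max_period:
                continue
            # Stand-in transition cost: relative deviation from the expected period.
            pitch_dev = abs(gap - expected_period) / expected_period
            total = best[i] + pitch_dev + costs[j]
            if total < best[j]:
                best[j] = total
                prev[j] = i

    # A path may end at any reachable candidate near the end of the segment.
    ends = [j for j in range(n)
            if candidates[-1] - candidates[j] <= max_period and best[j] < inf]
    if not ends:
        return []
    j = min(ends, key=lambda k: best[k])

    # Follow the back-pointers to recover the selected instants.
    path = []
    while j != -1:
        path.append(candidates[j])
        j = prev[j]
    return path[::-1]


# Toy example: true closures every 100 samples, spurious candidates at 50 and 260.
selected = select_gcis(
    candidates=[0, 50, 100, 200, 260, 300],
    costs=[-0.4, 0.3, -0.4, -0.4, 0.4, -0.4],
    expected_period=100, min_period=40, max_period=150,
)
print(selected)
```

Letting good candidates carry a negative cost is a design choice of this sketch: with strictly positive costs, the cheapest path would simply select as few candidates as the period bounds allow.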
