Classification of Voice Modality Using Electroglottogram Waveforms

It has been shown that improper function of the vocal folds can produce perceptually distorted speech, which is typically associated with various speech pathologies and even with some neurological diseases. Researchers have therefore sought quantitative voice characteristics that allow non-modal voice types to be assessed objectively and detected automatically. Most of this work classifies voice modality using features extracted from the speech signal. This paper proposes a different approach that instead analyzes the signal characteristics of the electroglottogram (EGG) waveform. The core idea is that modal and the various non-modal voice types produce EGG signals with distinct spectral/cepstral characteristics, so they can be distinguished from each other using standard cepstral-based features and a simple multivariate Gaussian mixture model. The practical usability of this approach was verified on the task of classifying among modal, breathy, rough, pressed and soft voice types. A speaker-dependent system achieved 83% frame-level accuracy and 91% utterance-level accuracy.
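The pipeline outlined in the abstract (cepstral features per frame, one Gaussian mixture model per voice type, decisions at frame and utterance level) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the real-cepstrum feature, the frame length and hop, the diagonal-covariance mixture with two components, the hand-written EM loop, and the synthetic `modal_like` and `breathy_like` signals, which are simple stand-ins rather than real EGG recordings.

```python
import numpy as np

def cepstral_frames(signal, frame_len=400, hop=160, n_ceps=12):
    """Hann-windowed frames -> low-order real-cepstrum coefficients (c0 dropped)."""
    window = np.hanning(frame_len)
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] * window for s in starts])
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)
    return np.fft.irfft(log_mag, axis=1)[:, 1:n_ceps + 1]

def log_gauss_diag(x, means, variances):
    """Log density of every row of x under every diagonal Gaussian, shape (n, k)."""
    diff = x[:, None, :] - means[None, :, :]
    return -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1))

def fit_gmm(x, k=2, n_iter=30, seed=0):
    """Fit a diagonal-covariance GMM with plain EM (illustrative, not tuned)."""
    rng = np.random.default_rng(seed)
    means = x[rng.choice(len(x), k, replace=False)]
    variances = np.tile(x.var(axis=0) + 1e-6, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities, computed in the log domain for stability
        log_p = log_gauss_diag(x, means, variances) + np.log(weights)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate weights, means and (floored) variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ x) / nk[:, None]
        variances = np.maximum((resp.T @ x ** 2) / nk[:, None] - means ** 2, 1e-6)
    return weights, means, variances

def frame_loglik(x, gmm):
    """Per-frame log-likelihood under one GMM."""
    weights, means, variances = gmm
    log_p = log_gauss_diag(x, means, variances) + np.log(weights)
    return np.logaddexp.reduce(log_p, axis=1)

def classify_frames(signal, models):
    """Frame-level decision: most likely class for each frame."""
    feats = cepstral_frames(signal)
    labels = list(models)
    scores = np.stack([frame_loglik(feats, models[l]) for l in labels], axis=1)
    return [labels[i] for i in scores.argmax(axis=1)]

def classify_utterance(signal, models):
    """Utterance-level decision: sum frame log-likelihoods, pick the best class."""
    feats = cepstral_frames(signal)
    return max(models, key=lambda l: frame_loglik(feats, models[l]).sum())

def synth(kind, seed, sr=16000, dur=1.0, f0=120.0):
    """Hypothetical stand-in signals; NOT real EGG data."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(sr * dur)) / sr
    if kind == "modal_like":      # harmonic-rich waveform
        sig = sum(np.sin(2 * np.pi * h * f0 * t) / h for h in range(1, 11))
    else:                         # "breathy_like": nearly sinusoidal waveform
        sig = np.sin(2 * np.pi * f0 * t)
    return sig + 0.05 * rng.standard_normal(len(t))

# Train one GMM per class, then classify unseen signals
kinds = ("modal_like", "breathy_like")
models = {k: fit_gmm(cepstral_frames(synth(k, seed=0))) for k in kinds}
for k in kinds:
    print(k, "->", classify_utterance(synth(k, seed=1), models))
```

Summing frame log-likelihoods pools evidence over the whole utterance, which is one plausible reason an utterance-level decision can be more accurate than individual frame-level decisions.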
