Acoustic Measure of Vocal Strain Based on Glottal Airflow Periodicity

In the clinical management of dysphonia, the effects of treatment are traditionally monitored through a series of auditory-perceptual assessments of the patient's vocal quality. Acoustic measurement of vocal quality promises to automate these assessments while keeping them accurate and non-invasive. However, acoustic measures of vocal quality need further development in both functional and technical terms. On the one hand, many of them are susceptible to non-dysphonic perturbations caused by articulatory movements in continuous speech; on the other, their reliance on a linear model limits how accurately they can approximate the generally nonlinear mapping from observation to vocal quality. This paper presents an acoustic measure of vocal strain, a specific vocal quality that typically co-occurs with the development of vocal-fold nodules in vocal hyperfunction. Vocal strain merits acoustic measurement more than other vocal qualities because its perceptual assessment typically exhibits lower intra- and inter-rater reliability. Based on an assumed correlation between vocal strain and the degree of periodicity in vocal-fold vibrations, the proposed measure uses a nonlinear regression model to predict strain from periodicity features extracted from a glottal airflow estimate. When tested on a set of listener-rated utterances composed mostly of continuous speech, the proposed glottal measure outperformed a direct-analysis measure in producing strain assessments consistent with perceptual ratings.
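As a rough illustration of what a periodicity feature can look like, the sketch below computes the peak of the normalized autocorrelation of a single frame. It is an assumption-laden example, not the feature set used in the paper: the abstract does not specify the features, and the function name `periodicity`, the lag bounds, and the synthetic test signals are all hypothetical. The frame is assumed to come from a glottal airflow estimate produced elsewhere (for example by inverse filtering), and a nonlinear regressor would then map such features to a strain rating.

```python
import math
import random


def periodicity(frame, min_lag, max_lag):
    """Peak normalized autocorrelation over lags in [min_lag, max_lag].

    Values near 1 indicate a strongly periodic frame; values near 0
    indicate an aperiodic one. Assumes max_lag < len(frame).
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]  # remove the DC offset
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        a = x[:n - lag]  # frame without its last `lag` samples
        b = x[lag:]      # same frame shifted by `lag` samples
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        if den > 0:
            best = max(best, num / den)
    return best


# Hypothetical stand-ins for glottal airflow frames (not real data):
fs, f0, n = 8000, 100, 400  # sample rate (Hz), pitch (Hz), frame length
periodic_frame = [math.sin(2 * math.pi * f0 * i / fs) for i in range(n)]
rng = random.Random(0)
noisy_frame = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Search lags covering pitches from 200 Hz (lag 40) down to 50 Hz (lag 160).
print(periodicity(periodic_frame, 40, 160))
print(periodicity(noisy_frame, 40, 160))
```

The periodic frame should score close to 1 and the noise frame much lower, which is the kind of contrast a regression model could exploit when predicting strain.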
