Automatic Evaluation of Voice Quality Using Text-Based Laryngograph Measurements and Prosodic Analysis

Because perceptual voice evaluation has low intra- and interrater reliability, it should be supported by objective, automatic methods. In this study, text-based, computer-aided prosodic analysis was combined with Laryngograph measurements of connected speech in order to model perceptual evaluation according to the German Roughness-Breathiness-Hoarseness (RBH) scheme. Fifty-eight connected speech samples (43 women and 15 men; 48.7 ± 17.8 years) containing the German version of the text “The North Wind and the Sun” were evaluated perceptually by 19 speech and voice therapy students according to the RBH scale. To model the listeners' ratings, Support Vector Regression was applied to two Laryngograph measurements, the vocal fold cycle irregularities (CFx) and the closed phases of vocal fold vibration (CQx), together with 33 features from a prosodic analysis module. The best human-machine results for roughness were obtained from a combination of six prosodic features and CFx (r = 0.71, ρ = 0.57). These correlations were approximately the same as the interrater agreement among human raters (r = 0.65, ρ = 0.61). For hoarseness and breathiness, the human-machine agreement was substantially lower; CQx was one of the substantial features of the hoarseness model. Nevertheless, the automatic analysis method can serve as the basis for a meaningful objective support for perceptual analysis.
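
The agreement figures above are reported as Pearson's r and Spearman's ρ. As a minimal sketch of how such a pair of values can be computed for human ratings and machine predictions, the following pure-Python example may help. The function names (`pearson`, `average_ranks`, `spearman`) and the toy numbers are illustrative assumptions; they are not taken from the study and do not reproduce its Support Vector Regression step.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient r of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical toy data: mean listener rating vs. model output per speaker.
    human = [0.0, 1.0, 1.5, 2.0, 3.0]
    machine = [0.2, 0.8, 1.9, 1.7, 2.6]
    print(f"r = {pearson(human, machine):.2f}, rho = {spearman(human, machine):.2f}")
```

Reporting both coefficients is useful because r measures linear agreement on the rating values, while ρ only considers whether the two rank orders of the speakers match.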
