Testing Acoustic Voice Quality Classification Across Languages and Speech Styles

Many studies relate acoustic voice quality measures to perceptual classification. We extend this line of research by training a classifier on a balanced set of perceptually annotated voice quality categories with high inter-rater agreement, and by testing it on speech samples from a different language and from a different speech style. Annotations were made on continuous speech recorded in different laboratory settings. In Experiment 1, we trained a random forest on Standard Chinese and German recordings labelled as modal, breathy, or glottalized. The model reached an accuracy of 78.7% on unseen data from the same sample; the most important variables were harmonics-to-noise ratio, cepstral peak prominence, and H1-A2. This model was then used to classify data from a different language (Icelandic, Experiment 2) and from a different speech style (German infant-directed speech (IDS), Experiment 3). Generalizability was high for Icelandic (78.6% accuracy), but lower for German IDS (71.7% accuracy). The accuracy on recordings of adult-directed speech from the same speakers as in Experiment 3 (77%, Experiment 4) suggests that it is the special speech style of IDS, rather than the recording setting, that led to the lower performance. Results are discussed in terms of efficiency of coding and generalizability across languages and speech styles.
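
The accuracies reported above are proportions of tokens whose predicted category matches the perceptual annotation. The following minimal sketch, written in Python, shows how such a score and a per-category breakdown could be computed for the three labels. The function names and the toy label lists are hypothetical and serve only as illustration; they are not the study's data or its analysis code.

```python
from collections import Counter

# The three voice quality categories named in the abstract.
LABELS = ("modal", "breathy", "glottalized")


def accuracy(gold, predicted):
    """Proportion of tokens whose predicted label matches the annotation."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sample")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def per_class_recall(gold, predicted):
    """For each annotated category, the share of its tokens classified correctly."""
    totals = Counter(gold)                 # tokens per annotated category
    pairs = Counter(zip(gold, predicted))  # (annotated, predicted) counts
    return {
        label: pairs[(label, label)] / totals[label]
        for label in LABELS
        if totals[label]  # skip categories absent from this sample
    }


# Toy labels invented for illustration (hypothetical, not from the study).
gold = ["modal", "modal", "breathy", "breathy", "glottalized", "glottalized"]
pred = ["modal", "breathy", "breathy", "breathy", "glottalized", "modal"]

toy_accuracy = accuracy(gold, pred)        # 4 of 6 tokens match
toy_recall = per_class_recall(gold, pred)
```

Scoring each test set (same-sample hold-out, other language, other speech style) with the same function keeps the figures directly comparable, and the per-category view can indicate whether a drop in overall accuracy is concentrated in one voice quality.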
