Personalized Modeling of Real-World Vocalizations from Nonverbal Individuals

Nonverbal vocalizations contain important affective and communicative information, especially for those who do not use traditional speech, including individuals who have autism and are non- or minimally verbal (nv/mv). Although these vocalizations are often understood by those who know them well, they can be challenging for the broader community to understand. This work presents (1) a methodology for collecting spontaneous vocalizations from nv/mv individuals in natural environments, with no researcher present, and personalized in-the-moment labels from a family member; (2) speaker-dependent classification of these real-world sounds for three nv/mv individuals; and (3) an interactive application to translate the nonverbal vocalizations in real time. Using support-vector machine and random forest models, we achieved speaker-dependent unweighted average recalls (UARs) of 0.75, 0.53, and 0.79 for the three individuals, respectively, with each model discriminating among five nonverbal vocalization classes. We also present first results for real-time binary classification of positive- and negative-affect nonverbal vocalizations, trained using a commercial wearable microphone and tested in real time using a smartphone. This work informs personalized machine learning methods for non-traditional communicators and advances real-world interactive augmentative technology for an underserved population.
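The evaluation setup described above (speaker-dependent SVM and random forest classifiers over five vocalization classes, scored by unweighted average recall) can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: the synthetic feature vectors stand in for acoustic features such as eGeMAPS, and scikit-learn's `balanced_accuracy_score` is used because balanced accuracy is equivalent to UAR for multiclass problems.

```python
# Hedged sketch of speaker-dependent classification with UAR scoring.
# Synthetic features stand in for per-vocalization acoustic descriptors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score  # balanced accuracy == UAR
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_classes, dim = 5, 24                     # 5 vocalization classes, 24-dim features
X = rng.normal(size=(500, dim))
y = rng.integers(0, n_classes, size=500)
X += y[:, None] * 0.8                      # make classes partially separable

# One model per speaker in the real setup; here, one synthetic "speaker".
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, model in [
    ("SVM", make_pipeline(StandardScaler(), SVC())),
    ("Random forest", RandomForestClassifier(random_state=0)),
]:
    model.fit(X_tr, y_tr)
    uar = balanced_accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: UAR = {uar:.2f}")
```

UAR (mean of per-class recalls) is preferred over plain accuracy here because real-world vocalization datasets are typically class-imbalanced, and accuracy would reward a model that ignores rare classes.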
