Personalized Modeling of Real-World Vocalizations from Nonverbal Individuals

Nonverbal vocalizations contain important affective and communicative information, especially for those who do not use traditional speech, including individuals who have autism and are non- or minimally verbal (nv/mv). Although these vocalizations are often understood by those who know them well, they can be challenging for the broader community to understand. This work presents (1) a methodology for collecting spontaneous vocalizations from nv/mv individuals in natural environments, with no researcher present, and personalized in-the-moment labels from a family member; (2) speaker-dependent classification of these real-world sounds for three nv/mv individuals; and (3) an interactive application to translate the nonverbal vocalizations in real time. Using support-vector machine and random forest models, we achieved speaker-dependent unweighted average recalls (UARs) of 0.75, 0.53, and 0.79 for the three individuals, respectively, with each model discriminating among five nonverbal vocalization classes. We also present first results for real-time binary classification of positive- and negative-affect nonverbal vocalizations, trained using a commercial wearable microphone and tested in real time using a smartphone. This work informs personalized machine learning methods for non-traditional communicators and advances real-world interactive augmentative technology for an underserved population.
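The evaluation setup described above (speaker-dependent SVM and random forest classifiers over five vocalization classes, scored by unweighted average recall) can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: the synthetic feature vectors stand in for acoustic features such as eGeMAPS, and scikit-learn's `balanced_accuracy_score` is used because balanced accuracy is equivalent to UAR for multiclass problems.

```python
# Hedged sketch of speaker-dependent classification with UAR scoring.
# Synthetic features stand in for per-vocalization acoustic descriptors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score  # balanced accuracy == UAR
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_classes, dim = 5, 24                     # 5 vocalization classes, 24-dim features
X = rng.normal(size=(500, dim))
y = rng.integers(0, n_classes, size=500)
X += y[:, None] * 0.8                      # make classes partially separable

# One model per speaker in the real setup; here, one synthetic "speaker".
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, model in [
    ("SVM", make_pipeline(StandardScaler(), SVC())),
    ("Random forest", RandomForestClassifier(random_state=0)),
]:
    model.fit(X_tr, y_tr)
    uar = balanced_accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: UAR = {uar:.2f}")
```

UAR (mean of per-class recalls) is preferred over plain accuracy here because real-world vocalization datasets are typically class-imbalanced, and accuracy would reward a model that ignores rare classes.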
