Towards Large-Scale Data Annotation of Audio from Wearables: Validating Zooniverse Annotations of Infant Vocalization Types

Recent developments allow the collection of audio data from lightweight wearable devices, potentially enabling us to study language use in samples drawn from everyday life. However, extracting useful information from these data is currently impossible with automated routines and prohibitively expensive with trained human annotators. We explore a strategy fit for the 21st century, relying on the collaboration of citizen scientists. A large dataset of infant speech was uploaded to a citizen science platform. The same data were annotated in the laboratory by highly trained annotators. We investigate whether crowd-sourced annotations are qualitatively and quantitatively comparable to those produced by expert annotators in a dataset of children at high and low risk for language disorders. Our results reveal that classification of individual vocalizations on Zooniverse was overall moderately accurate compared to the laboratory gold standard. Analyses of descriptors defined at the level of individual children found strong correlations between descriptors derived from Zooniverse annotations and those derived from laboratory annotations.
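A minimal sketch of how such a validation might be quantified, assuming a table with one crowd-sourced (Zooniverse) label and one laboratory label per vocalization clip. The file name, column names, and the "canonical" descriptor used for the child-level correlation are illustrative assumptions, not the authors' pipeline.

```python
# Illustrative sketch (hypothetical column and file names): comparing
# crowd-sourced vocalization labels against laboratory gold-standard labels.
import pandas as pd
from scipy.stats import pearsonr
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical table: one row per vocalization clip, with the majority label
# from Zooniverse volunteers and the label from a trained annotator.
clips = pd.read_csv("clip_labels.csv")  # columns: child_id, zooniverse_label, lab_label

# Clip-level agreement between the two annotation sources.
acc = accuracy_score(clips["lab_label"], clips["zooniverse_label"])
kappa = cohen_kappa_score(clips["lab_label"], clips["zooniverse_label"])
print(f"clip-level accuracy = {acc:.2f}, Cohen's kappa = {kappa:.2f}")

# Child-level descriptor (assumed here: proportion of canonical vocalizations),
# computed separately from each annotation source and then correlated.
def canonical_ratio(labels: pd.Series) -> float:
    return (labels == "canonical").mean()

per_child = clips.groupby("child_id").agg(
    zoo_ratio=("zooniverse_label", canonical_ratio),
    lab_ratio=("lab_label", canonical_ratio),
)
r, p = pearsonr(per_child["zoo_ratio"], per_child["lab_ratio"])
print(f"child-level correlation r = {r:.2f} (p = {p:.3g})")
```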
