Workshop on Corpus Collection, (Semi)-Automated Analysis, and Modeling of Large-Scale Naturalistic Language Acquisition Data

The main goal of this full-day workshop is to bring together researchers from several distinct fields: behavioral psychologists studying language acquisition, speech technology researchers, linguists, and computational modelers of cognitive development. These groups are broadly interested in the same questions: what is the nature of speech and language, and how might a system learn to process it in supervised or unsupervised ways? Because the groups approach these questions at different levels of analysis, cross-pollination has been sparse. Recent technological innovations have made collecting long naturalistic recordings of children’s home environments far simpler than in the past. However, the raw output of such recordings is not immediately usable for most analyses. At the same time, speech technology (ST) and machine learning tools have improved immensely over the past decade, making it feasible to apply them to increasingly diverse and noisy data. Relatedly, cognitively viable computational models have made recent strides in explaining learning and development, but few such models can be applied to novel datasets without running into hurdles in translating across frameworks. This workshop brings together experts from all of these areas and seeks to build bridges among them, drawing on insights from similar interdisciplinary efforts elsewhere in cognitive science. Talks will discuss the match between the theory-driven questions researchers would like to ask and the answers that the current state of the art allows. The program committee is part of a newly formed group called DARCLE (Daylong Audio Recordings of Children’s Language Environment); with the help of an NSF grant, DARCLE has created HomeBank, a repository of raw data, metadata, and analysis and processing tools for longform recordings of child language.
The workshop also offers an opportunity to network with related efforts in Europe and to present a talk and demo of a related effort, the NSF-funded Speech Recognition Virtual Kitchen.