Building an Audio-Visual Corpus of Australian English: Large Corpus Collection with an Economical Portable and Replicable Black Box

The Big Australian Speech Corpus project incorporates the strategic goals of 30 Chief Investigators from various speech science areas. Speech from 1000 geographically and socially diverse speakers is being recorded using a uniform and automated protocol plus standardized hardware and software to produce a widely applicable and extensible database – AusTalk. Here we describe the project’s major components and organization; share the lessons learnt from difficulties and challenges; and present the results achieved so far. Index Terms: speech corpus, AV data, Australian English.