AusTalk — The Australian speech database: Design framework, recording experience and localisation

The “AusTalk” project set out to create a comprehensive Australian speech database and was designed by 30 speech scientists, each contributing their disciplinary expertise. Three standardised one-hour audio-visual sessions were recorded for each of 1000 speakers around Australia, with diverse components suited to different research areas. The design of this database provides a framework that other speech corpus collections can reuse. In this paper, we present the AusTalk design and recording protocol, as well as the problems faced and lessons learned. We also discuss localisation of the protocol and its potential customisation to other countries' specifications. We encourage the collection of such speech databases, including accent groups, to boost speech research in areas such as linguistics, speech and speaker recognition, forensic voice comparison, auditory-visual speech processing and many more.
