AlloVera: A Multilingual Allophone Database

We introduce a new resource, AlloVera, which provides mappings from 218 allophones to phonemes for 14 languages. Phonemes are contrastive phonological units, and allophones are their various concrete realizations, which are predictable from phonological context. While phonemic representations are language-specific, phonetic representations, stated in terms of (allo)phones, are much closer to a universal (language-independent) transcription. AlloVera allows the training of speech recognition models that output phonetic transcriptions in the International Phonetic Alphabet (IPA), regardless of the input language. We show that a “universal” allophone model, Allosaurus, built with AlloVera, outperforms both “universal” phonemic models and language-specific models on a speech-transcription task. We explore the implications of this technology (and related technologies) for the documentation of endangered and minority languages. We further explore other applications for which AlloVera will be suitable as it grows, including phonological typology.
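
To make the allophone-to-phoneme relationship concrete, the following minimal Python sketch shows one way such per-language mappings could be represented and queried. It is an illustration only: the table `ALLOPHONE_TO_PHONEMES`, the helper names, and the language codes are assumptions introduced here, and the entries are standard textbook examples (English aspirated stops and the flap, Spanish [ð] for /d/), not AlloVera's actual contents or file format.

```python
from collections import defaultdict

# Hypothetical, textbook-style mappings; NOT AlloVera's real data or schema.
# Each language maps a surface phone (allophone) to the phoneme(s) it can realize.
ALLOPHONE_TO_PHONEMES = {
    "eng": {"t": ["t"], "tʰ": ["t"], "ɾ": ["t", "d"], "p": ["p"], "pʰ": ["p"]},
    "spa": {"d": ["d"], "ð": ["d"], "ɾ": ["ɾ"]},
}


def phonemes_for(lang, phone):
    """Return the phonemes that `phone` may realize in `lang` (empty if unknown)."""
    return ALLOPHONE_TO_PHONEMES.get(lang, {}).get(phone, [])


def allophones_by_phoneme(lang):
    """Invert the table: phoneme -> list of its allophones in `lang`."""
    inverted = defaultdict(list)
    for phone, phonemes in ALLOPHONE_TO_PHONEMES.get(lang, {}).items():
        for phoneme in phonemes:
            inverted[phoneme].append(phone)
    return dict(inverted)


def universal_phone_inventory():
    """Union of phones across all languages: the language-independent symbol set."""
    return sorted({p for table in ALLOPHONE_TO_PHONEMES.values() for p in table})


if __name__ == "__main__":
    # The same phone [ɾ] belongs to different phonemes in different languages.
    print(phonemes_for("eng", "ɾ"))
    print(phonemes_for("spa", "ɾ"))
    print(allophones_by_phoneme("eng"))
    print(universal_phone_inventory())
```

In this toy table the phone [ɾ] realizes /t/ or /d/ in English but is its own phoneme in Spanish, which is the sense in which phonemic representations are language-specific while the phone inventory can be shared across languages.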
