The SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion

We describe the design and findings of the SIGMORPHON 2020 shared task on multilingual grapheme-to-phoneme conversion. Participants were asked to submit systems that take a sequence of graphemes in a given language as input and output a sequence of phonemes representing the pronunciation of that grapheme sequence. Nine teams submitted a total of 23 systems. The best achieved an 18% relative reduction in word error rate (macro-averaged over languages) over strong neural sequence-to-sequence baselines. To facilitate error analysis, we publicly release the complete outputs of all systems, a first for the SIGMORPHON workshop.
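
The headline figure combines three quantities: a per-language word error rate, a macro-average over languages, and a relative reduction against the baseline. The Python sketch below shows how these combine. It rests on three assumptions. First, WER is taken to be the percentage of test words whose predicted phoneme sequence does not exactly match the reference. Second, the macro-average is taken to be an unweighted mean over languages. Third, the relative reduction is measured against the baseline's WER. The function names and toy data are hypothetical, and this is not the shared task's official evaluation script.

```python
from statistics import mean

# A pronunciation is a sequence of phoneme symbols, e.g. ["k", "æ", "t"].
Pron = list[str]


def word_error_rate(gold: list[Pron], pred: list[Pron]) -> float:
    """Percentage of words whose predicted phoneme sequence differs from the reference.

    Assumption: any mismatch (even one phoneme) counts the whole word as an error.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    errors = sum(1 for g, p in zip(gold, pred) if g != p)
    return 100.0 * errors / len(gold)


def macro_wer(by_language: dict[str, tuple[list[Pron], list[Pron]]]) -> float:
    """Unweighted mean of per-language WERs, so each language counts equally."""
    return mean(word_error_rate(gold, pred) for gold, pred in by_language.values())


def relative_reduction(baseline_wer: float, system_wer: float) -> float:
    """Error reduction relative to the baseline, in percent (not percentage points)."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical toy data for two made-up languages: (gold, predicted) lists.
    toy = {
        "lang_a": ([["k", "a", "t"], ["d", "o", "g"]],
                   [["k", "a", "t"], ["d", "ɔ", "g"]]),  # 1 of 2 words wrong
        "lang_b": ([["a"], ["b"]], [["a"], ["b"]]),      # 0 of 2 words wrong
    }
    print(macro_wer(toy))                  # 25.0
    print(relative_reduction(25.0, 20.5))  # 18.0
```

The numbers in the demo are illustrative only. A hypothetical drop in macro WER from 25.0 to 20.5 is 4.5 points in absolute terms, but 18% in relative terms. The same absolute drop would give a different relative figure against a different baseline, so "18% relative" should not be read as 18 percentage points.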
