The SIGMORPHON 2020 Shared Task on Multilingual Grapheme-to-Phoneme Conversion

We describe the design and findings of the SIGMORPHON 2020 shared task on multilingual grapheme-to-phoneme conversion. Participants were asked to submit systems that take a sequence of graphemes in a given language as input and output a sequence of phonemes representing the pronunciation of that grapheme sequence. Nine teams submitted a total of 23 systems, at best achieving an 18% relative reduction in word error rate (macro-averaged over languages) versus strong neural sequence-to-sequence baselines. To facilitate error analysis, we publicly release the complete outputs for all systems, a first for the SIGMORPHON workshop.
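The headline metric can be illustrated with a minimal sketch. It assumes word error rate is the percentage of words whose predicted phoneme sequence differs from the gold sequence, that the macro-average is an unweighted mean over languages, and that relative reduction is measured against the baseline's macro-averaged score. The function names, language labels, and numbers below are invented for illustration and are not the task's evaluation script or its results.

```python
def word_error_rate(gold, predicted):
    """Percentage of words whose predicted pronunciation differs from gold."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must contain the same number of words")
    errors = sum(g != p for g, p in zip(gold, predicted))
    return 100.0 * errors / len(gold)


def macro_average(per_language_wer):
    """Unweighted mean of per-language WER values."""
    return sum(per_language_wer.values()) / len(per_language_wer)


def relative_reduction(baseline_wer, system_wer):
    """Relative WER reduction of a system over a baseline, in percent."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


# Toy example: two words, one of which is mispronounced.
gold = [["k", "æ", "t"], ["d", "ɔ", "g"]]
predicted = [["k", "æ", "t"], ["d", "o", "g"]]
toy_wer = word_error_rate(gold, predicted)

# Hypothetical per-language WER scores (made-up values, not task results).
baseline = {"lang_a": 20.0, "lang_b": 10.0}
system = {"lang_a": 15.0, "lang_b": 9.0}

baseline_macro = macro_average(baseline)
system_macro = macro_average(system)
reduction = relative_reduction(baseline_macro, system_macro)
```

Because the macro-average weights every language equally, a gain on a small-data language counts as much as the same gain on a large-data one.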
