An RNN Model of Text Normalization

We present a recurrent neural net (RNN) model of text normalization, defined as the mapping of written text to its spoken form, together with a description of the open-source dataset that we used in our experiments. We show that while the RNN model achieves very high overall accuracy, there remain errors that would be unacceptable in a speech application such as text-to-speech (TTS). We then show that a simple filter based on finite-state transducers (FSTs) can help mitigate those errors. Even with that mitigation, challenges remain, and we end the paper by outlining some possible solutions. In releasing our data, we invite others to help solve this problem.
