A hybrid translation/correction approach to SMS normalization

This paper presents a method for normalizing SMS messages that shares similarities with both spell-checking and machine-translation approaches. The normalization part of the system is based entirely on models trained on a corpus. Evaluated on French by ten-fold cross-validation, the system achieves a 9.3% Word Error Rate and a BLEU score of 0.83.

Keywords: SMS, normalization, finite-state machines, hybrid approach, translation-oriented, correction-oriented, corpus-based learning.
