Edit transducers for spelling variation in Old Spanish

A system for the analysis of Old Spanish word forms using weighted finite-state transducers is presented. The system builds on existing resources: a modern lexicon, a phonological transcriber, and a set of rules implementing the evolution of Spanish since the Middle Ages. Results on all datasets show significant improvements over both the baseline and Levenshtein edit distance, in accuracy as well as in the trade-off between precision and recall. A qualitative error analysis suggests several potential ways to improve the performance of the system.
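The contrast with plain Levenshtein distance can be illustrated with a minimal sketch. It is not the system described above, which uses weighted finite-state transducers. It is a dynamic-programming edit distance in which selected substitutions are cheaper than others. The function names, the toy lexicon, and the cost values (0.25 for f→h and z→c) are assumptions chosen for illustration only.

```python
from typing import Dict, Iterable, Optional, Tuple

SubCosts = Dict[Tuple[str, str], float]


def weighted_edit_distance(
    source: str,
    target: str,
    sub_costs: Optional[SubCosts] = None,
    default_cost: float = 1.0,
) -> float:
    """Edit distance where listed substitutions may cost less than the default.

    With sub_costs empty, this reduces to plain Levenshtein distance.
    """
    sub_costs = sub_costs or {}
    m, n = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + default_cost  # deletions only
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + default_cost  # insertions only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = source[i - 1], target[j - 1]
            sub = 0.0 if a == b else sub_costs.get((a, b), default_cost)
            dist[i][j] = min(
                dist[i - 1][j] + default_cost,  # delete a
                dist[i][j - 1] + default_cost,  # insert b
                dist[i - 1][j - 1] + sub,       # match or substitute
            )
    return dist[m][n]


def best_match(
    old_form: str, lexicon: Iterable[str], sub_costs: Optional[SubCosts] = None
) -> str:
    """Return the lexicon entry with the lowest distance to old_form."""
    return min(lexicon, key=lambda w: weighted_edit_distance(old_form, w, sub_costs))


# Hypothetical costs: make two historically motivated substitutions cheap.
HISTORICAL_COSTS: SubCosts = {("f", "h"): 0.25, ("z", "c"): 0.25}
# Toy modern lexicon (assumption, not the lexicon used by the system).
LEXICON = ["pacer", "nacer", "hacer"]

if __name__ == "__main__":
    for word in LEXICON:
        plain = weighted_edit_distance("fazer", word)
        weighted = weighted_edit_distance("fazer", word, HISTORICAL_COSTS)
        print(word, plain, weighted)
    print(best_match("fazer", LEXICON, HISTORICAL_COSTS))
```

Under uniform costs, the Old Spanish form "fazer" is equally distant from all three candidates, so the choice among them is arbitrary. With the illustrative weights, "hacer" becomes the single closest entry. A weighted transducer can encode this kind of preference, with weights that may also depend on context.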
