SisHiTra: A Hybrid Machine Translation System from Spanish to Catalan

In the current European context, where communities writing and speaking a great variety of languages coexist, machine translation has become a technology of major importance. In parts of Spain and in other countries, the co-official status of several languages means that public information must be produced in several versions. Machine translation among the languages of the Iberian Peninsula, and from them into English, would allow Iberian linguistic communities to integrate better with one another and within Europe. This paper presents a machine translation system from Spanish to Catalan that operates on text input. In our approach, deductive (linguistic) and inductive (corpus-based) methodologies are combined in a homogeneous and efficient framework: finite-state transducers. Preliminary results suggest that the proposed architecture is promising.
