A SomAgent statistical machine translation

Abstract: This paper describes how the word alignment task performed within SOMAgent collaborates with a statistical machine translation system to learn a phrase translation table. We study the improvements in translation quality obtained with syntax-augmented machine translation, and we experiment with different degrees of linguistic analysis, from the lexical level to the syntactic and semantic levels, in order to generate more precise alignments. We developed a contextual environment based on the Self-Organizing Map that models a semantic agent (SOMAgent). The agent learns the correct meaning of a word used in context, which allows it to deal with phenomena such as ambiguity and to generate more precise alignments. These alignments can improve the first choice of the statistical machine translation system by supplying it with linguistic knowledge.
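To make the Self-Organizing Map idea concrete, the sketch below trains a very small map on hand-made context vectors for one ambiguous word. It is only an illustration under stated assumptions, not the SOMAgent implementation: the function names (`train_som`, `best_matching_unit`), the 3x3 grid, the learning-rate and radius schedules, and the toy feature set are all hypothetical choices made for this example.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the grid coordinate of the map unit closest to `vector`."""
    return min(
        weights,
        key=lambda unit: sum((w - x) ** 2 for w, x in zip(weights[unit], vector)),
    )


def train_som(vectors, rows=3, cols=3, epochs=50, lr0=0.5, radius0=1.5, seed=0):
    """Train a small rectangular Self-Organizing Map (hypothetical settings)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # One weight vector per map unit, initialised uniformly in [0, 1).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)   # neighbourhood shrinks over time
        for vector in vectors:
            bmu = best_matching_unit(weights, vector)
            for unit, w in weights.items():
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                # Gaussian neighbourhood: units near the winner move more.
                h = math.exp(-grid_d2 / (2.0 * radius * radius))
                for i in range(dim):
                    w[i] += lr * h * (vector[i] - w[i])
    return weights


# Toy context vectors for the ambiguous word "bank" (assumed features:
# presence of "money", "loan", "river", "water" in the surrounding words).
contexts = {
    "bank/finance-1": [1, 1, 0, 0],
    "bank/finance-2": [1, 0, 0, 0],
    "bank/river-1": [0, 0, 1, 1],
    "bank/river-2": [0, 0, 1, 0],
}

som = train_som(list(contexts.values()))
placement = {label: best_matching_unit(som, vec) for label, vec in contexts.items()}
for label, unit in placement.items():
    print(label, "->", unit)
```

Contexts that land on the same or neighbouring units are treated as similar by the map, so the position of a new context can be read as a rough sense label. How such a label would then be passed to the alignment step is not specified here and is left open in this sketch.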

Javier Bajo, et al. A SomAgent statistical machine translation, 2011, Appl. Soft Comput.