Idiomatic MWEs and Machine Translation. A Retrieval and Representation Model: the AraMWE Project

This paper presents a preliminary implementation of AraMWE, a hybrid project that combines a statistical component with a symbolic Combinatory Categorial Grammar (CCG) component to extract and treat multiword expressions (MWEs) and idioms in Arabic and English parallel texts. It gives a general sketch of the system, a thorough description of the statistical component, and a proof of concept of the CCG component.
