Detecting Cross-lingual Semantic Similarity Using Parallel PropBanks

This paper proposes a method for detecting cross-lingual semantic similarity using parallel PropBanks. We first improve the verb predicate word alignments generated by GIZA++ using information available in parallel PropBanks: we apply the Kuhn-Munkres method to measure predicate-argument matching, which improves the F-score of verb predicate alignments by 12.6%. Using the enhanced word alignments, we then check the set of target verbs aligned to a given source verb for semantic consistency. For a set of English verbs aligned to a Chinese verb, we check whether the English verbs belong to the same semantic class using an existing lexical database, WordNet. For a set of Chinese verbs aligned to an English verb, we manually check the semantic similarity among the Chinese verbs in the set. Our results show that the verb sets we generate correlate highly with semantic classes, which could potentially lead to an automatic technique for generating semantic classes for verbs.
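The abstract does not spell out how predicate-argument matching is scored, so the following is only a minimal sketch of the general idea: given a similarity matrix between the arguments of a source predicate and the arguments of a candidate target predicate, the Kuhn-Munkres (Hungarian) method finds the one-to-one argument pairing with the highest total similarity. The function names `hungarian_min` and `match_arguments`, the zero-padding of unequal argument counts, and the similarity values are all assumptions made for illustration, not details taken from the paper.

```python
def hungarian_min(cost):
    """Kuhn-Munkres on a square cost matrix; returns {row: col} of minimum total cost."""
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-indexed)
    v = [0.0] * (n + 1)      # column potentials (1-indexed)
    p = [0] * (n + 1)        # p[j] = row currently assigned to column j (0 = none)
    way = [0] * (n + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip assignments along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break
    return {p[j] - 1: j - 1 for j in range(1, n + 1)}


def match_arguments(sim):
    """Best one-to-one pairing of source arguments (rows) with target arguments (columns).

    `sim` is a hypothetical similarity matrix. It is zero-padded to a square
    matrix (an assumption of this sketch) so that surplus arguments on either
    side simply stay unmatched.
    """
    rows, cols = len(sim), len(sim[0])
    size = max(rows, cols)
    padded = [[sim[i][j] if i < rows and j < cols else 0.0 for j in range(size)]
              for i in range(size)]
    # Maximising similarity is the same as minimising negated similarity.
    assignment = hungarian_min([[-x for x in row] for row in padded])
    pairs = sorted((i, j) for i, j in assignment.items() if i < rows and j < cols)
    score = sum(sim[i][j] for i, j in pairs)
    return pairs, score


# Made-up similarities: three source arguments against two target arguments.
toy_sim = [[0.9, 0.1],
           [0.2, 0.8],
           [0.3, 0.4]]
toy_pairs, toy_score = match_arguments(toy_sim)
```

Under these assumptions, the total score of the best pairing could serve as one signal for preferring or rejecting a candidate verb alignment. A greedy pairing would not be equivalent: for a matrix such as `[[0.9, 0.8], [0.8, 0.1]]`, picking the largest entry first yields a worse total than the optimal assignment.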
