A Hybrid Approach to Extracting and Classifying Verb+Noun Constructions

We present the main findings and preliminary results of an ongoing project aimed at developing a system for collocation extraction based on contextual morpho-syntactic properties. We explored two hybrid extraction methods: the first applies language-independent statistical techniques followed by linguistic filtering, while the second, available only for German, uses a set of lexico-syntactic patterns to extract collocation candidates. To define the extraction and filtering patterns, we studied a specific collocation category, Verb-Noun (VN) constructions, using a model inspired by systemic functional grammar that proposes a three-level analysis based on lexical, functional and semantic criteria. From a tagged and lemmatized corpus, we identify contextual morpho-syntactic properties that help to filter the output of the statistical methods and to extract potentially interesting VN constructions (complex predicates vs complex predicator). The extracted candidates are validated and classified manually.
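The first method (statistical scoring followed by linguistic filtering) can be illustrated with a minimal sketch. It is not the project's implementation: the association measure (pointwise mutual information), the co-occurrence window, the frequency threshold, the tag names `VERB` and `NOUN`, and the function names `score_pairs` and `filter_verb_noun` are all assumptions chosen for illustration. The input is assumed to be a tagged and lemmatized corpus given as sentences of `(lemma, pos)` tuples.

```python
import math
from collections import Counter


def score_pairs(tagged_sentences, window=3, min_count=2):
    """Statistical step (language-independent): score ordered co-occurring
    pairs of (lemma, pos) tokens with pointwise mutual information.
    The PMI measure, window size and threshold are illustrative assumptions."""
    pair_counts, left, right = Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        for i, first in enumerate(sent):
            # Pair each token with the next `window` tokens in the sentence.
            for second in sent[i + 1 : i + 1 + window]:
                pair_counts[(first, second)] += 1
                left[first] += 1
                right[second] += 1
    total = sum(pair_counts.values())
    scores = {}
    for (first, second), count in pair_counts.items():
        if count >= min_count:
            scores[(first, second)] = math.log2(
                count * total / (left[first] * right[second])
            )
    return scores


def filter_verb_noun(scores, verb_tag="VERB", noun_tag="NOUN"):
    """Linguistic step: keep only verb+noun pairs, in either surface order.
    The tag names are assumptions; a real tagset would differ."""
    candidates = []
    for ((lemma1, pos1), (lemma2, pos2)), score in scores.items():
        if pos1 == verb_tag and pos2 == noun_tag:
            candidates.append((lemma1, lemma2, score))
        elif pos1 == noun_tag and pos2 == verb_tag:
            candidates.append((lemma2, lemma1, score))
    # Highest-scoring candidates first, ready for manual validation.
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Toy corpus (invented for illustration only).
corpus = [
    [("he", "PRON"), ("take", "VERB"), ("a", "DET"), ("decision", "NOUN")],
    [("she", "PRON"), ("take", "VERB"), ("the", "DET"), ("decision", "NOUN")],
    [("they", "PRON"), ("read", "VERB"), ("a", "DET"), ("book", "NOUN")],
]
candidates = filter_verb_noun(score_pairs(corpus))
for verb, noun, score in candidates:
    print(verb, noun, round(score, 3))
```

Keeping the scoring step free of part-of-speech information mirrors the division of labour described above: the statistics stay language-independent, and all language-specific knowledge sits in the filter.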
