Towards a corpus-based dictionary of German noun-verb collocations

We describe our attempts to automatically extract raw material for a dictionary of German noun-verb collocations from large corpora of newspaper text. Such a dictionary should describe the linguistic properties of collocations rather than merely list lexical cooccurrences. Since most statistical collocation-finding tools provide only lexical cooccurrence information, we first apply symbolic extraction tools, based on a regular grammar over part-of-speech-tagged and lemmatized text, and then apply statistical filters. We first list the types of information that a collocational dictionary for Natural Language Processing should contain, then sketch our extraction methods, and finally discuss and illustrate our initial results.
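
The two-stage idea (symbolic extraction first, statistical filtering second) can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the toy sentences, the STTS-like tags, the single pattern (an optional article, optional adjectives, a noun, and an immediately following full verb, as in German verb-final clauses), and the choice of a frequency threshold plus pointwise mutual information as the filter. The actual grammar and filters used by the authors are not specified in this abstract.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: each sentence is a list of (lemma, POS tag) pairs.
# The tags are STTS-like (ART, ADJA, NN, VVFIN, ...); this is an assumption.
SENTENCES = [
    [("dass", "KOUS"), ("die", "ART"), ("Regierung", "NN"),
     ("ein", "ART"), ("Entscheidung", "NN"), ("treffen", "VVFIN")],
    [("weil", "KOUS"), ("er", "PPER"), ("ein", "ART"),
     ("wichtig", "ADJA"), ("Entscheidung", "NN"), ("treffen", "VVFIN")],
    [("dass", "KOUS"), ("sie", "PPER"), ("ein", "ART"),
     ("Frage", "NN"), ("stellen", "VVFIN")],
    [("weil", "KOUS"), ("er", "PPER"), ("das", "ART"),
     ("Buch", "NN"), ("lesen", "VVFIN")],
]


def tag_code(tag):
    """Map a POS tag to one character so that string index == token index."""
    if tag == "NN":
        return "N"
    if tag.startswith("VV"):
        return "V"
    if tag == "ADJA":
        return "A"
    if tag == "ART":
        return "D"
    return "O"


# Illustrative regular pattern over tag codes: (article)? (adjective)* noun verb.
NOUN_VERB = re.compile(r"D?A*(N)(V)")


def extract_pairs(sentence):
    """Return (noun lemma, verb lemma) pairs matched by the pattern."""
    codes = "".join(tag_code(tag) for _, tag in sentence)
    return [
        (sentence[m.start(1)][0], sentence[m.start(2)][0])
        for m in NOUN_VERB.finditer(codes)
    ]


def filter_candidates(pairs, min_freq=2):
    """Keep pairs seen at least min_freq times and score them with PMI."""
    pair_freq = Counter(pairs)
    noun_freq = Counter(noun for noun, _ in pairs)
    verb_freq = Counter(verb for _, verb in pairs)
    total = len(pairs)
    return {
        (noun, verb): math.log2(freq * total / (noun_freq[noun] * verb_freq[verb]))
        for (noun, verb), freq in pair_freq.items()
        if freq >= min_freq
    }


all_pairs = [pair for sentence in SENTENCES for pair in extract_pairs(sentence)]
candidates = filter_candidates(all_pairs, min_freq=2)
```

Because the pattern runs over tags and lemmas rather than raw word forms, each candidate can be traced back to its syntactic context (for example, whether a determiner or adjective was present), which is the kind of linguistic information a purely lexical cooccurrence count would not retain.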
