Extraction of V-N-Collocations from Text Corpora: A Feasibility Study for German

This paper evaluates the usefulness of the statistical approach suggested by Church and Hanks (1989) for extracting verb-noun (V-N) collocations from German text corpora. We give some motivations for extracting V-N collocations from corpora and point out a couple of properties of German that affect the applicability of extraction methods developed for English. We present precision and recall results for V-N collocations with support verbs and discuss the consequences for further work on the extraction of collocations from German corpora. Depending on the goal, emphasis can be put on high recall for lexicographic purposes or on high precision for automatic lexical acquisition; in each case, the other measure decreases. Low recall can still be acceptable if very large corpora (i.e. 50-100 million words) are available, or if corpora for special domains are used in addition to the data found in machine-readable (collocation) dictionaries.
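As a rough illustration of the kind of measure involved, the sketch below computes the association ratio of Church and Hanks (1989), log2(P(v,n) / (P(v) P(n))), over a list of verb-noun pairs and then scores a thresholded extraction against a gold list. It is a minimal sketch under stated assumptions, not the procedure used in the study: the pairs are assumed to be already extracted and lemmatised, and the toy data, the gold list, the threshold of 0 and the function names are all hypothetical.

```python
import math
from collections import Counter


def association_ratio(pairs):
    """Map each observed (verb, noun) pair to log2(P(v,n) / (P(v) * P(n))).

    Probabilities are relative frequencies over the given pair list
    (an assumption; window-based counts would be handled differently).
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(v for v, _ in pairs)
    noun_counts = Counter(n for _, n in pairs)
    total = len(pairs)
    return {
        (v, n): math.log2((c * total) / (verb_counts[v] * noun_counts[n]))
        for (v, n), c in pair_counts.items()
    }


def precision_recall(extracted, gold):
    """Precision and recall of an extracted pair set against a gold set."""
    extracted, gold = set(extracted), set(gold)
    hits = len(extracted & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: (verb lemma, noun lemma) co-occurrences.
PAIRS = (
    [("treffen", "Entscheidung")] * 3
    + [("treffen", "Freund")]
    + [("sehen", "Freund")] * 3
    + [("sehen", "Entscheidung")]
)
# Hypothetical gold list, e.g. taken from a collocation dictionary.
GOLD = {("treffen", "Entscheidung"), ("stellen", "Frage")}

scores = association_ratio(PAIRS)
# Keep pairs that co-occur more often than chance (illustrative threshold).
extracted = {pair for pair, score in scores.items() if score > 0}
precision, recall = precision_recall(extracted, GOLD)
```

Raising the threshold keeps fewer candidates, which typically trades recall for precision; lowering it does the opposite.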

[1]  Erhard Agricola,et al.  Wörter und Wendungen : Wörterbuch zum deutschen Sprachgebrauch , 1992 .

[2]  Gregory Grefenstette,et al.  Use of syntactic context to produce term association lists for text retrieval , 1992, SIGIR '92.

[3]  Ted Dunning,et al.  Accurate Methods for the Statistics of Surprise and Coincidence , 1993, CL.

[4]  Thierry Fontenelle,et al.  Survey of collocation extraction tools , 1994 .

[5]  Frank Smadja,et al.  Retrieving Collocations from Text: Xtract , 1993, CL.

[6]  Angelika Storrer,et al.  Multiword Lexemes: A Monolingual and Contrastive Typology for NLP and MT , 1992, IWBS Report.

[7]  Frank A. Smadja,et al.  Microcoding the Lexicon with Co-occurrence Knowledge , 1989 .

[8]  Kathleen McKeown,et al.  Automatically Extracting and Representing Collocations for Language Generation , 1990, ACL.

[9]  Laurence Danlos.  Support verb constructions: linguistic properties, representation, translation , 1992 .

[10]  Kenneth Ward Church,et al.  What's Wrong with Adding One? , 1994 .

[11]  Günther Drosdowski,et al.  Duden, Stilwörterbuch der deutschen Sprache : die Verwendung der Wörter im Satz , 1970 .

[12]  Yaacov Choueka,et al.  Looking for Needles in a Haystack or Locating Interesting Collocational Expressions in Large Textual Databases , 1988, RIAO Conference.

[13]  Kenneth Ward Church,et al.  Word Association Norms, Mutual Information, and Lexicography , 1989, ACL.

[14]  Frank Smadja.  Retrieving collocations from text , 1993 .

[15]  Uri Zernik,et al.  Lexical acquisition: Exploiting on-line resources to build a lexicon. , 1991 .

[16]  Nicoletta Calzolari,et al.  Acquisition of Lexical Information from a Large Textual Italian Corpus , 1990, COLING.

[17]  Evelyn Marcussen Hatch,et al.  Research Design and Statistics for Applied Linguistics , 1982 .

[18]  Frank A. Smadja,et al.  From N-Grams to Collocations: An Evaluation of Xtract , 1991, ACL.

[19]  Frank Smadja,et al.  From N-Grams to Collocations: An Evaluation of Xtract , 1991, ACL.