Paper information - Automatic Extraction of Verb Phrases from Annotated Corpora: A Linguistic Evaluation for Estonian

In order to analyze and synthesize real sentences of a language, one has to be aware of its common expressions, which range from complicated idioms to simple frequent phrases. Verb phrases are a special case of such common expressions, i.e., phrasal verbs like "to pay off" and idiomatic expressions like "to laugh one to pieces". In this paper, we present the SENTA system, which proposes an innovative architecture that avoids the definition of global association measure thresholds and defines a new association measure that does not overestimate the degree of cohesion of word sequences containing frequent fragments. Finally, we present a case study that demonstrates a successful way of combining linguistic and statistical processing to extract Estonian phrasal verbs from a text corpus.

Gaël Dias | Kadri Muischnek | Heiki-Jaan Kaalep
