Automatic Extraction of Subcorpora based on Subcategorization Frames from a Part-of-Speech Tagged Corpus

This paper presents a method for extracting subcorpora that document the different subcategorization frames of verbs, nouns, and adjectives in the 100-million-word British National Corpus. The extraction tool consists of a set of batch files for the Corpus Query Processor (CQP), which is part of the IMS Corpus Workbench (cf. Christ 1994a, b). A macroprocessor has been developed that lets the user specify, in a simple input file, which subcorpora are to be created for a given lemma. The resulting subcorpora can be used (1) to provide evidence for the subcategorization properties of a given lemma and to facilitate the selection of corpus lines for lexicographic research, and (2) to determine the frequencies of the different syntactic contexts of each lemma.
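The abstract describes the macroprocessor only at a high level. As a minimal sketch, assuming a hypothetical input format of one `lemma: FRAME FRAME ...` line per lemma, the Python below expands such a file into CQP named queries (one subcorpus per lemma and frame), each followed by a `size` command that reports its number of matches. Per-frame counts can then be turned into relative frequencies. The frame labels, token patterns, attribute names `lemma` and `pos`, corpus ID `BNC`, and all function names are assumptions made for illustration. They are not the paper's actual templates, and the sketch covers only lexical verbs.

```python
# Hypothetical frame templates: each maps a frame label to the CQP token
# pattern that must follow the target verb. Tags are CLAWS5-style (as in the
# BNC); the patterns are illustrative, not the paper's own definitions.
FRAME_TEMPLATES = {
    "NP": '[pos="AT0|DT0"]? [pos="AJ0"]* [pos="NN[012]"]',  # (det) (adj)* noun
    "That": '[word="that"]',                                 # that-clause
    "ToInf": '[pos="TO0"] [pos="V.I"]',                      # to-infinitive
}


def parse_spec(text):
    """Parse a hypothetical input file of lines like 'give: NP That'."""
    spec = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank and comment lines
            continue
        lemma, frames = line.split(":", 1)
        spec[lemma.strip()] = frames.split()
    return spec


def expand(spec, corpus="BNC"):
    """Expand a spec into CQP commands: a named query and a size command per frame."""
    commands = [f"{corpus};"]  # activate the corpus (corpus ID assumed)
    for lemma, frames in spec.items():
        for frame in frames:
            if frame not in FRAME_TEMPLATES:
                raise ValueError(f"unknown frame label: {frame}")
            # CQP named-query names start with an uppercase letter.
            name = f"{lemma.capitalize()}{frame}"
            verb = f'[lemma="{lemma}" & pos="VV."]'  # lexical verb forms only
            commands.append(f"{name} = {verb} {FRAME_TEMPLATES[frame]};")
            commands.append(f"size {name};")
    return commands


def relative_frequencies(counts):
    """Turn per-frame match counts into proportions of the lemma's total."""
    total = sum(counts.values())
    if total == 0:
        return {frame: 0.0 for frame in counts}
    return {frame: n / total for frame, n in counts.items()}
```

For example, `expand(parse_spec("give: NP That"))` yields a corpus activation line followed by the queries `GiveNP` and `GiveThat`, each paired with a `size` command. Keeping the frame templates in one table means that adding a new frame touches only the data and leaves the expansion logic unchanged, which is the kind of separation a macroprocessor over batch files provides.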

Susanne Gahl