Annotating 200 Million Words: The Bank Of English Project
The Bank of English is an international English language project sponsored by Harper-Collins Publishers, Glasgow, and conducted by the COBUILD team at the University of Birmingham, UK. The text bank will comprise some 200 million words of both written and spoken English. The whole 200-million-word corpus is being annotated morphologically and syntactically during 1993-94 at the Research Unit for Computational Linguistics (RUCL), University of Helsinki, using the English morphological analyser (ENGTWOL) and the English Constraint Grammar (ENGCG) parser. The first half of the texts (103 million words) was processed in 1993. The project is led by Prof. John Sinclair in Birmingham and Prof. Fred Karlsson in Helsinki. The present author is responsible for conducting the annotation.