A Self-Expanding Corpus Based on Newspapers on the Web

A Unix-based system is presented that automatically collects newspaper articles from the web, converts the texts, and adds them to a newspaper corpus. The corpus can be searched from a web browser. It currently contains 70 million words and grows by 4 million words each month.
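The abstract names three stages (collect, convert, include) followed by search. The following is a minimal Python sketch of that pipeline. It assumes articles arrive as HTML pages. The names `fetch`, `html_to_text` and `Corpus` are hypothetical, and the in-memory list is only a stand-in for the real corpus and its web search interface, which the abstract does not describe. It is not the author's implementation, whose language and tools are not stated here.

```python
import re
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip = 0      # depth inside <script>/<style> elements
        self.parts = []     # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def fetch(url):
    """Collect (hypothetical): download one article page as text."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def html_to_text(html):
    """Convert (hypothetical): strip markup and normalise whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def _tokens(text):
    # Lower-cased word tokens; \w also matches non-ASCII letters such as æ, ø, å.
    return re.findall(r"\w+", text.lower())


class Corpus:
    """Include (hypothetical): an in-memory stand-in for the newspaper corpus."""

    def __init__(self):
        self.articles = []  # list of (source, plain_text) pairs

    def add(self, source, html):
        self.articles.append((source, html_to_text(html)))

    def word_count(self):
        return sum(len(_tokens(text)) for _, text in self.articles)

    def search(self, word):
        """Return the sources of articles containing `word` (case-insensitive)."""
        w = word.lower()
        return [src for src, text in self.articles if w in _tokens(text)]
```

For example, `c = Corpus(); c.add(url, fetch(url))` would collect and convert one article, after which `c.search("Bergen")` lists the matching sources. A production system would keep the corpus on disk and use an index rather than scanning every article.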

Knut Hofland