Paper Information - Categorizing Web Pages as a Preprocessing Step for Information Extraction

Information systems that combine crawling with information extraction (IE) technologies currently attract considerable research and industrial interest. In this paper, we present an algorithm that exploits techniques for unsupervised IE pattern acquisition to help identify web pages containing information relevant to the IE task.

Richard Evans | Viktor Pekar | Ruslan Mitkov

[1] Thorsten Joachims, et al. A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization, 1997, ICML.

[2] Micha Hofri, et al. Probabilistic Analysis of Algorithms, 1987, Texts and Monographs in Computer Science.

[3] Ellen Riloff, et al. Automatically Generating Extraction Patterns from Untagged Text, 1996, AAAI/IAAI, Vol. 2.

[4] Ralph Grishman, et al. Automatic Acquisition of Domain Knowledge for Information Extraction, 2000, COLING.

[5] Andrew McCallum, et al. A comparison of event models for naive Bayes text classification, 1998, AAAI 1998.

[6] Timo Järvinen, et al. A non-projective dependency parser, 1997, ANLP.

[7] Nicholas Kushmerick, et al. Wrapper Induction for Information Extraction, 1997, IJCAI.