Paper information - KnowItNow: Fast, Scalable Information Extraction from the Web

Numerous NLP applications rely on search-engine queries, both to extract information from and to compute statistics over the Web corpus. But search engines often limit the number of available queries. As a result, query-intensive NLP applications such as Information Extraction (IE) distribute their query load over several days, making IE a slow, offline process. This paper introduces a novel architecture for IE that obviates queries to commercial search engines. The architecture is embodied in a system called KnowItNow that performs high-precision IE in minutes instead of days. We compare KnowItNow experimentally with the previously published KnowItAll system, and quantify the tradeoff between recall and speed. KnowItNow's extraction rate is two to three orders of magnitude higher than KnowItAll's.
