Automatic Extraction of Information from the Web

The Semantic Web will bring meaning to the Internet, making it possible for web agents to understand the information it contains. However, current trends suggest that the Semantic Web is unlikely to be adopted in the coming years. Consequently, extracting meaningful information from the web remains an obstacle for web agents. In this article, we present a framework for the automatic extraction of semantically meaningful information from the current web. Separating the extraction process from an agent's business logic enhances modularity, adaptability, and maintainability. Our approach is novel in that it combines different technologies to extract information, navigate the web, and automatically adapt to changes in web pages.

Rafael Corchuelo | José Luis Arjona
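The abstract's central architectural claim, keeping information extraction separate from an agent's business logic, can be illustrated with a minimal sketch. This is not the authors' framework: the `Book` concept, the `Extractor` interface, the regex-based `RegexBookWrapper`, the `WrapperBrokenError` exception and the `CheapestBookAgent` are all hypothetical names assumed for illustration, and "adapting to changes in web pages" is reduced here to a crude check that signals when a wrapper stops matching its page.

```python
import re
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Book:
    """Hypothetical domain concept: the semantic record the agent reasons about."""
    title: str
    price: float


class WrapperBrokenError(Exception):
    """Raised when a wrapper no longer matches the page layout it was built for."""


class Extractor(Protocol):
    """Hypothetical interface: agents receive semantic records, never raw HTML."""
    def extract(self, html: str) -> List[Book]: ...


class RegexBookWrapper:
    """Hypothetical wrapper tied to one site's current HTML layout."""

    PATTERN = re.compile(
        r'<li class="book"><b>(?P<title>[^<]+)</b>\s*<i>\$(?P<price>[0-9.]+)</i></li>'
    )

    def extract(self, html: str) -> List[Book]:
        books = [Book(m["title"], float(m["price"])) for m in self.PATTERN.finditer(html)]
        # Crude verification: a non-empty page with no matches suggests the
        # layout changed, so signal that the wrapper must be rebuilt.
        if html.strip() and not books:
            raise WrapperBrokenError("wrapper no longer matches this page layout")
        return books


class CheapestBookAgent:
    """Business logic: depends only on the Extractor interface, not on HTML."""

    def __init__(self, extractor: Extractor) -> None:
        self.extractor = extractor

    def cheapest(self, html: str) -> Book:
        return min(self.extractor.extract(html), key=lambda b: b.price)


if __name__ == "__main__":
    page = ('<ul><li class="book"><b>Dune</b> <i>$9.99</i></li>'
            '<li class="book"><b>Emma</b> <i>$4.50</i></li></ul>')
    agent = CheapestBookAgent(RegexBookWrapper())
    print(agent.cheapest(page).title)  # prints: Emma
```

In this sketch, replacing the wrapper (for example, with one rebuilt after a site redesign) requires no change to `CheapestBookAgent`, which is the kind of modularity and maintainability benefit the abstract describes.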
