Paper information - Towards creating a knowledge base for World-Wide Web documents

Towards creating a knowledge base for World-Wide Web documents

The lack of organization of information on the web makes information retrieval inefficient. Several approaches to improving it have been suggested. We propose using a document knowledge base that contains semantic and structural information about the retrievable documents, extracted from the documents themselves. We show that such a knowledge base offers a number of advantages, including advanced query functionality. We also discuss how to create such a knowledge base and, in particular, show how structural information can be extracted automatically from HTML documents and added to it.

N. Shahmehri | P. Lambrix | J. Aberg

[1] Deborah L. McGuinness, et al. CLASSIC: a structural data model for objects, 1989, SIGMOD '89.

[2] Patrick Lambrix, et al. A default extension to description logics for use in an intelligent search engine, 1998, Proceedings of the Thirty-First Hawaii International Conference on System Sciences.

[3] Rainer Hoch, et al. Using IR techniques for text classification in document analysis, 1994, SIGIR '94.

[4] Serge Abiteboul, et al. From structured documents to novel query facilities, 1994, SIGMOD '94.

[5] Patrick Lambrix, et al. Learning Composite Concepts in Description Logics: A First Step, 1996, ISMIS.

[6] Diego Calvanese, et al. Representing SGML Documents in Description Logics, 1996, Description Logics.

[7] Umberto Straccia, et al. A model of information retrieval based on a terminological logic, 1993, SIGIR.

[8] Ian A. Macleod, et al. Storage and retrieval of structured documents, 1990, Inf. Process. Manag.

[9] Charles F. Goldfarb, et al. SGML handbook, 1990.

[10] Lin Padgham, et al. A Description Logic Model for Querying Knowledge Bases for Structured Documents, 1997, ISMIS.