Embedding Knowledge in Web Documents: CGs versus XML-based Metadata Languages

The paper argues for the use of general and intuitive knowledge representation languages for indexing the content of Web documents and representing knowledge within them. We believe these languages have advantages over metadata languages based on the Extensible Mark-up Language (XML). Indeed, the representation and retrieval of precise information is better supported by languages designed to represent semantic content and support logical inference, and the readability of such a language eases its exploitation, presentation and direct insertion within a document. To further ease the representation process, we propose techniques allowing users to leave some knowledge terms undeclared. We illustrate these ideas with WebKB, a precision-oriented information retrieval/annotation tool, and show how lexical, structural and knowledge-based techniques may be combined to retrieve or generate knowledge or Web documents. Finally, to overcome the scalability problems of storing knowledge within Web documents, we propose some ideas for scalable and cooperatively built knowledge repositories.