Systematics and Architecture for a Resource Representing Knowledge about Named Entities

Named entities are ubiquitous in documents on the web and in other document repositories. The information that a human reader associates with the named entities occurring in a document often suffices to derive a simplified picture, or fingerprint, of the document's contents. More generally, background knowledge about named entities makes documents easier to understand correctly. To use this kind of information in automated document processing, resources are needed that make explicit the information implicitly carried by named entities and formalize it in an appropriate way. We describe the systematics and architecture of an experimental resource that contains a thematic-geographic-temporal hierarchy for classifying named entities, positions named entities of various kinds with respect to this hierarchy, lists their synonyms, and gives formal descriptions of the entities and their relations. The resource is intended to offer a general basis for semantic annotation, indexing, retrieval, querying, browsing and hyperlinking of (semi-)textual web documents, structured documents and flat texts.
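
To make the components listed above more concrete, the following is a minimal sketch of how such a resource might be modelled. It is not the authors' implementation: the class names (Position, Entity, Resource), the field names, the representation of each hierarchy axis as a path from general to specific, and the sample entry are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Position:
    """Hypothetical position in a thematic-geographic-temporal hierarchy.

    Each axis is a path from general to specific, e.g. ("science", "physics").
    """
    theme: tuple[str, ...]
    place: tuple[str, ...]
    period: tuple[str, ...]

    def within(self, other: "Position") -> bool:
        """True if this position lies at or below `other` on all three axes."""
        axes = (
            (self.theme, other.theme),
            (self.place, other.place),
            (self.period, other.period),
        )
        # A path lies below another if the other path is a prefix of it.
        return all(mine[: len(theirs)] == theirs for mine, theirs in axes)


@dataclass
class Entity:
    """Hypothetical record for one named entity."""
    name: str                      # canonical name
    kind: str                      # e.g. "person", "organization"
    position: Position             # placement in the hierarchy
    synonyms: set[str] = field(default_factory=set)
    # Relations as (relation_name, target_name) pairs; an assumed encoding.
    relations: list[tuple[str, str]] = field(default_factory=list)


class Resource:
    """Hypothetical in-memory store with lookup by name or synonym."""

    def __init__(self) -> None:
        self._entities: list[Entity] = []
        self._by_label: dict[str, list[Entity]] = {}

    def add(self, entity: Entity) -> None:
        self._entities.append(entity)
        # Index the canonical name and every synonym, case-insensitively.
        for label in {entity.name, *entity.synonyms}:
            self._by_label.setdefault(label.lower(), []).append(entity)

    def lookup(self, surface_form: str) -> list[Entity]:
        """Return all entities whose name or synonym matches the surface form."""
        return list(self._by_label.get(surface_form.lower(), []))

    def entities_within(self, region: Position) -> list[Entity]:
        """Return all entities positioned at or below `region`."""
        return [e for e in self._entities if e.position.within(region)]


# Illustrative entry only; not taken from the resource described above.
einstein = Entity(
    name="Albert Einstein",
    kind="person",
    position=Position(
        theme=("science", "physics"),
        place=("Europe", "Germany"),
        period=("20th century",),
    ),
    synonyms={"Einstein", "A. Einstein"},
    relations=[("born_in", "Ulm")],
)

resource = Resource()
resource.add(einstein)
```

Under these assumptions, resolving the surface forms found in a document with lookup and collecting the positions of the matched entities would yield a rough fingerprint of the document in the sense described above, while entities_within supports browsing by theme, place and period.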