An Approach for Populating and Enriching Ontology-Based Repositories

Publicly available text-based documents (e.g. news, meeting transcripts) are an important source of knowledge, especially for organizations. These documents mention domain entities such as persons, places, professional positions, decisions and actions. Querying these documents, rather than browsing and searching them, is valuable to anyone, and particularly to professionals engaged in knowledge-intensive tasks. Common technology, however, does not support querying the data in text-based documents. To enable such queries, the documents' content must be captured explicitly and formally as facts in a knowledge base. Automatic NLP processes are commonly used to capture such facts, but their relatively low precision and recall give rise to data quality problems. Furthermore, the facts found in the documents are often insufficient to answer complex queries, hence the need to enrich the captured facts with facts from third-party repositories (e.g. public Linked Open Data, LOD). This paper describes the process adopted to clean, populate and enrich a knowledge base repository that is then exploited to answer complex queries. The process is triggered by a preceding NLP parsing step and guided by the (rich) ontology that describes the repository.