Automatic construction of a domain-independent knowledge base from heterogeneous data sources

Manual construction and maintenance of general-purpose knowledge bases is a major obstacle to their full adoption, use, and reuse in practical settings. In this paper, we present KnowBase, a system for automatic knowledge base construction from heterogeneous data sources, including domain-specific ontologies, general-purpose ontologies, plain texts, and image and video captions, which are automatically extracted from web pages. Our approach integrates several information extraction techniques to automatically create, enrich, and update the knowledge base. As a result, the knowledge it represents can be used in several application domains. In our experiments, we used the resulting knowledge base as an external resource to align heterogeneous ontologies from the environmental and agricultural domains. The results demonstrate the effectiveness of the knowledge base in finding corresponding entities between these ontologies.
