GRAONTO: A graph-based approach for automatic construction of domain ontology

Extracting domain knowledge and taking full advantage of it is an important way of reducing costs and accelerating processes in domain-related applications. A domain ontology provides a common and unambiguous understanding of a domain through a set of representational primitives, so that users and systems can communicate with each other. It has been proposed as an important and natural approach to representing domain knowledge. Most domain knowledge, covering domain entities together with their properties and relationships, is embodied in document collections. Extracting ontologies from these documents is therefore an important means of ontology construction. This paper proposes GRAONTO, a graph-based approach for the automatic construction of a domain ontology from a domain corpus. First, each document in the collection is represented by a graph. Random walk term weighting is then applied to the document graphs to estimate how relevant each term's information is to the corpus, from both local and global perspectives. Next, the MCL (Markov Clustering) algorithm is used to disambiguate terms with different meanings and to group similar terms into concepts. Then, an improved gSpan algorithm, constrained by both vertices and informativeness, is used to discover arbitrary latent relations among these concepts. Finally, the domain ontology is output in OWL format. For ontology evaluation, a method is devised that adaptively adjusts concepts and relations with respect to the ontology's practical effectiveness. Evaluation experiments show that GRAONTO is a promising approach to domain ontology construction.
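
Two steps of the pipeline above, term weighting and concept formation, can be illustrated with a short sketch. The Python below is a minimal, hypothetical sketch and not the GRAONTO implementation. It assumes that a document graph is a term co-occurrence graph built with a sliding window. It also assumes that "random walk term weighting" behaves like a TextRank-style PageRank, computed on each document graph (local) and on the union of all document graphs (global). Finally, it applies a basic MCL (expansion by squaring, inflation, column normalization) to the corpus graph to group terms into candidate concepts. All function names and parameters (`window`, `damping`, `inflation`) are illustrative assumptions. The sense-disambiguation and gSpan relation-mining steps are not shown.

```python
from collections.abc import Iterable

# Hypothetical sketch: names, window size, damping and inflation are illustrative assumptions.
Graph = dict[str, set[str]]


def build_doc_graph(tokens: list[str], window: int = 2) -> Graph:
    """Link distinct terms that occur within `window` positions of each other."""
    graph: Graph = {}
    for i, term in enumerate(tokens):
        graph.setdefault(term, set())
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != term:
                graph[term].add(tokens[j])
                graph.setdefault(tokens[j], set()).add(term)
    return graph


def merge_graphs(graphs: Iterable[Graph]) -> Graph:
    """Union of document graphs, used here as the corpus-level (global) graph."""
    merged: Graph = {}
    for g in graphs:
        for term, neighbours in g.items():
            merged.setdefault(term, set()).update(neighbours)
    return merged


def random_walk_weights(graph: Graph, damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """PageRank-style random-walk scores (as in TextRank) on an undirected term graph."""
    if not graph:
        return {}
    n = len(graph)
    scores = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        # Each term receives score from its neighbours, split evenly over their degree.
        scores = {
            v: (1 - damping) / n + damping * sum(scores[u] / len(graph[u]) for u in graph[v])
            for v in graph
        }
    return scores


def to_adjacency(graph: Graph) -> tuple[list[str], list[list[float]]]:
    """Symmetric 0/1 adjacency matrix with a fixed (sorted) term order."""
    terms = sorted(graph)
    index = {t: i for i, t in enumerate(terms)}
    adj = [[0.0] * len(terms) for _ in terms]
    for t, neighbours in graph.items():
        for u in neighbours:
            adj[index[t]][index[u]] = 1.0
    return terms, adj


def _normalize_columns(m: list[list[float]]) -> list[list[float]]:
    """Make every non-empty column sum to 1 (column-stochastic matrix)."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def markov_cluster(adj: list[list[float]], inflation: float = 2.0,
                   iterations: int = 30, eps: float = 1e-6) -> list[frozenset[int]]:
    """Basic MCL: alternate expansion (M @ M) and inflation, then read clusters
    from the attractor rows (rows with a non-zero diagonal entry)."""
    n = len(adj)
    # Adding self-loops is a standard MCL preprocessing step.
    m = _normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)])
    for _ in range(iterations):
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
        m = _normalize_columns([[x ** inflation for x in row] for row in m])
    clusters: list[frozenset[int]] = []
    for i in range(n):
        if m[i][i] > eps:
            members = frozenset(j for j in range(n) if m[i][j] > eps)
            if members not in clusters:
                clusters.append(members)
    return clusters


docs = [["ontology", "concept", "relation"], ["graph", "vertex", "edge"]]
doc_graphs = [build_doc_graph(d, window=3) for d in docs]
local_weights = [random_walk_weights(g) for g in doc_graphs]  # per-document view
global_weights = random_walk_weights(merge_graphs(doc_graphs))  # corpus view
terms, adj = to_adjacency(merge_graphs(doc_graphs))
concepts = [sorted(terms[i] for i in c) for c in markov_cluster(adj)]
print(concepts)  # → [['concept', 'ontology', 'relation'], ['edge', 'graph', 'vertex']]
```

The local and global scores are kept separate on purpose. The abstract states that both perspectives are used, but it does not say how they are combined. Dense list-of-lists matrices keep the sketch dependency-free, but they scale quadratically in the number of terms. A realistic corpus would call for sparse matrices and for pruning small entries, as practical MCL implementations do.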
