CRCTOL: A semantic-based domain ontology learning system

Domain ontologies play an important role in supporting knowledge-based applications in the Semantic Web. To facilitate the building of ontologies, text mining techniques have been used to perform ontology learning from texts. However, traditional systems employ shallow natural language processing techniques and focus only on concept and taxonomic relation extraction. In this paper we present a system, known as Concept-Relation-Concept Tuple-based Ontology Learning (CRCTOL), for mining ontologies automatically from domain-specific documents. Specifically, CRCTOL adopts a full text parsing technique and employs a combination of statistical and lexico-syntactic methods: a statistical algorithm that extracts key concepts from a document collection, a word sense disambiguation algorithm that disambiguates words in the key concepts, a rule-based algorithm that extracts relations between the key concepts, and a modified generalized association rule mining algorithm that prunes unimportant relations for ontology learning. As a result, the ontologies learned by CRCTOL are more concise and contain richer semantics, in terms of the range and number of semantic relations, than those produced by alternative systems. We present two case studies in which CRCTOL is used to build a terrorism domain ontology and a sport event domain ontology. At the component level, quantitative comparison with Text-To-Onto and its successor Text2Onto shows that CRCTOL extracts concepts and semantic relations with a significantly higher level of accuracy. At the ontology level, the quality of the learned ontologies is evaluated either with a set of quantitative and qualitative methods (analysis of graph structural properties, comparison with WordNet, and expert rating) or by direct comparison with a human-edited benchmark ontology; both evaluations indicate that the learned ontologies are of high quality.
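As a rough illustration of the final pruning step, the sketch below filters concept-relation-concept tuples with plain support and confidence thresholds. It is not CRCTOL's modified generalized association rule mining algorithm, which the abstract does not specify; the function name `prune_tuples`, the thresholds, and the sample tuples are all assumptions made for this example.

```python
from collections import Counter


def prune_tuples(doc_tuples, min_support=0.5, min_confidence=0.6):
    """Keep (subject, relation, object) tuples that are frequent enough.

    doc_tuples: one set of tuples per document (hypothetical input format).
    support    = share of documents that contain the tuple
    confidence = share of documents mentioning the subject that also
                 contain the tuple
    """
    n_docs = len(doc_tuples)
    tuple_df = Counter()    # document frequency of each tuple
    subject_df = Counter()  # document frequency of each subject concept
    for tuples in doc_tuples:
        tuple_df.update(set(tuples))
        subject_df.update({subject for subject, _, _ in tuples})

    kept = []
    for tup, df in tuple_df.items():
        support = df / n_docs
        confidence = df / subject_df[tup[0]]
        if support >= min_support and confidence >= min_confidence:
            kept.append(tup)
    return sorted(kept)


# Invented sample data, loosely themed on the terrorism case study.
docs = [
    {("terrorist", "attack", "embassy"), ("terrorist", "use", "bomb")},
    {("terrorist", "attack", "embassy")},
    {("group", "claim", "responsibility"), ("terrorist", "attack", "embassy")},
    {("terrorist", "use", "bomb")},
]
kept = prune_tuples(docs)
```

With these sample documents, only the tuple that is both widespread across documents and strongly tied to its subject concept passes the two thresholds; the generalized variant named in the abstract would presumably also take the concept taxonomy into account, which this sketch leaves out.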
