An overview of methods and tools for ontology learning from texts

Ontology learning aims to reduce the time and effort required by the ontology development process. In recent years, several methods and tools have been proposed to speed up this process using different sources of information and different techniques. In this paper, we review 13 methods and 14 tools for semi-automatically building ontologies from texts, and examine how they relate to the techniques each method follows. The methods are grouped according to the main technique followed, and three groups are identified: one based on linguistics, one on statistics, and one on machine learning. The tools are grouped according to their main aim, that is, according to which elements of the ontology can be learned with each tool. On this basis, we identify three kinds of tools: tools for learning relations, tools for learning new concepts, and assisting tools for building up taxonomies.
