Ontology learning: state of the art and open issues

Ontology is one of the fundamental cornerstones of the semantic Web. The pervasive use of ontologies in information sharing and knowledge management calls for efficient and effective approaches to ontology development. Ontology learning, which seeks to discover ontological knowledge from various forms of data automatically or semi-automatically, can overcome the bottleneck of ontology acquisition in ontology development. Despite the significant progress in ontology learning research over the past decade, there remain a number of open problems in this field. This paper provides a comprehensive review and discussion of major issues, challenges, and opportunities in ontology learning. We propose a new learning-oriented model for ontology development and a framework for ontology learning. Moreover, we identify and discuss important dimensions for classifying ontology learning approaches and techniques. In light of the impact of domain on choosing ontology learning approaches, we summarize domain characteristics that can facilitate future ontology learning effort. The paper offers a road map and a variety of insights about this fast-growing field.