Ontology learning by using text clustering techniques: Method for structuring taxonomies

Finding an appropriate structure to represent the information contained in texts is not a trivial task. Several structures exist for modeling knowledge, such as ontologies, taxonomies, thesauri, and semantic networks. Ontologies are especially useful because they support the exchange and sharing of information. An important task in ontology learning is to obtain a set of representative terms that model a domain and to organize them as a taxonomy. Obtaining the representative terms of a specific domain requires linguistic analysis of the text, and building a taxonomy requires identifying hypernymy/hyponymy relations between terms. This document introduces a novel mechanism to obtain representative terms and to find hypernymy relations between them within a knowledge domain. The idea is based on a triple structure, <subject> verb <object>, used as the representation model. The approach uses a set of linguistic patterns to extract representative terms and combines WordNet synsets with context information to build a query set. This query set is sent to a web search engine in order to retrieve the most representative hypernym for each term. An implementation of the approach has been applied to two types of input, a document corpus and a set of short texts (tweets), both showing promising results. This report of activities corresponds to the second year of doctoral studies.
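The query-building step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pattern templates below are hypothetical Hearst-style phrases chosen for illustration, the synonym list stands in for lemmas that would come from WordNet synsets, and the function name `build_queries` is an assumption. The sketch only constructs the query strings; sending them to a search engine and ranking candidate hypernyms is left out.

```python
from itertools import product

# Hypothetical Hearst-style templates (assumption): the abstract does not
# list the linguistic patterns actually used by the approach.
PATTERNS = (
    '"{term} is a"',
    '"{term} and other"',
    '"such as {term}"',
)


def build_queries(term, synonyms=(), context=()):
    """Build de-duplicated search queries for a term.

    term     -- the representative term whose hypernym is sought
    synonyms -- alternative lemmas, e.g. taken from WordNet synsets (assumed input)
    context  -- context words appended to each query to narrow the domain
    """
    variants = [term, *synonyms]
    contexts = list(context) or [""]  # one empty context if none is given
    queries = []
    for variant, pattern, ctx in product(variants, PATTERNS, contexts):
        query = pattern.format(term=variant)
        if ctx:
            query = f"{query} {ctx}"
        if query not in queries:  # keep first occurrence, preserve order
            queries.append(query)
    return queries


queries = build_queries("jaguar", synonyms=["panthera onca"], context=["animal"])
for q in queries:
    print(q)
```

Each query pairs one lexical variant of the term with one pattern and one context word, so the size of the query set grows as the product of the three lists. In a fuller pipeline the retrieved snippets would then be scanned for the noun phrase that completes each pattern.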
