Growing Fields of Interest - Using an Expand and Reduce Strategy for Domain Model Extraction

Domain hierarchies are widely used as models underlying information retrieval tasks. Formal ontologies and taxonomies enrich such hierarchies with properties and relationships, but they require manual effort; they are therefore costly to maintain and often stale. Folksonomies and vocabularies, in turn, lack rich category structure. Classification and extraction require the coverage of vocabularies and the alterability of folksonomies, and can benefit greatly from category relationships and other properties. With Doozer, a program for building conceptual models of information domains, we aim to bridge the gap between vocabularies and folksonomies on the one side and rich, expert-designed ontologies and taxonomies on the other. Starting from a simple domain description, Doozer mines Wikipedia to produce a tight domain hierarchy and attaches relevancy scores for use in automated classification of information. The output model is a hierarchy of domain terms that can be used directly by classifiers and IR systems or as a basis for manual or semi-automatic creation of formal ontologies.
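
The abstract names the expand-and-reduce strategy but gives no algorithmic detail. As a rough sketch of the idea only (not Doozer's actual procedure), the following Python snippet expands a seed term over a toy stand-in for Wikipedia's category graph and then prunes candidates whose relevance score falls below a threshold; the graph, the distance-decayed score, and the threshold value are all illustrative assumptions.

```python
from collections import deque

# Toy stand-in for Wikipedia's category graph (parent term -> child terms).
# The graph, the decaying score, and the threshold are illustrative
# assumptions, not taken from the paper.
CATEGORY_GRAPH = {
    "Information retrieval": ["Search engines", "Document classification"],
    "Search engines": ["Web crawlers", "PageRank"],
    "Document classification": ["Naive Bayes classifier", "Support vector machines"],
}

def expand_and_reduce(seeds, graph, max_depth=3, threshold=0.4):
    """Expand: breadth-first walk of the category graph from the seed terms.
    Reduce: keep only candidates whose relevance score clears the threshold.
    Here the score simply decays with distance from the seeds; the real system
    would score terms by their association with the domain description."""
    hierarchy = {}                            # parent -> [(child, score), ...]
    frontier = deque((seed, 0) for seed in seeds)
    seen = set(seeds)
    while frontier:
        parent, depth = frontier.popleft()
        if depth >= max_depth:
            continue
        for child in graph.get(parent, []):
            score = 1.0 / (depth + 2)         # 0.5 for direct children, 0.33 next, ...
            if child in seen or score < threshold:
                continue                      # reduce step: prune weak candidates
            hierarchy.setdefault(parent, []).append((child, round(score, 2)))
            seen.add(child)
            frontier.append((child, depth + 1))
    return hierarchy

if __name__ == "__main__":
    # A one-term "domain description" as the seed.
    print(expand_and_reduce(["Information retrieval"], CATEGORY_GRAPH))
```

Run as-is, the sketch keeps the two direct children of the seed (score 0.5) and prunes the grandchildren (score 0.33), mirroring the expand-then-reduce shape of the approach while leaving the actual Wikipedia-based scoring to the paper.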
