University of Birmingham

TIE algorithm: A layer over clustering-based taxonomy generation for handling evolving data

A taxonomy represents the concepts that exist in data, and is generated to organize and access large volumes of data effectively. It must evolve to reflect the changes that occur continuously in the data. Existing automatic taxonomy generation techniques do not handle the evolution of data, so the taxonomies they generate do not truly represent the data. The evolution of data can be handled either by regenerating the taxonomy from scratch or by incrementally evolving it whenever the data changes. The former approach is not economical in terms of time and resources. The taxonomy incremental evolution (TIE) algorithm proposed in this paper is a novel attempt to handle evolving data. It serves as a layer over an existing clustering-based taxonomy generation technique and incrementally evolves an existing taxonomy. The algorithm was evaluated on scholarly articles selected from the computing domain. It was found that the algorithm evolves a taxonomy in a considerably shorter period of time, with better quality per unit time than a taxonomy regenerated from scratch.
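The contrast between regenerating a taxonomy and evolving it incrementally can be illustrated with a minimal Python sketch. This is not the TIE algorithm itself, whose steps the abstract does not describe; it is a generic single-level, nearest-centroid update, and the names `FlatTaxonomy`, `add_document`, `vectorize`, and the `threshold` value are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumption: naive whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class FlatTaxonomy:
    """Hypothetical one-level taxonomy: each node holds a summed term vector.

    The summed vector stands in for a centroid, since cosine similarity
    ignores vector length.
    """

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed similarity cut-off
        self.nodes = []             # each node: {"centroid": Counter, "docs": [str]}

    def add_document(self, text):
        """Place one new document without re-clustering the existing ones."""
        vec = vectorize(text)
        best, best_sim = None, 0.0
        for i, node in enumerate(self.nodes):
            sim = cosine(vec, node["centroid"])
            if sim > best_sim:
                best, best_sim = i, sim
        if best is not None and best_sim >= self.threshold:
            # Similar enough: update the existing node in place.
            self.nodes[best]["centroid"].update(vec)
            self.nodes[best]["docs"].append(text)
            return best
        # Otherwise the document starts a new concept node.
        self.nodes.append({"centroid": Counter(vec), "docs": [text]})
        return len(self.nodes) - 1


tax = FlatTaxonomy(threshold=0.3)
a = tax.add_document("clustering text documents hierarchical clustering")
b = tax.add_document("hierarchical clustering of text documents")
c = tax.add_document("wireless sensor network routing")
```

In this sketch each arriving document costs one pass over the existing nodes, whereas regeneration from scratch would re-cluster every document each time the data changes.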
