TIE algorithm: a layer over clustering-based taxonomy generation for handling evolving data

Taxonomies are generated to organize and access large volumes of data effectively. A taxonomy represents the concepts that exist in the data, so it must evolve continuously to reflect changes in that data. Existing automatic taxonomy generation techniques do not handle the evolution of data; the taxonomies they generate therefore stop representing the data accurately once it changes. Data evolution can be handled either by regenerating the taxonomy from scratch or by allowing the taxonomy to evolve incrementally whenever the data change. The former approach is not economical in terms of time and resources. The proposed taxonomy incremental evolution (TIE) algorithm is a novel attempt to handle data that evolve over time. It serves as a layer over an existing clustering-based taxonomy generation technique and allows an existing taxonomy to evolve incrementally. The algorithm was evaluated on research articles selected from the computing domain. The taxonomy evolved incrementally with the algorithm required considerably less time, and achieved better quality per unit time, than a taxonomy regenerated from scratch.
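To make the contrast between regeneration and incremental evolution concrete, the following minimal sketch shows one way a new document could be folded into an existing cluster-based taxonomy without re-clustering everything. It is an illustration under stated assumptions, not the TIE algorithm itself: the abstract does not specify the procedure, so the bag-of-words vectors, cosine similarity, the `threshold` value, and the names `Node` and `insert_document` are all hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumed representation, not from the paper)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Node:
    """A taxonomy node holding documents and a summed term vector."""

    def __init__(self, label):
        self.label = label
        self.docs = []
        self.centroid = Counter()  # summed counts; cosine ignores the scale
        self.children = []

    def add(self, text):
        self.docs.append(text)
        self.centroid.update(vectorize(text))


def insert_document(root, text, threshold=0.2):
    """Attach a new document to the most similar child of `root`,
    or open a new child when nothing is similar enough.
    Only the touched node is updated; the rest of the tree is untouched."""
    vec = vectorize(text)
    best = max(root.children,
               key=lambda child: cosine(vec, child.centroid),
               default=None)
    if best is None or cosine(vec, best.centroid) < threshold:
        best = Node(f"cluster-{len(root.children) + 1}")
        root.children.append(best)
    best.add(text)
    return best


if __name__ == "__main__":
    root = Node("root")
    for doc in ["clustering text documents taxonomy",
                "taxonomy clustering of documents",
                "protein folding simulation"]:
        node = insert_document(root, doc)
        print(doc, "->", node.label)
```

The cost of each insertion here depends on the number of existing nodes rather than on the size of the whole collection, which is the intuition behind preferring incremental evolution to full regeneration.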
