Multi-objective frequent termset clustering

Large media collections are evolving rapidly on the World Wide Web. In addition to targeted retrieval, as performed by search engines, browsing and exploratory navigation are important. Because these collections grow quickly and authors usually do not annotate their web pages according to a given ontology, automatic structuring is needed as a prerequisite for any pleasant human–computer interface. In this paper, we investigate the problem of finding alternative high-quality structures for navigating a large collection of high-dimensional data. We express the desired properties of frequent termset (FTS) clustering as objective functions. In general, these functions conflict, which leads us to formulate FTS clustering as a multi-objective optimization problem. We solve this problem with a genetic algorithm, which yields a set of Pareto-optimal solutions. Users may choose the type of structure they prefer for navigating a collection, or explore the different views offered by the different optimal solutions. We examine how well the new approach produces structures suited for browsing, using a social bookmarking data set.
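The pipeline outlined in the abstract can be illustrated with a small Python sketch. In this pipeline, frequent termsets act as candidate clusters, a clustering is scored on conflicting objectives, and the result is a set of Pareto-optimal clusterings. Everything concrete here is an assumption made for illustration. This includes the toy tag data and the support threshold. It also includes the three objective functions (fraction of uncovered documents, mean overlap, number of clusters), which are not necessarily the ones used in the paper. A brute-force miner stands in for an algorithm such as Apriori or FP-growth. Exhaustive enumeration of termset selections replaces the genetic algorithm, and this only works for a handful of termsets.

```python
from itertools import combinations

# Hypothetical toy collection: each document (e.g. a bookmarked resource)
# is represented by its set of tags. Data and threshold are illustrative only.
DOCS = [
    {"python", "tutorial"},
    {"python", "web"},
    {"python", "web", "django"},
    {"web", "design"},
    {"design", "css"},
    {"web", "css", "design"},
]
MIN_SUPPORT = 2  # assumed absolute support threshold


def frequent_termsets(docs, min_support, max_size=2):
    """Brute-force frequent termset mining (stand-in for Apriori/FP-growth).

    Returns {termset: cover}; the cover (indices of the documents that
    contain every term of the termset) is the termset's candidate cluster.
    """
    vocab = sorted(set().union(*docs))
    covers = {}
    for size in range(1, max_size + 1):
        for terms in combinations(vocab, size):
            cover = frozenset(i for i, d in enumerate(docs) if set(terms) <= d)
            if len(cover) >= min_support:
                covers[frozenset(terms)] = cover
    return covers


def objectives(selection, covers, n_docs):
    """Score a clustering (a tuple of selected termsets).

    Three hypothetical objectives, all to be minimized: fraction of
    uncovered documents, mean number of clusters per covered document
    (overlap), and number of clusters.
    """
    covered = set().union(*(covers[t] for t in selection))
    memberships = sum(len(covers[t]) for t in selection)
    overlap = memberships / len(covered) if covered else 0.0
    return (1 - len(covered) / n_docs, overlap, len(selection))


def dominates(a, b):
    """True if score a Pareto-dominates b: no worse anywhere, better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def enumerate_clusterings(covers, n_docs):
    """Score every non-empty selection of termsets.

    Exhaustive search replaces the genetic algorithm here; it is only
    feasible for a handful of termsets (2**k - 1 selections).
    """
    termsets = list(covers)
    return [
        (selection, objectives(selection, covers, n_docs))
        for k in range(1, len(termsets) + 1)
        for selection in combinations(termsets, k)
    ]


def pareto_front(scored):
    """Keep the clusterings whose scores no other clustering dominates."""
    return [
        (selection, score)
        for selection, score in scored
        if not any(dominates(other, score) for _, other in scored)
    ]


if __name__ == "__main__":
    covers = frequent_termsets(DOCS, MIN_SUPPORT)
    front = pareto_front(enumerate_clusterings(covers, len(DOCS)))
    for selection, score in front:
        print(sorted(sorted(t) for t in selection), score)
```

On this toy input, the front holds two clusterings. The first is a single `web` cluster, which uses one cluster and leaves two documents uncovered. The second is the partition into `python` and `design`, which achieves full coverage with no overlap but needs two clusters. Neither clustering dominates the other. The choice between a more compact and a more complete structure is therefore left to the user, which mirrors the trade-off described above. In a genetic-algorithm setting, each selection would typically be encoded as a bit vector over the frequent termsets.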
