Semi-supervised Clustering Ensemble Evolved by Genetic Algorithm for Web Video Categorization

Genetic Algorithms (GAs) have been widely used in optimization problems for their ability to find good, acceptable solutions within limited time. Clustering ensembles have emerged as another way to obtain a more stable and robust partition from existing clusterings, and GAs have contributed substantially to finding consensus partitions within clustering ensembles. Meanwhile, with the popularity of the social web, web video categorization has become an increasingly challenging research area. In this paper, we propose a framework for web video categorization that uses the videos' textual features, video relations and web support. This work makes three contributions. First, we generalize the traditional Vector Space Model (VSM) to a Semantic VSM (S-VSM) by including the semantic similarity between feature terms. The new model improves clustering quality in terms of compactness (high intra-cluster similarity) and clearness (low inter-cluster similarity). Second, we optimize the clustering ensemble process with a GA that uses a novel fitness function. We define a new measure, Pre-Paired Percentage (PPP), to serve as the fitness function during the genetic cycle. Third, since the most important and crucial step in designing a GA is defining the genetic operators, crossover and mutation, we express these operators through an intelligent clustering ensemble mechanism. This approach produces more logical offspring solutions. All three contributions show strong results in their respective areas. Experiments on real-world social-web data validate these contributions.
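As an illustration of the S-VSM idea, the sketch below generalizes cosine similarity with a term-to-term similarity matrix, so that two documents with no terms in common can still be similar when their terms are semantically related. This is a minimal sketch under stated assumptions, not the formulation used in the paper: the function name `semantic_cosine`, the three-word vocabulary, and the similarity value 0.9 are all hypothetical, and the paper's actual weighting and similarity measure may differ.

```python
import math


def semantic_cosine(a, b, term_sim):
    """Cosine similarity generalized with a term-term similarity matrix.

    a, b     -- term-weight vectors of equal length (e.g. TF-IDF weights)
    term_sim -- square matrix; term_sim[i][j] is the assumed semantic
                similarity between term i and term j (1.0 on the diagonal)
    """
    def bilinear(x, y):
        # x^T * term_sim * y
        return sum(x[i] * term_sim[i][j] * y[j]
                   for i in range(len(x))
                   for j in range(len(y)))

    denom = math.sqrt(bilinear(a, a) * bilinear(b, b))
    return bilinear(a, b) / denom if denom > 0 else 0.0


# Hypothetical vocabulary: ["film", "movie", "goal"]
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]   # plain VSM: terms are independent
semantic = [[1.0, 0.9, 0.0],
            [0.9, 1.0, 0.0],
            [0.0, 0.0, 1.0]]   # assumed: "film" and "movie" are near-synonyms

doc_film = [1.0, 0.0, 0.0]     # document containing only "film"
doc_movie = [0.0, 1.0, 0.0]    # document containing only "movie"

plain = semantic_cosine(doc_film, doc_movie, identity)
soft = semantic_cosine(doc_film, doc_movie, semantic)
print(plain, soft)
```

With the identity matrix the measure reduces to ordinary cosine similarity, so the two documents score 0; with the semantic matrix they score close to the assumed term similarity.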
