Semi-supervised evolutionary ensembles for Web video categorization

Evolutionary Algorithms (EAs) have developed rapidly as a powerful and general learning approach that has been used successfully to find reasonable solutions in data mining and knowledge discovery. The Genetic Algorithm (GA) is a mainstream EA paradigm for solving optimization problems. Clustering ensembles have emerged as a prominent technique in machine learning: they leverage the consensus across multiple clustering solutions and combine their predictions into a single solution with improved robustness, stability and accuracy. Multimedia advances and the popularity of the social Web have together made it easy to generate videos in bulk, and categorizing such Web videos has become an active research challenge. In this paper, we propose a Semi-supervised Evolutionary Ensemble (SS-EE) framework for social media mining tasks such as Web Video Categorization (WVC), using the videos' low-cost textual features, intrinsic relations and extrinsic Web support. The contributions of this work are as follows. First, we extend the traditional Vector Space Model (VSM) to a Semantic VSM (S-VSM) by considering the semantic similarity between feature terms, computed with the Normalized Google Distance (NGD). Second, we define a new distance measure, Triangular Similarity (TrS), between two Textual Feature Vectors (TFVs), based on the frequencies of the most relevant terms in each category. Third, we iterate the clustering ensemble process with a GA guided by a new measure, Pre-Paired Percentage (PPP), which serves as the fitness function during the genetic cycle. Fourth, we define the key genetic operators of the GA, crossover and mutation, through an intelligent clustering ensemble mechanism. Fifth, to terminate the genetic cycle, we define another new measure, Clustering Quality (Cq), based on the similarity matrix and the clustering labels. Experiments on real-world social Web data (YouTube) have been performed to validate the SS-EE framework.
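
As an illustration of the semantic similarity underlying the S-VSM, the following is a minimal sketch of the standard Normalized Google Distance formula, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). The function name `ngd`, its parameter names, and the handling of zero counts are assumptions made for this example; the abstract does not specify how the hit counts are obtained or how NGD values are folded into the S-VSM.

```python
import math


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance from raw occurrence counts.

    fx, fy : number of pages (documents) containing term x, term y
    fxy    : number of pages containing both terms
    n      : total number of indexed pages (normalizing constant)

    Assumption for this sketch: if any count is zero, the terms are
    treated as unrelated and the distance is reported as infinity.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return float("inf")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


# Hypothetical counts, chosen only to exercise the formula.
d_same = ngd(100, 100, 100, 10_000)      # terms that always co-occur
d_partial = ngd(1000, 100, 10, 10**6)    # terms that rarely co-occur
```

Smaller values indicate terms that co-occur more often relative to their individual frequencies, so terms that always appear together have a distance of zero.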
