Social web video clustering based on multi-modal and clustering ensemble

Abstract Web videos are a rich resource for satisfying people's information and entertainment needs. Previous studies have categorized web videos by clustering the textual information with which uploaders tag them, which helps users find the videos they want and thereby increases user satisfaction. However, web video categorization remains a challenging task because it is difficult to measure accurately the semantic relation between terms in videos. In this paper, a novel framework for social web video clustering is proposed that improves the similarity calculation for web videos and uses a clustering ensemble. It consists of the following steps: 1) A new Semantic Vector Space Model (Semantic VSM) is defined by considering the semantic relations between terms obtained from the lexical reference system WordNet. 2) Word2vec is used to capture semantic information as continuous vectors, representing each document as the set of vectors of its terms. 3) A comprehensive extension of the Semantic VSM that utilizes the Normalized Google Distance is presented. 4) A linear combining function merges the similarities produced by these models, with a parameter that controls the weight of each model set to its optimal value, before they are applied to the clustering paradigms; a clustering ensemble with Must-Link constraints is then employed to integrate the results of the individual clusterings. Experimental evaluations on real-world social web video datasets demonstrate that the proposed method effectively facilitates clustering and achieves promising performance.
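
Steps 3 and 4 can be illustrated with a minimal sketch. The `ngd` function follows the standard definition of the Normalized Google Distance from page-hit counts; the `combine` function is a plain weighted sum of similarity matrices. The function names, the example hit counts, and the weights are assumptions made for illustration only; the abstract does not give the paper's actual combining function or its optimal parameter values.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance between two terms.

    fx, fy: hit counts for each term alone; fxy: hit count for both
    terms together; n: total number of indexed pages (all hypothetical
    inputs here, normally obtained from a search engine).
    """
    if fxy == 0:
        # Terms never co-occur: treat the distance as infinite.
        return float("inf")
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def combine(similarities, weights):
    """Weighted linear combination of equally sized similarity matrices.

    The weights are assumed to be non-negative and to sum to 1; how the
    paper tunes them is not stated in the abstract.
    """
    if len(similarities) != len(weights):
        raise ValueError("one weight is required per similarity matrix")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    rows, cols = len(similarities[0]), len(similarities[0][0])
    return [
        [
            sum(w * s[i][j] for s, w in zip(similarities, weights))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Two toy 2x2 similarity matrices, standing in for two of the models.
sim_a = [[1.0, 0.2], [0.2, 1.0]]
sim_b = [[1.0, 0.6], [0.6, 1.0]]
combined = combine([sim_a, sim_b], [0.5, 0.5])
```

The combined matrix would then be passed to each base clustering algorithm, whose outputs the ensemble step integrates; that ensemble step is not sketched here.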
