Clustering Web video search results based on integration of multiple features

The use of Web video search engines has been growing at an explosive rate. Because query terms are often ambiguous and results often contain duplicates, a good clustering of video search results is essential to enhance the user experience and to improve retrieval performance. Existing video clustering systems consider only the video content itself. This paper presents the first system that clusters Web video search results by fusing evidence from several information sources beyond the video content, such as the title, tags, and description. We propose a novel framework that integrates multiple features and allows existing clustering algorithms to be adopted. We discuss the design of the system's components and a number of implementation decisions made to achieve high effectiveness and efficiency. A thorough user study shows that, with an innovative interface for displaying the clustering output, our system presents search results much better and hence significantly increases the usability of video search engines.
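The abstract does not specify how the features are fused or which clustering algorithm is used, so the following is only a minimal sketch of the general idea, under stated assumptions: each video is described by token sets for its title, tags, description, and a hypothetical set of "visual words" standing in for the video content; per-feature similarity is Jaccard similarity; the fused similarity is a weighted sum with illustrative equal weights; and clustering is simple threshold-based single-link grouping. The names `jaccard`, `fused_similarity`, `cluster`, the sample data, and the threshold are all assumptions made for illustration, not the authors' method.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fused_similarity(v1, v2, weights):
    """Weighted sum of per-feature similarities (weights assumed to sum to 1)."""
    return sum(w * jaccard(v1[f], v2[f]) for f, w in weights.items())


def cluster(videos, weights, threshold):
    """Single-link grouping: link two videos when their fused similarity
    reaches the threshold, then return the connected components."""
    parent = list(range(len(videos)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(videos)), 2):
        if fused_similarity(videos[i], videos[j], weights) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(videos)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical results for the ambiguous query "jaguar" (illustrative data only).
videos = [
    {"title": {"jaguar", "car", "review"}, "tags": {"car", "auto"},
     "description": {"test", "drive", "car"}, "visual": {"road", "wheel"}},
    {"title": {"jaguar", "car", "test"}, "tags": {"car", "speed"},
     "description": {"car", "drive"}, "visual": {"road", "wheel", "sky"}},
    {"title": {"jaguar", "cat", "jungle"}, "tags": {"animal", "wildlife"},
     "description": {"wild", "cat"}, "visual": {"tree", "fur"}},
    {"title": {"jaguar", "hunting"}, "tags": {"animal", "wildlife", "cat"},
     "description": {"wild", "cat", "hunting"}, "visual": {"tree", "fur", "river"}},
]

# Equal weights are an arbitrary choice for this sketch.
weights = {"title": 0.25, "tags": 0.25, "description": 0.25, "visual": 0.25}

clusters = cluster(videos, weights, threshold=0.4)
print(clusters)
```

In this toy example the title alone cannot separate the two senses of the query, since every title contains "jaguar"; the tags, description, and visual sets supply the additional evidence. Because the fusion step produces a single pairwise similarity, the grouping step could be swapped for any clustering algorithm that accepts a similarity matrix.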
