Web topic detection using a ranked clustering-like pattern across similarity cascades

In multimedia and social media communities, web topic detection poses two main difficulties that conventional approaches can barely handle: 1) web topics exhibit large inter-topic variations; 2) supervised information for identifying the real topics is scarce. In this paper, we address these problems from the perspective of similarity diffusion among objects on the web, and present a clustering-like pattern across similarity cascades (SCs). SCs are a series of subgraphs generated by truncating a weighted graph with a set of thresholds; maximal cliques in these subgraphs are then used to describe the topic candidates. Poisson deconvolution is adopted to efficiently identify the real topics from these candidates. Experiments demonstrate that our approach outperforms state-of-the-art methods on two datasets. In addition, we report accuracy vs. false positives per topic (FPPT) curves for performance evaluation. To our knowledge, this is the first complete evaluation of web topic detection at the topic-wise level, and it establishes a new benchmark for this problem.
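The pipeline outlined in the abstract has three steps. First, the similarity graph is thresholded into cascades. Second, maximal cliques are taken as topic candidates. Third, the candidates are scored by Poisson deconvolution. The Python sketch below illustrates these steps under stated assumptions and is not the paper's implementation. The following parts are hypothetical: the `>=` threshold rule, the function names, and the toy graph. Maximal cliques are enumerated with the standard Bron-Kerbosch algorithm with pivoting. The Poisson model is also an assumption. It treats each observed edge weight as a Poisson count whose mean is the sum of the weights of the candidates covering that edge, in the spirit of graphlet decomposition of weighted networks. This model is solved with the Richardson-Lucy multiplicative (EM) update. The paper's actual observation model may differ.

```python
"""Hypothetical sketch: similarity cascades -> maximal cliques -> Poisson deconvolution."""


def similarity_cascades(weights, thresholds):
    """One adjacency dict per threshold, keeping edges with weight >= t (assumed rule)."""
    cascades = []
    for t in sorted(thresholds):
        adj = {}
        for (u, v), w in weights.items():
            if w >= t:
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
        cascades.append(adj)
    return cascades


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; yields every maximal clique as a frozenset."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours in p to prune branches.
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def topic_candidates(weights, thresholds):
    """Union of maximal cliques (size >= 2) over all cascade levels, deduplicated."""
    cands = set()
    for adj in similarity_cascades(weights, thresholds):
        cands.update(c for c in maximal_cliques(adj) if len(c) >= 2)
    return sorted(cands, key=sorted)  # deterministic order


def poisson_deconvolution(weights, cands, iters=200):
    """Richardson-Lucy (EM) estimate of nonnegative candidate weights mu, assuming
    each edge weight ~ Poisson(sum of mu_k over candidates k containing the edge)."""
    edges = list(weights)
    # cover[e] = indices of candidates that contain both endpoints of edge e
    cover = [[k for k, c in enumerate(cands) if u in c and v in c] for u, v in edges]
    # Number of edges each candidate covers; >= 1 because every pair inside a
    # candidate clique is an edge of the original weighted graph.
    col_sum = [0] * len(cands)
    for ks in cover:
        for k in ks:
            col_sum[k] += 1
    mu = [1.0] * len(cands)
    for _ in range(iters):
        num = [0.0] * len(cands)
        for e, ks in zip(edges, cover):
            rate = sum(mu[k] for k in ks)
            if rate > 0:  # edges covered by no candidate are skipped
                for k in ks:
                    num[k] += weights[e] / rate
        # Multiplicative update: mu_k <- mu_k * (sum_e A_ek y_e / (A mu)_e) / sum_e A_ek
        mu = [mu[k] * num[k] / col_sum[k] for k in range(len(cands))]
    return mu


# Toy similarity graph (hypothetical data): two dense groups joined by one weak edge.
TOY_WEIGHTS = {
    ("a", "b"): 9, ("a", "c"): 8, ("b", "c"): 9,
    ("d", "e"): 7, ("d", "f"): 8, ("e", "f"): 7,
    ("c", "d"): 2,
}

if __name__ == "__main__":
    cands = topic_candidates(TOY_WEIGHTS, thresholds=[1, 5])
    mu = poisson_deconvolution(TOY_WEIGHTS, cands)
    # Rank candidates by estimated weight (one hypothetical ranking rule).
    for c, m in sorted(zip(cands, mu), key=lambda p: -p[1]):
        print(sorted(c), round(m, 3))
```

The toy graph has three candidates:

- The dense groups {a, b, c} and {d, e, f} appear at both thresholds. They receive estimated weights of about 8.67 and 7.33.
- The weak bridge {c, d} survives only the lowest threshold. It gets a weight of 2.0.

In this toy case no two candidates share an edge, so the update reduces to averaging each clique's edge weights. When cliques overlap, the multiplicative update instead splits each shared edge's weight among the candidates that cover it.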
