Robust Latent Poisson Deconvolution From Multiple Features for Web Topic Detection

Detecting “hot” topics from the enormous volume of user-generated content (UGC) on the web poses two main difficulties that conventional approaches can barely handle: 1) poor feature representations from noisy images or short texts, and 2) uncertain roles of modalities, where the visual content is either highly or weakly relevant to the textual cues because UGC is loosely constrained. In this paper, following the detection-by-ranking approach, we address these challenges by learning a robust latent representation from multiple features that are noisy and, with high probability, complementary. Both the textual and the visual features are encoded into a k-nearest-neighbor hybrid similarity graph (HSG), on which nonnegative matrix factorization using random walk is applied to generate topic candidates. Multiple HSGs are then fused efficiently by a latent Poisson deconvolution, which consists of a Poisson deconvolution with sparse basis similarity for each edge. Experiments on two public datasets show that the proposed approach is significantly more accurate than state-of-the-art methods.
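To make the graph-construction step concrete, the following is a minimal sketch of a k-nearest-neighbor similarity graph built from one feature matrix. It is an illustration under stated assumptions, not the paper's implementation: the cosine similarity, the symmetrization rule, the function name `knn_similarity_graph`, and the toy feature values are all hypothetical choices, and the sketch does not cover how textual and visual similarities are combined into an HSG or how HSGs are fused by the latent Poisson deconvolution.

```python
import numpy as np

def knn_similarity_graph(features, k):
    """Symmetric kNN graph with cosine-similarity edge weights.

    Assumptions (not from the paper): cosine similarity, negative
    similarities clipped to zero, and an edge kept if either endpoint
    selects the other as a neighbor. Requires 1 <= k <= n - 1.
    """
    x = np.asarray(features, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.maximum(norms, 1e-12)           # unit-normalize each row
    sim = x @ x.T                              # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)             # a node is not its own neighbor
    n = sim.shape[0]
    # Column indices of the k most similar nodes for every row.
    nbrs = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = nbrs.ravel()
    graph = np.zeros((n, n))
    graph[rows, cols] = np.clip(sim[rows, cols], 0.0, None)
    return np.maximum(graph, graph.T)          # symmetrize

# Toy features for four items (hypothetical values).
text_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
g_text = knn_similarity_graph(text_feats, k=1)
```

With `k=1`, items 0 and 1 select each other, as do items 2 and 3, so the resulting graph has two connected pairs and no edges between them. One such graph could be built per feature type before any fusion step.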
