Joint Image-Text News Topic Detection and Tracking with And-Or Graph Representation

In this paper, we develop a method for automatically detecting and tracking topics in broadcast news. We present a hierarchical And-Or graph (AOG) that jointly represents the latent structure of both texts and visuals. The AOG embeds a context-sensitive grammar that describes the hierarchical composition of news topics in terms of semantic elements (the people involved, the related places, and what happened) and models the contextual relationships between elements in the hierarchy. We detect news topics through a cluster sampling process that groups stories about closely related events. Swendsen-Wang Cuts (SWC), an effective cluster sampling algorithm, is adopted to traverse the solution space and obtain optimal clustering solutions by maximizing a Bayesian posterior probability. Topics are tracked to handle continuously updated news streams, and we generate topic trajectories that show how topics emerge, evolve, and disappear over time. Experimental results show that our method can explicitly describe the textual and visual data in news videos and produce meaningful topic trajectories. Our method also outperforms state-of-the-art methods on both the public Reuters-21578 dataset and a self-collected dataset, the UCLA Broadcast News Dataset.
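To make the SWC clustering step more concrete, here is a minimal Python sketch of a generic Swendsen-Wang Cuts move over a partition of stories. It is not the authors' implementation. It rests on several assumptions: every story is a node in a complete graph, pairwise story similarities in (0, 1) serve directly as the edge probabilities q_e, cluster labels come from a fixed budget of n slots (so the uniform label proposal cancels in the acceptance ratio), and the AOG-based posterior is replaced by a hypothetical pairwise toy score (`make_toy_log_posterior`). The function name `swc_cluster` is likewise illustrative. To stand in for "maximizing the posterior", the sketch simply returns the highest-scoring partition the sampler visits.

```python
import math
import random


def swc_cluster(sim, log_posterior, n_iters=3000, seed=0):
    """Toy SWC sampler; sim[i][j] in (0, 1) serves as edge probability q_e.

    Returns the highest-posterior labelling visited (a MAP estimate).
    """
    rng = random.Random(seed)
    n = len(sim)
    k = n  # fixed label budget: at most n clusters, empty labels allowed

    def edge_prob(i, j):
        # Clamp so that log(1 - q_e) is always finite.
        return min(max(sim[i][j], 1e-9), 1.0 - 1e-9)

    labels = list(range(n))  # start from all-singleton clusters
    cur_lp = log_posterior(labels)
    best, best_lp = labels[:], cur_lp

    for _ in range(n_iters):
        # 1. Turn on each edge inside the current clusters with probability q_e.
        adj = [[] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                if labels[i] == labels[j] and rng.random() < edge_prob(i, j):
                    adj[i].append(j)
                    adj[j].append(i)

        # 2. Find connected components and pick one (V0) uniformly at random.
        comps, seen = [], [False] * n
        for s in range(n):
            if seen[s]:
                continue
            seen[s] = True
            stack, comp = [s], []
            while stack:
                u = stack.pop()
                comp.append(u)
                for v in adj[u]:
                    if not seen[v]:
                        seen[v] = True
                        stack.append(v)
            comps.append(comp)
        v0 = rng.choice(comps)
        in_v0 = set(v0)
        old = labels[v0[0]]

        # 3. Propose a new label uniformly, so the label-proposal ratio is 1.
        new = rng.randrange(k)
        if new == old:
            continue

        # 4. Log acceptance ratio: cut with the target cluster minus cut with
        #    the rest of the current cluster, plus the posterior ratio.
        log_cut = 0.0
        for i in v0:
            for j in range(n):
                if j in in_v0:
                    continue
                if labels[j] == new:
                    log_cut += math.log(1.0 - edge_prob(i, j))
                elif labels[j] == old:
                    log_cut -= math.log(1.0 - edge_prob(i, j))

        proposal = labels[:]
        for i in v0:
            proposal[i] = new
        prop_lp = log_posterior(proposal)
        log_alpha = log_cut + prop_lp - cur_lp
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            labels, cur_lp = proposal, prop_lp
            if cur_lp > best_lp:
                best, best_lp = labels[:], cur_lp

    return best


def make_toy_log_posterior(sim, beta=5.0):
    """Hypothetical stand-in for the AOG posterior: rewards grouping similar
    stories (sim > 0.5) and penalizes grouping dissimilar ones."""
    n = len(sim)

    def log_posterior(labels):
        return beta * sum(
            sim[i][j] - 0.5
            for i in range(n)
            for j in range(i + 1, n)
            if labels[i] == labels[j]
        )

    return log_posterior


if __name__ == "__main__":
    group = [0, 0, 0, 1, 1, 1]  # hypothetical: stories 0-2 vs 3-5
    sim = [[0.9 if a == b else 0.05 for b in group] for a in group]
    print(swc_cluster(sim, make_toy_log_posterior(sim)))
```

Because a whole connected component is relabelled in a single move, tightly linked stories can join or leave a cluster together, which is what distinguishes SWC from single-site Gibbs updates. In the method described above, the posterior would come from the AOG model over both text and visual elements rather than from this pairwise score.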
