"Picture the scene...";: Visually Summarising Social Media Events

With the advent of social media and Web 2.0, we are faced with a deluge of information. Recent research has focused on filtering noisy, irrelevant items out of social media streams and, in particular, on automatically identifying and summarising events. However, because such streams are heterogeneous, these efforts have not yet reached fruition. In this paper, we investigate how images can be used as a source for summarising events. Existing approaches have considered only textual summaries, which are often poorly written, in a different language and slow to digest. In contrast, images are "worth 1,000 words" and can quickly and easily convey an idea or scene. Since images in social media can also be noisy, irrelevant and repetitive, we propose new techniques for their automatic selection, ranking and presentation. We evaluate our approach on a recently created social media event data set containing 365k tweets and 50 events, which we extend by collecting 625k related images. Through two crowdsourced evaluations, we first show how our approach overcomes the problems of automatically collecting relevant and diverse images from noisy microblog data, and then highlight the advantages of multimedia summarisation over text-based approaches.
