Event Identification in Social Media

Social media sites such as Flickr, YouTube, and Facebook host substantial amounts of user-contributed material (e.g., photographs, videos, and textual content) for a wide variety of real-world events. These range from widely known events, such as the presidential inauguration, to smaller, community-specific events, such as annual conventions and local gatherings. This paper focuses on identifying these events and their associated user-contributed social media documents, a capability that can greatly improve local event browsing and search in state-of-the-art search engines. To address this problem, we exploit the rich "context" associated with social media content, including user-provided annotations (e.g., title, tags) and automatically generated information (e.g., content creation time). We form a variety of representations of social media documents using different context dimensions, and use a weighted ensemble approach to combine these dimensions in a principled way into a single clustering solution, in which each document cluster ideally corresponds to one event. We evaluate our approach on a large-scale, real-world dataset of event images, and report promising performance relative to several baseline approaches. Our preliminary experiments suggest that our ensemble approach identifies events, and their associated images, more effectively than the state-of-the-art strategies on which we build.
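The idea of combining several context dimensions into one clustering can be illustrated with a small sketch. This is not the paper's method: the abstract does not specify the similarity measures, the clustering algorithm, or how the ensemble weights are obtained. The sketch below assumes Jaccard similarity for titles and tags, a linear decay over a one-day window for creation time, hand-picked weights, and a greedy single-pass clusterer; the names `Doc`, `combined_similarity`, and `cluster` are hypothetical. It also simplifies the ensemble to a weighted average of per-dimension similarities rather than a combination of separate clusterers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """A social media document with three context dimensions (hypothetical schema)."""
    title: str
    tags: frozenset
    timestamp: float  # creation time, in seconds


def jaccard(a, b):
    """Jaccard overlap of two token collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def time_similarity(t1, t2, window=86400.0):
    """Linear decay to 0 over an assumed one-day window."""
    return max(0.0, 1.0 - abs(t1 - t2) / window)


def combined_similarity(d1, d2, weights):
    """Weighted average of the per-dimension similarities."""
    scores = {
        "title": jaccard(d1.title.lower().split(), d2.title.lower().split()),
        "tags": jaccard(d1.tags, d2.tags),
        "time": time_similarity(d1.timestamp, d2.timestamp),
    }
    total = sum(weights.values())
    return sum(weights[k] * scores[k] for k in weights) / total


def cluster(docs, weights, threshold=0.5):
    """Greedy single-pass clustering: join the best-matching cluster or start a new one."""
    clusters = []
    for doc in docs:
        best, best_score = None, threshold
        for members in clusters:
            # Average similarity between the new document and the cluster members.
            score = sum(combined_similarity(doc, m, weights) for m in members) / len(members)
            if score >= best_score:
                best, best_score = members, score
        if best is None:
            clusters.append([doc])
        else:
            best.append(doc)
    return clusters


# Illustrative data and hand-picked weights (assumptions, not values from the paper).
docs = [
    Doc("Inauguration crowd", frozenset({"inauguration", "dc"}), 0.0),
    Doc("Inauguration parade", frozenset({"inauguration", "dc", "parade"}), 3600.0),
    Doc("Comic convention", frozenset({"comics", "convention"}), 500000.0),
]
weights = {"title": 0.3, "tags": 0.5, "time": 0.2}
clusters = cluster(docs, weights)
print([len(c) for c in clusters])
```

In this toy run the two inauguration photos share most tags and were created an hour apart, so they land in one cluster, while the convention photo shares nothing with them and starts its own. Giving each dimension its own weight lets a reliable signal such as tags count for more than a noisy one.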
