Burst-aware data fusion for microblog search

We consider the problem of searching posts in microblog environments and frame it as a late data fusion problem. Previous work on data fusion has mainly aggregated document lists based on the retrieval status values or ranks of documents, without fully utilizing the temporal features of the documents being fused. It has also often assumed that only documents ranked highly in many of the lists are likely to be relevant. We propose BurstFuseX, a fusion model that uses not only a microblog post's ranking information but also its publication time. BurstFuseX builds on an existing fusion method and rewards posts published in or near a burst of posts that are highly ranked in many of the lists being aggregated. We experimentally verify the effectiveness of the proposed late data fusion algorithm and show that, in terms of mean average precision, it significantly outperforms the standard, state-of-the-art fusion approaches as well as burst-sensitive or time-sensitive retrieval methods.
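To make the idea of rewarding posts in or near a burst concrete, the following is a minimal Python sketch and not the BurstFuseX model itself. It rests on several assumptions that the abstract does not state: CombSUM over min-max-normalised scores stands in for the unnamed "existing fusion method", a burst is taken to be any time bin whose fused score mass exceeds the mean plus one standard deviation over occupied bins, and the reward is a fixed multiplicative bonus. The names `comb_sum`, `detect_burst_bins`, `burst_aware_fuse`, `bin_size` and `bonus` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def comb_sum(ranked_lists):
    """Assumed base fusion: sum of min-max-normalised scores per post.

    ranked_lists is a list of dicts mapping post id -> retrieval score.
    """
    fused = defaultdict(float)
    for scores in ranked_lists:
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero for constant lists
        for post_id, score in scores.items():
            fused[post_id] += (score - lo) / span
    return dict(fused)


def detect_burst_bins(fused, timestamps, bin_size):
    """Hypothetical burst rule: occupied time bins whose summed fused
    score exceeds mean + one population standard deviation."""
    mass = defaultdict(float)
    for post_id, score in fused.items():
        mass[timestamps[post_id] // bin_size] += score
    values = list(mass.values())
    threshold = mean(values) + pstdev(values)
    return {b for b, m in mass.items() if m > threshold}


def burst_aware_fuse(ranked_lists, timestamps, bin_size=3600, bonus=0.5):
    """Boost posts published in a burst bin or in a directly adjacent bin."""
    fused = comb_sum(ranked_lists)
    bursts = detect_burst_bins(fused, timestamps, bin_size)
    boosted = {}
    for post_id, score in fused.items():
        own_bin = timestamps[post_id] // bin_size
        near_burst = any(abs(own_bin - b) <= 1 for b in bursts)
        boosted[post_id] = score * (1 + bonus) if near_burst else score
    return sorted(boosted, key=boosted.get, reverse=True)


# Toy input: two result lists and publication times in seconds.
lists = [
    {"p1": 3.0, "p2": 2.0, "p3": 1.0, "p4": 0.5},
    {"p2": 3.0, "p3": 2.0, "p5": 1.0},
]
times = {"p1": 100, "p2": 36000, "p3": 36100, "p4": 72000, "p5": 108000}

baseline = sorted(comb_sum(lists), key=comb_sum(lists).get, reverse=True)
ranking = burst_aware_fuse(lists, times)
print(baseline[:3], ranking[:3])
```

In this toy input, `p2` and `p3` fall into the same hour and form the only detected burst, so `p3` moves ahead of `p1` even though plain CombSUM ranks `p1` higher. The threshold, the one-bin neighbourhood, and the bonus size are illustrative choices and would need tuning on real data.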
