IRIT at TREC Real-Time Summarization 2018

This paper presents the participation of the IRIT laboratory (University of Toulouse) in the Real-Time Summarization track of TREC 2018. The track consists of filtering the tweet stream in real time and identifying tweets that are both relevant and novel, to be pushed to the user as they arrive. Our team proposes three approaches: (1) The first approach consists of a filtering model that combines several summarization constraints. (2) The second approach, for scenario A, is composed of three filters adjusted sequentially, in which a word-similarity-based function evaluates the relevance of an incoming tweet. The generation of a batch of up to 100 ranked tweets is formulated as an optimization problem. (3) The third approach consists of a step-by-step stream selection method that focuses on speed and takes into account tweet similarity as well as several features, including content, entities, and user-related aspects. We describe the three proposed approaches and discuss the official results obtained for each of them.
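To make the idea of pushing only relevant and novel tweets concrete, the following is a minimal sketch of a sequential filter. It is not the authors' model: the tokenizer, the query-term coverage score, the Jaccard redundancy check, and both threshold values are assumptions chosen purely for illustration.

```python
def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard overlap between two term sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def filter_stream(tweets, query, rel_threshold=0.2, nov_threshold=0.6):
    """Return the tweets that pass a relevance filter, then a novelty filter.

    rel_threshold and nov_threshold are hypothetical values, not tuned ones.
    """
    query_terms = tokenize(query)
    pushed = []        # tweets selected so far, in arrival order
    pushed_terms = []  # their term sets, kept for the novelty check
    if not query_terms:
        return pushed
    for tweet in tweets:
        terms = tokenize(tweet)
        # Filter 1: relevance, measured as the share of query terms covered.
        relevance = len(terms & query_terms) / len(query_terms)
        if relevance < rel_threshold:
            continue
        # Filter 2: novelty, rejecting near-duplicates of pushed tweets.
        if any(jaccard(terms, prev) >= nov_threshold for prev in pushed_terms):
            continue
        pushed.append(tweet)
        pushed_terms.append(terms)
    return pushed


if __name__ == "__main__":
    stream = [
        "earthquake hits city center",
        "earthquake hits city center today",
        "football match tonight",
        "rescue teams respond after earthquake",
    ]
    for tweet in filter_stream(stream, "earthquake rescue"):
        print(tweet)
```

Each tweet is judged once, as it arrives, which mirrors the real-time setting: a decision depends only on the query and on what has already been pushed.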
