In the 2017 TREC (Text Retrieval Conference) Real-Time Summarization (RTS) track, we explored supervised methods for identifying tweets relevant to a user’s interest profile. We focused on two approaches: profile-specific and profile-independent. In the profile-specific approach, we trained one model per interest profile using features specific to that profile. In the profile-independent approach, we trained a single model using features that generalize across all profiles. To train the supervised models, we used labeled data from the previous year’s challenge. We also introduced a novel method for automatically labeling tweets with relevance scores. The method treats keywords from profile titles as essential information and penalizes a tweet’s relevance score when they are absent; it treats keywords from profile descriptions as supporting information and rewards the relevance score when they are present. In Scenario A (real-time push notifications), our best run improved on the median by 9.95% in EG-p and 11.11% in nDCG-p in batch evaluation. In Scenario B (daily digest), our best run improved on the median by 25.43% in nDCG-p.
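The automatic labeling idea can be illustrated with a minimal sketch. It is not the scoring function used in the runs: the function name `relevance_score`, the whitespace tokenization, and the `penalty` and `reward` weights are all assumptions chosen for illustration. The sketch only mirrors the stated principle, namely that a missing title keyword lowers the score and a present description keyword raises it.

```python
def relevance_score(tweet, title_keywords, description_keywords,
                    penalty=1.0, reward=0.5):
    """Hypothetical relevance score for a tweet against one interest profile.

    penalty and reward are illustrative weights, not values from the paper.
    """
    # Naive lowercase whitespace tokenization (an assumption; a real system
    # would likely normalize hashtags, punctuation, and word forms).
    tokens = set(tweet.lower().split())

    score = 0.0
    # Title keywords are essential: penalize each one that is absent.
    for keyword in title_keywords:
        if keyword.lower() not in tokens:
            score -= penalty
    # Description keywords are supporting: reward each one that is present.
    for keyword in description_keywords:
        if keyword.lower() in tokens:
            score += reward
    return score


if __name__ == "__main__":
    title = ["self", "driving", "cars"]
    description = ["california", "test"]
    on_topic = relevance_score(
        "Self driving cars tested in California", title, description)
    off_topic = relevance_score("nice weather today", title, description)
    print(on_topic, off_topic)
```

Under these assumed weights, a tweet that contains every title keyword is never penalized and can only gain from description matches, while a tweet missing all title keywords receives a negative score regardless of its other content.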