Background Linking: Joining Entity Linking with Learning to Rank Models

Recent years have been characterized by a strong democratization of news production on the web. In this scenario, it is rare to find self-contained news articles that provide useful background and context information. The problem of finding information that provides context to news articles has been tackled by the Background Linking task of the TREC News Track. In this paper, we propose a system to address the background linking task. Our system relies on the LambdaMART learning-to-rank algorithm, trained on classic textual features and on entity-based features. The idea is that the entities extracted from the documents, as well as the relationships among them, provide valuable context to the documents. We analyze how this idea can be used to improve the effectiveness of (re-)ranking methods for the background linking task.
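To make the idea of combining textual and entity-based features concrete, the following is a minimal sketch and not the system described in the paper. It assumes each document has already been reduced to a set of terms and a set of linked entity identifiers. The `Doc` class, the two Jaccard-overlap features, and the fixed `weights` are all hypothetical, chosen only for illustration; in a learning-to-rank setting, a trained LambdaMART model would take the place of the hand-set weighted sum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Hypothetical document view: pre-tokenized terms and linked entities."""
    doc_id: str
    terms: frozenset      # lower-cased tokens (stand-in for a real text index)
    entities: frozenset   # entity identifiers from some entity linker (assumed)


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def feature_vector(query, candidate):
    """One textual feature and one entity-based feature (illustrative only)."""
    return [
        jaccard(query.terms, candidate.terms),
        jaccard(query.entities, candidate.entities),
    ]


def rerank(query, candidates, weights=(0.5, 0.5)):
    """Re-rank candidates by a weighted sum of features.

    The fixed weights are an assumption standing in for a learned ranker.
    Ties are broken by doc_id so the output is deterministic.
    """
    scored = []
    for cand in candidates:
        feats = feature_vector(query, cand)
        score = sum(w * f for w, f in zip(weights, feats))
        scored.append((score, cand.doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]
```

In this sketch, giving more weight to the entity feature promotes a candidate that shares the query article's entities even when the two share no terms, which is the intuition behind using entities as a source of context.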
