From popularity prediction to ranking online news

News articles are an engaging type of online content that captures the attention of a significant number of Internet users. They are particularly enjoyed by mobile users and spread massively through online social platforms. As a result, there is increased interest in discovering the articles that will become popular among users. This objective falls under the broad scope of content popularity prediction and has direct implications for the development of new services for online advertisement and content distribution. In this paper, we address the problem of predicting the popularity of news articles based on user comments. We formulate the prediction task as a ranking problem, where the goal is not to infer the precise attention that a piece of content will receive but to accurately rank articles based on their predicted popularity. Using data obtained from two important news sites in France and the Netherlands, we analyze the ranking effectiveness of two prediction models. Our results indicate that popularity prediction methods are adequate solutions for this ranking task and could be considered a valuable alternative for automatic online news ranking.
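To make the ranking formulation concrete, the following is a minimal sketch of how predicted comment counts could be scored as a ranking rather than as point estimates. NDCG is used here only as one common ranking metric; the abstract does not state which metric the paper uses, and the function names and comment counts below are hypothetical illustrations, not data from the study.

```python
import math


def dcg(gains, k):
    """Discounted cumulative gain over the first k positions."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(predicted, actual, k):
    """Rank articles by predicted popularity and score the ranking
    against the ideal ordering given by the actual popularity."""
    # Article indices sorted by predicted popularity, most popular first.
    order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
    ranked_gains = [actual[i] for i in order]
    ideal_gains = sorted(actual, reverse=True)
    ideal = dcg(ideal_gains, k)
    return dcg(ranked_gains, k) / ideal if ideal > 0 else 0.0


# Hypothetical comment counts for four articles (illustrative only).
actual = [90, 10, 450, 60]

# Predictions that are numerically far off but preserve the true order.
predicted_good_order = [120.0, 15.0, 300.0, 40.0]

# Predictions that swap the two most popular articles.
predicted_bad_order = [300.0, 15.0, 120.0, 40.0]

score_good = ndcg_at_k(predicted_good_order, actual, k=3)
score_bad = ndcg_at_k(predicted_bad_order, actual, k=3)

print(score_good, score_bad)
```

In this sketch the first set of predictions misses every actual count yet still yields a perfect ranking score, while the second set is penalized for misordering the top articles. That is the sense in which the ranking task does not require inferring the precise attention each article receives.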
