On the feasibility of predicting popular news at cold start

Prominent news sites on the web provide hundreds of news articles daily. The abundance of news content competing for online attention, coupled with the manual effort involved in article selection, makes timely prediction of the future popularity of these articles necessary. The future popularity of a news article can be estimated using signals that indicate the article's penetration in social media (e.g., number of tweets) in addition to traditional web analytics (e.g., number of page views). In practice, it is important to make such estimates as early as possible, preferably before the article is made available on the news site (i.e., at cold start). In this paper, we study cold-start news popularity prediction using a collection of 13,319 news articles obtained from Yahoo News, a major news provider. We characterize the popularity of news articles through a set of online metrics and try to predict their values over time by applying machine learning techniques to a large collection of features obtained from various sources. Our findings indicate that predicting news popularity at cold start is a difficult task, contrary to the findings of an earlier study on the same topic. For most articles, popularity may not be accurately anticipated from content features alone, without early-stage popularity values.
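The contrast drawn above is between predicting from content features available before publication and predicting once early-stage popularity values are known. It can be sketched as two nested regression setups. This is a minimal illustration under stated assumptions, not the paper's method. It uses synthetic data rather than the Yahoo News collection. It uses two hypothetical content features (`topic`, `length`) and a hypothetical early-stage signal (`log_early`). It uses plain least squares in place of whatever learners the study used. By construction, unobserved factors dominate final popularity in this toy data, so the gap between the two scores reflects that assumption rather than any measured result.

```python
import random


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y.
    An intercept column is prepended; returns [intercept, w1, ..., wd]."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for i in range(k):
            if i != c:
                f = A[i][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] for i in range(k)]


def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) for x in X]


def r2(y, yhat):
    """Coefficient of determination on the given targets."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def make_articles(n, seed=0):
    """Hypothetical synthetic articles: final log-popularity depends weakly on
    content and strongly on an unobserved factor that early counts partly reveal."""
    rng = random.Random(seed)
    content, early, final = [], [], []
    for _ in range(n):
        topic = rng.random()       # hypothetical content feature
        length = rng.random()      # hypothetical content feature
        hidden = rng.gauss(0, 1)   # unobserved factors (assumption)
        log_final = 0.3 * topic + 0.1 * length + hidden
        log_early = log_final + rng.gauss(0, 0.3)  # noisy early-stage signal
        content.append([topic, length])
        early.append(log_early)
        final.append(log_final)
    return content, early, final


def compare(n_train=800, n_test=200, seed=0):
    """Held-out R^2 for a cold-start model vs. one that also sees early popularity."""
    content, early, y = make_articles(n_train + n_test, seed)
    setups = {
        "cold_start": content,
        "with_early_popularity": [c + [e] for c, e in zip(content, early)],
    }
    scores = {}
    for name, X in setups.items():
        w = fit_ols(X[:n_train], y[:n_train])
        scores[name] = r2(y[n_train:], predict(w, X[n_train:]))
    return scores


if __name__ == "__main__":
    print(compare())
```

Because the two feature sets are nested, the comparison isolates what the early-stage signal adds on top of content alone. Swapping in a different learner or real features would only change `fit_ols` and `make_articles`.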
