Predicting the success of news: Using an ML-based language model in predicting the performance of news articles before publishing

Traditional recommendation systems offer limited scope for optimising business value in editorial decision making in news production, because they select recommendations only from content whose production has already been decided editorially in the daily news process, or from existing content inventories. This paper explores the use of predictive analytics to optimise story assignment and editing in daily editorial work against selected business objectives before publishing. In this exploratory case study, we use the 'constructive approach' as a method for solving concrete business problems in a scientifically grounded way. We contribute by experimenting with a novel method that combines elements from several scientific domains, such as strategic management and system dynamics. We conclude that, with language analysis using recurrent neural networks, we were able to predict the success of a news story published on a digital channel in a way that fulfils the 'weak market test' criteria of the constructive approach. The company with which the model was developed considered it valuable enough to move it from exploration to further development and use in real news production.
