Modeling and predicting the popularity of online news based on temporal and content-related features

As the market for globally available online news is large and still growing, online publishers compete strongly to reach the largest possible audience. An intelligent online publishing strategy is therefore of the highest importance to publishers. A prerequisite for optimizing any online strategy is to have trustworthy predictions of how popular new online content may become. This paper presents a novel methodology to model and predict the popularity of online news. We first introduce a new strategy and mathematical model to capture view patterns of online news. After a thorough analysis of such view patterns, we show that well-chosen base functions lead to suitable models, and show how the influence of day versus night on the total view patterns can be taken into account to further increase accuracy without making the models more complex. Second, we turn to the prediction of future popularity, given recently published content. Using a new real-world dataset, we show that combining features related to content, meta-data, and temporal behavior leads to significantly improved predictions compared to existing approaches that only consider features based on the historical popularity of the considered articles. Whereas linear regression is traditionally used for this application, we show that the more expressive gradient tree boosting method proves beneficial for predicting news popularity.
