Big Data Analytics for Program Popularity Prediction in Broadcast TV Industries

Precise and timely prediction of program popularity is of great value to content providers, advertisers, and broadcast TV operators. It can inform operators' program purchasing decisions and help advertisers formulate reasonable advertising investment plans. On the technical side, a precise popularity prediction method can also help optimize the broadcasting system as a whole, for example its content delivery network and caching strategies. Several prediction models have been proposed based on video-on-demand (VOD) data from YouKu, YouTube, and Twitter. However, existing prediction methods usually require a large number of samples and long training times, and their accuracy is poor for programs whose popularity peaks sharply or drops abruptly. This paper presents an improved prediction approach based on trend detection. First, a $K$-medoids algorithm based on the dynamic time warping (DTW) distance groups programs' popularity evolution into four trends. Then, four trend-specific prediction models are built separately using random forest regression. Using features extracted from the electronic program guide (EPG) and early viewing records, a gradient boosting decision tree (GBDT) classifies newly published programs into the four trends. Finally, the approach combines the forecasts of the trend-specific models with the classification probabilities to produce the final prediction. Experimental results on a massive set of real VOD data from the Jiangsu Broadcasting Corporation show that, compared with existing prediction models, prediction accuracy increases by more than 20% and the forecasting period is effectively shortened.
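The clustering and combination steps described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The function names (`dtw_distance`, `k_medoids`, `combine_forecasts`), the absolute-difference DTW cost, the deterministic medoid initialization, and the probability-weighted sum used to merge forecasts are all assumptions made for illustration. The abstract does not specify the exact combination rule, and the random forest and GBDT models are omitted here.

```python
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance with an absolute-difference cost (assumed)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def k_medoids(series: List[Sequence[float]], k: int,
              max_iter: int = 50) -> Tuple[List[int], List[int]]:
    """Alternating K-medoids over a precomputed DTW distance matrix.

    Initialization simply takes the first k series as medoids (a simplification).
    Returns (medoid indices, cluster label per series).
    """
    n = len(series)
    dist = [[dtw_distance(s, t) for t in series] for s in series]
    medoids = list(range(k))

    def assign(current: List[int]) -> List[int]:
        # Each series joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][current[c]]) for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties out
                new_medoids.append(medoids[c])
                continue
            # The new medoid minimizes total DTW distance to its cluster members.
            new_medoids.append(min(members, key=lambda i: sum(dist[i][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


def combine_forecasts(trend_probs: Sequence[float],
                      trend_forecasts: Sequence[float]) -> float:
    """Probability-weighted sum of trend-specific forecasts (assumed rule)."""
    return sum(p * f for p, f in zip(trend_probs, trend_forecasts))


# Toy popularity curves: two rising and two falling programs (made-up data).
curves = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [1, 2, 4, 5, 6], [6, 5, 3, 2, 1]]
medoids, labels = k_medoids(curves, k=2)
print(labels)
print(combine_forecasts([0.5, 0.5], [10.0, 20.0]))
```

In a fuller pipeline, the cluster labels would serve as training targets for the trend classifier, and the per-trend probabilities it outputs would be passed to `combine_forecasts` together with the four trend-specific regression outputs. The toy example uses two clusters only to keep the data small.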
