Prediction of Video Popularity in the Absence of Reliable Data from Video Hosting Services: Utility of Traces Left by Users on the Web

With the growth of user-generated content, a steadily rising number of companies, such as search engines and content aggregators, operate on tremendous amounts of web content without being the services that host it. To locate the most important content and promote it to users, these companies need to estimate the current popularity of content and predict its future popularity. In this paper, we approach the problem of video popularity prediction not from the side of a video hosting service, as done in all previous studies, but from the side of an operating company that provides a popular video search service aggregating content from different video hosting websites. We investigate video popularity prediction based on features from three primary sources available to a typical operating company: first, the content hosting provider may deliver its data via its API; second, the operating company can use its own search and browsing logs; third, the company crawls information about embeds of a video and links to a video page from publicly available resources on the Web. We show that prediction based on the embed and link data, coupled with the internal search and browsing data, significantly improves on prediction based only on the data provided by the video hosting service, and can even adequately replace the API data when it is partly or completely unavailable.
