Chinese trending search terms popularity rank prediction

Baidu, the most popular Chinese search engine, monitors what its users are currently searching for and publishes the top 50 search terms, called trending search terms, in descending order of popularity. This paper focuses on predicting the popularity ranking trends of these top trending search terms in Baidu. Our data analysis identified two issues that can affect the accuracy of using the ranking data to predict the popularity of trending search terms. First, trending terms drop out of the top 50 list as their popularity declines, yet several of them reappear in the list after dropping out. To differentiate new, distinct search terms from reappearances of old terms, we propose a term distinction model that uses the related news articles Baidu provides for each trending search term. Second, missing values must be handled for the periods when a term is off the trending term list. To this end, we collected the top 50 trending search terms from Baidu, together with their related news articles, hourly for 6 months (from 1 March 2013 to 31 August 2013). Based on the proposed model, we found that the optimal disappearing interval is 9 h and that substituting rank 51 for the missing values was the most successful approach. We conducted evaluations on 3 months of data (from 1 September 2013 to 30 November 2013), comparing four machine learning techniques to determine which predicts the popularity rank of trending search terms most accurately. The feed-forward neural network achieved the highest prediction accuracy, 78.81 %, and reached 85.55 % accuracy within a ±3 error range.
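
The two preprocessing ideas and the error-range metric described above can be illustrated with a minimal sketch. It is not the paper's implementation: the actual term distinction model relies on related news articles, whereas this sketch separates reappearances using only the length of the gap. It also assumes that a gap of at most 9 missing hours counts as a temporary disappearance, and that "±3 error range" means the predicted rank is within 3 positions of the actual rank. The function names `split_episodes` and `tolerance_accuracy` are hypothetical.

```python
MISSING_RANK = 51            # rank substituted while a term is off the top-50 list
DISAPPEARING_INTERVAL_H = 9  # assumed: longest gap (in hours) still treated as the same term


def split_episodes(observations, max_gap_h=DISAPPEARING_INTERVAL_H):
    """Split one term's hourly ranks into distinct episodes.

    observations maps an integer hour index to the observed rank (1-50).
    Gaps of up to max_gap_h missing hours are filled with MISSING_RANK;
    longer gaps start a new episode (treated here as a new distinct term).
    """
    episodes = []
    previous_hour = None
    for hour in sorted(observations):
        if previous_hour is None:
            episodes.append([observations[hour]])
        else:
            missing_hours = hour - previous_hour - 1
            if missing_hours <= max_gap_h:
                # Short disappearance: pad the gap, then continue the episode.
                episodes[-1].extend([MISSING_RANK] * missing_hours)
                episodes[-1].append(observations[hour])
            else:
                # Long disappearance: start a new episode.
                episodes.append([observations[hour]])
        previous_hour = hour
    return episodes


def tolerance_accuracy(actual, predicted, tolerance=0):
    """Fraction of predictions whose rank error is at most `tolerance`."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(a - p) <= tolerance for a, p in zip(actual, predicted))
    return hits / len(actual)


if __name__ == "__main__":
    ranks_by_hour = {0: 5, 1: 7, 4: 20, 30: 3, 31: 2}
    print(split_episodes(ranks_by_hour))
    print(tolerance_accuracy([1, 5, 10, 20], [1, 7, 14, 20], tolerance=3))
```

In this sketch the two-hour gap between hours 1 and 4 is padded with rank 51, while the 25-hour gap before hour 30 starts a second episode. Calling `tolerance_accuracy` with `tolerance=0` gives exact-match accuracy, and `tolerance=3` corresponds to the relaxed ±3 measure under the assumption stated above.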
