A novel trend surveillance system using the information from web search engines

Web search engines have become a major platform through which the general public accesses information. It has been suggested that, because the search patterns of search engine users are correlated with emerging events, the query logs of search engines have potential for trend surveillance, such as monitoring outbreaks of epidemics. Many trend surveillance studies have investigated the use of query logs and have strived to identify query terms suitable for trend surveillance. Most of these works select representative query terms by consulting domain experts or by preparing a large text corpus for feature selection. These approaches, however, are too costly to make trend surveillance methods adaptable to different topics. In this paper, we propose an adaptive trend surveillance method. We developed a simple and effective feature selection algorithm, called TF-LTR, which leverages the documents returned by search engines and the frequency of terms in those documents to select representative query terms for trending topics. Specifically, we investigated pair-wise learning to rank models to measure a term's discriminative power, that is, its ability to make a document rank higher in the returned document list. The discriminative power is combined with the term frequency, which indicates how on-topic a term is, to measure how representative a term is of a trending topic. Representative terms and their query frequencies are then used as input to a state-of-the-art data mining model to enhance the effectiveness of trend surveillance. Experimental results on trending topics from different domains show that our trend surveillance method performs well and that the ranking information of search engines is helpful for trend surveillance. The proposed method can therefore provide effective support to government officials and authorities, helping them respond to fast-changing events and topics and make appropriate decisions.
Propose an adaptive trend surveillance framework and an effective feature selection algorithm, TF-LTR.
Investigate pair-wise learning to rank models to measure a term's discriminative power.
Support government officials and authorities in constructing effective and efficient trend surveillance systems.
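The abstract names the two signals TF-LTR combines (a term's frequency in the returned documents and its discriminative power under a pair-wise learning to rank model) but does not give the exact formula. The Python sketch below is only an illustration of that idea under loudly stated assumptions, not the authors' algorithm: documents are represented by binary term-presence features, the pair-wise model is a linear RankSVM-style ranker trained on the pairwise hinge loss by plain subgradient descent (one possible pair-wise model, swapped in for illustration), a term's learned weight stands in for its discriminative power, and the two signals are combined by a simple product. The function names `term_features`, `train_pairwise_ranker`, and `tf_ltr_scores`, the hyperparameters, and the toy search results are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_features(doc, vocab):
    """Binary presence vector for each vocabulary term (assumed feature design)."""
    present = set(doc)
    return [1.0 if term in present else 0.0 for term in vocab]


def train_pairwise_ranker(docs, vocab, epochs=200, lr=0.1, reg=0.01):
    """Linear RankSVM-style ranker trained by subgradient descent (assumed model).

    `docs` are in search-engine rank order (index 0 = top result), so for
    every pair (i, j) with i < j we want w . (x_i - x_j) >= 1 (pairwise hinge).
    Returns a dict mapping each term to its learned weight.
    """
    feats = [term_features(d, vocab) for d in docs]
    pairs = list(combinations(range(len(docs)), 2))
    w = [0.0] * len(vocab)
    for _ in range(epochs):
        grad = [reg * wk for wk in w]  # gradient of the L2 penalty (reg/2)*||w||^2
        for i, j in pairs:
            diff = [a - b for a, b in zip(feats[i], feats[j])]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            if margin < 1.0:  # hinge active: push the higher-ranked document up
                for k, dk in enumerate(diff):
                    grad[k] -= dk / len(pairs)  # hinge loss is averaged over pairs
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return dict(zip(vocab, w))


def tf_ltr_scores(docs):
    """Hypothetical TF-LTR-style score: relative term frequency x pairwise weight."""
    counts = Counter(term for doc in docs for term in doc)
    total = sum(counts.values())
    if total == 0:
        return {}
    vocab = sorted(counts)
    weights = train_pairwise_ranker(docs, vocab)
    # Assumed combination: on-topic degree (relative frequency) times discriminative power.
    return {t: (counts[t] / total) * weights[t] for t in vocab}


if __name__ == "__main__":
    # Made-up, tokenized results for a topic query, top-ranked document first.
    results = [
        ["flu", "fever", "vaccine", "flu"],
        ["flu", "fever", "clinic"],
        ["flu", "weather"],
        ["weather", "sports"],
    ]
    scores = tf_ltr_scores(results)
    for term in sorted(scores, key=scores.get, reverse=True)[:3]:
        print(term, round(scores[term], 4))
```

In this sketch, terms that are frequent in the returned documents and concentrated in the higher-ranked ones receive large positive scores, while terms that appear only in lower-ranked results receive negative weights and therefore negative scores. The product is just one simple way to combine the two signals; a weighted sum or a rank-based fusion would be equally plausible readings of the abstract.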
