Crowd-squared: amplifying the predictive power of search trend data

Big data generated by crowds provides a myriad of opportunities for monitoring and modeling people's intentions, preferences, and opinions. A crucial step in analyzing such big data is selecting the relevant part of the data that should be provided as input to the modeling process. In this paper, we offer a novel, structured, crowd-based method to address the data selection problem in a widely used and challenging context: selecting search trend data. We label the method "crowd-squared," as it leverages crowds to identify the most relevant terms in search volume data that were generated by a larger crowd. We empirically test this method in two domains and find that our method yields predictions that are equivalent or superior to those obtained in previous studies (using alternative data selection methods) and to predictions obtained using various benchmark data selection methods. These results emphasize the importance of a structured data selection method in the prediction process, and demonstrate the utility of the crowd-squared approach for addressing this problem in the context of prediction using search trend data.
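The term-selection step described above can be illustrated with a minimal sketch. It is not the authors' procedure: it simply assumes that each crowd member lists terms they associate with a topic, and that the terms named by the most respondents are kept as candidate search queries. The function name `select_terms`, the cutoff `k`, and the sample responses are all hypothetical.

```python
from collections import Counter


def select_terms(responses, k=3):
    """Return the k terms named by the most respondents (illustrative only)."""
    counts = Counter()
    for response in responses:
        # Count each term once per respondent, after light normalization.
        counts.update({term.strip().lower() for term in response if term.strip()})
    # Sort by descending count, then alphabetically for a deterministic order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Made-up crowd responses; not data from the paper.
responses = [
    ["fever", "cough", "Flu shot"],
    ["cough", "fever", "tissues"],
    ["fever", "flu shot", "soup"],
]
print(select_terms(responses, k=2))
```

In a full pipeline, the search volume series for the selected terms would then serve as inputs to the prediction model; that step is omitted here because the abstract does not specify it.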
