Ringtail: Feature Selection For Easier Nowcasting

In recent years, social media “nowcasting” (the use of online user activity to predict ongoing real-world social phenomena) has become a popular research topic, yet this popularity has not led to widespread practice. We believe a major obstacle to adoption is the feature selection problem. Typical nowcasting systems require the user to choose a set of relevant social media objects, which is difficult and time-consuming and can demand statistical expertise that users may not have. We propose Ringtail, which helps the user choose relevant social media signals. It takes a single user input string (e.g., unemployment) and yields a number of relevant signals the user can use to build a nowcasting model. We evaluate Ringtail on six different topics using a corpus of almost 6 billion tweets, showing that features chosen by Ringtail in a wholly automated way are as good as or better than those chosen by a human, and substantially better if Ringtail receives some human assistance. In all cases, Ringtail reduces the burden on the user.
