Text-Driven Forecasting

Forecasting the future hinges on understanding the present. The web, particularly the social web, now gives us an up-to-the-minute snapshot of the world as it is and as many people perceive it, but that snapshot is distributed in a way that is incomprehensible to a human. Much of this data is encoded in text, which is noisy, unstructured, and sparse; yet recent developments in natural language processing now permit us to analyze text and connect it to real-world measurable phenomena through statistical models. We propose text-driven forecasting as a challenge for natural language processing and machine learning:
