Efficient Statement Identification for Automatic Market Forecasting

Strategic business decision making involves the analysis of market forecasts. Today, relevant market statements are identified and aggregated by human experts, often by analyzing documents from the World Wide Web. We present an efficient information extraction chain that automates this complex natural language processing task, and we report results for its statement identification step. Building on the extraction of time and money expressions, we use support vector classification to identify sentences that represent statements on revenue. We also provide a corpus of German online news articles in which domain experts from industry have annotated more than 2,000 such sentences. On the test data, our statement identification algorithm achieves an overall precision of 0.86 and an overall recall of 0.87.
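
The abstract names the building blocks of the chain (time and money extraction feeding a support vector classifier, evaluated by precision and recall) but not their implementation. The Python sketch below shows one way such a chain could fit together; it is not the authors' method. The regular expressions, the candidate filter (a sentence must contain both a time and a money expression), the bag-of-words features, the Pegasos-style linear SVM trainer, the helper names (`is_candidate`, `features`, `train_linear_svm`, `classify`, `precision_recall`) and the toy sentences are all hypothetical and introduced only for illustration.

```python
import re

# Hypothetical, deliberately simple patterns for German time and money
# expressions. They stand in for a real extraction step and miss many forms.
TIME_RE = re.compile(r"\b(?:19|20)\d{2}\b")
MONEY_RE = re.compile(
    r"\b\d+(?:[.,]\d+)?\s*(?:Mio\.?|Millionen|Mrd\.?|Milliarden)?\s*"
    r"(?:Euro|EUR|€|US-Dollar|Dollar)",
    re.IGNORECASE,
)


def is_candidate(sentence):
    """Assumed pre-filter: keep sentences with both a time and a money expression."""
    return bool(TIME_RE.search(sentence) and MONEY_RE.search(sentence))


def features(sentence):
    """Sparse bag of words in which extracted spans become MONEY/TIME tokens."""
    text = TIME_RE.sub(" TIME ", MONEY_RE.sub(" MONEY ", sentence))
    vec = {"__bias__": 1.0}  # constant feature plays the role of the SVM bias
    for token in re.findall(r"\w+", text.lower()):
        vec[token] = vec.get(token, 0.0) + 1.0
    return vec


def score(w, x):
    """Dot product of a sparse weight vector and a sparse feature vector."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())


def train_linear_svm(samples, labels, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Labels are +1/-1. Examples are visited in a fixed cyclic order, a
    deterministic simplification of Pegasos' random sampling.
    """
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * score(w, x)
            for f in w:  # shrink step contributed by the regularizer
                w[f] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss active: move towards y * x
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w


def classify(w, sentence):
    """Return +1 for a predicted revenue statement, otherwise -1."""
    if not is_candidate(sentence):
        return -1
    return 1 if score(w, features(sentence)) > 0 else -1


def precision_recall(gold, predicted):
    """Precision and recall of the positive class (+1)."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == 1 and p == 1 for g, p in pairs)
    fp = sum(g == -1 and p == 1 for g, p in pairs)
    fn = sum(g == 1 and p == -1 for g, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented toy sentences (not taken from the corpus); +1 = revenue statement.
TRAIN = [
    ("Der Umsatz soll 2012 auf 5 Mrd. Euro steigen.", 1),
    ("Für 2010 erwartet das Unternehmen einen Umsatz von 300 Millionen Euro.", 1),
    ("Im Jahr 2009 investierte die Firma 20 Mio. Euro in ein neues Werk.", -1),
    ("Die Aktie kostete 2008 zeitweise 15 Euro.", -1),
    ("Das Bußgeld von 2 Millionen Euro wurde 2011 verhängt.", -1),
]
weights = train_linear_svm([features(s) for s, _ in TRAIN], [y for _, y in TRAIN])

gold = [y for _, y in TRAIN]
predicted = [classify(weights, s) for s, _ in TRAIN]
print(precision_recall(gold, predicted))  # → (1.0, 1.0)
```

On these invented sentences the learned weights separate both classes, and `classify(weights, "Bis 2014 soll der Umsatz auf 800 Mio. Euro wachsen.")` returns 1, while a sentence without a money expression is rejected by the pre-filter before the classifier is consulted. The linear SVM is written out by hand only to keep the sketch dependency-free; a real system would more plausibly rely on an established SVM library, a proper time and money extractor, and evaluation on held-out data rather than on the training sentences.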
