Paper information - Efficient Statement Identification for Automatic Market Forecasting

Strategic business decision-making involves the analysis of market forecasts. Today, human experts identify and aggregate relevant market statements, often by analyzing documents from the World Wide Web. We present an efficient information extraction chain to automate this complex natural language processing task and show results for the identification part. Based on time and money extraction, we identify sentences that represent statements on revenue using support vector classification. We provide a corpus of German online news articles in which more than 2,000 such sentences are annotated by domain experts from industry. On the test data, our statement identification algorithm achieves an overall precision of 0.86 and a recall of 0.87.
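To make the identification step more concrete, the following is a minimal sketch of how sentences containing both a time and a money expression could be selected as candidates, and how precision and recall are computed from predicted and annotated labels. The regular expressions, the function names `is_candidate` and `precision_recall`, and the example sentences are assumptions made for illustration; they are not the extractors or the data used in the paper, and the support vector classification step itself is not reproduced here.

```python
import re

# Hypothetical, deliberately crude patterns for German text (illustration only).
MONEY = re.compile(
    r"\b\d+(?:[.,]\d+)*\s*(?:Millionen|Milliarden|Mio\.|Mrd\.)?\s*(?:Euro|EUR|Dollar|USD|€|\$)",
    re.IGNORECASE,
)
TIME = re.compile(r"\b(?:19|20)\d{2}\b|\b(?:Quartal|Jahr|Halbjahr)\b", re.IGNORECASE)


def is_candidate(sentence: str) -> bool:
    """Return True if the sentence contains both a money and a time expression."""
    return bool(MONEY.search(sentence)) and bool(TIME.search(sentence))


def precision_recall(predicted, gold):
    """Compute precision and recall for two parallel lists of boolean labels."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)      # predicted and annotated
    fp = sum(1 for p, g in pairs if p and not g)  # predicted but not annotated
    fn = sum(1 for p, g in pairs if not p and g)  # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    sentences = [
        "Der Umsatz soll 2012 auf 3,5 Milliarden Euro steigen.",
        "Das Unternehmen wurde in Paderborn gegründet.",
        "Der Umsatz lag bei 40 Millionen Euro.",
    ]
    for s in sentences:
        print(is_candidate(s), s)
```

In a full pipeline, the candidates selected this way would be passed to a trained classifier that decides whether each one is a statement on revenue; the filter above only narrows the set of sentences to classify.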

Benno Stein | Henning Wachsmuth | Peter Prettenhofer
