Document Clustering for Event Identification and Trend Analysis in Market News

In this paper, we propose a stock market analysis system that analyzes financial news items to identify and characterize major events that impact the market. Events are identified using a topic extraction mechanism based on Latent Dirichlet Allocation (LDA). The resulting topic-document data is then clustered using the kernel k-means algorithm. The clusters are analyzed jointly with raw SENSEX data to extract major events and their effects. The system has been implemented on capital market news about the Indian share market covering the past three years.
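The clustering step can be sketched as follows. This is a minimal illustration, not the authors' implementation. It makes several assumptions. Each document is represented by its LDA topic-proportion vector. The similarity is a Gaussian (RBF) kernel with a hypothetical `gamma` parameter, since the kernel used in the paper is not stated here. Clusters are seeded with deterministic farthest-point initialization, which is our own choice. The example uses pure Python so that it runs without dependencies.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two document-topic vectors (assumed kernel)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_kmeans(docs: List[List[float]], k: int, gamma: float = 1.0,
                  max_iter: int = 100) -> List[int]:
    """Assign each document-topic vector to one of k clusters via kernel k-means."""
    n = len(docs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of documents")
    # Precompute the n x n kernel (Gram) matrix.
    K = [[rbf_kernel(docs[i], docs[j], gamma) for j in range(n)] for i in range(n)]

    def feat_dist(i: int, j: int) -> float:
        # Squared distance between docs i and j in the kernel's feature space.
        return K[i][i] + K[j][j] - 2.0 * K[i][j]

    # Deterministic farthest-point seeding (an assumption, not from the paper).
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(n) if i not in seeds]
        seeds.append(max(candidates, key=lambda i: min(feat_dist(i, s) for s in seeds)))
    labels = [min(range(k), key=lambda c: feat_dist(i, seeds[c])) for i in range(n)]

    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # The (1/|C|^2) sum_{j,l in C} K_jl term depends only on the cluster.
        within = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                  for m in members]
        new_labels = []
        for i in range(n):
            # ||phi(x_i) - mu_C||^2 = K_ii - (2/|C|) sum_j K_ij + (1/|C|^2) sum_jl K_jl
            dists = [
                K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]
                if m else math.inf  # empty clusters are never chosen
                for c, m in enumerate(members)
            ]
            new_labels.append(min(range(k), key=dists.__getitem__))
        if new_labels == labels:  # converged: no document changed cluster
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Toy topic proportions over 3 LDA topics; made-up values for illustration.
    theta = [
        [0.90, 0.05, 0.05],
        [0.85, 0.10, 0.05],
        [0.05, 0.90, 0.05],
        [0.10, 0.85, 0.05],
    ]
    print(kernel_kmeans(theta, k=2))  # → [0, 0, 1, 1]
```

Kernel k-means converges only to a local optimum, so the seeding matters. On the toy data above, a naive round-robin start (labels `[0, 1, 0, 1]`) stalls immediately, leaving each cluster with one document from each topic. The sketch also omits the next step, where the clusters are aligned with SENSEX movements on the documents' publication dates.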
