Data Mining the Financial Time Series with K-Nearest Neighbours Predictive Model

In the knowledge era, most organisations have come to realise that their most important Critical Success Factor (CSF) is the knowledge they have acquired. The key question is how to acquire, process, and manage that knowledge. This has driven the development of disciplines such as Artificial Intelligence, Knowledge Management, and Data Mining. Data mining aims to discover knowledge for the organisation from its data. Forecasting is a fundamental problem for any organisation, and there has long been interest in predicting stock prices and identifying the future trend of the stock market. A plethora of traditional and modern tools has been used to test whether stock prices are predictable with an acceptable level of accuracy. This paper mines time series data to forecast stock prices and index values. The primary objective of the study is to build a model using the K-Nearest Neighbour (KNN) algorithm to predict stock and index values. The secondary aim is to validate the model by comparing its predictions with those of a traditional prediction tool, the regression technique. The sample consists of the closing stock prices of two actively traded companies (Infosys Technologies and Associated Cement Company) and the values of two major stock market indices (BSE SENSEX and NSE Nifty) for the period from 1 January 2010 to 12 April 2010. The KNN model is applied to this sample to predict future values. The data set is divided into a training data set and a test data set. The model learns from the training data set and makes predictions for the test data set. The difference between the actual values of the time series and the values predicted for the test data set is used to evaluate the performance of the predictive model.
The model is then validated by comparing its output with the predictions of the traditional regression model for the same test data. The results are encouraging: the KNN predictive model outperforms the regression model in all four cases. Nevertheless, the approach needs to be studied in more depth.
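
The workflow described above (train/test split, KNN prediction, comparison with a regression baseline on the same test data) can be sketched in Python. This is a minimal illustration and not the study's implementation: the lag-window features, the window length `n_lags`, the value of `k`, the Euclidean distance, the 80/20 split, RMSE as the error measure, and the synthetic random-walk series standing in for closing prices are all assumptions, since none of them is specified here.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Build (X, y): each row of X holds n_lags consecutive values, y is the value that follows."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

def knn_predict(X_train, y_train, X_test, k):
    """Predict each test target as the mean target of the k nearest training windows."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance (assumed)
        nearest = np.argsort(dists)[:k]
        preds.append(y_train[nearest].mean())
    return np.array(preds)

def linreg_predict(X_train, y_train, X_test):
    """Ordinary least squares on the same lagged features, used as the regression baseline."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return A_test @ coef

def rmse(actual, predicted):
    """Root mean squared difference between actual and predicted values."""
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))

# Synthetic random walk standing in for a closing-price series (assumption, not real data).
rng = np.random.default_rng(0)
prices = 100 + np.cumsum(rng.normal(0, 1, 70))

X, y = make_lagged(prices, n_lags=3)   # n_lags=3 is an arbitrary illustrative choice
split = int(0.8 * len(X))              # chronological 80/20 split (assumed ratio)
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

knn_rmse = rmse(y_test, knn_predict(X_train, y_train, X_test, k=3))
reg_rmse = rmse(y_test, linreg_predict(X_train, y_train, X_test))
print(f"KNN RMSE: {knn_rmse:.3f}  Regression RMSE: {reg_rmse:.3f}")
```

The split is chronological rather than random so that the model is always evaluated on values that come after its training data. Which model scores better on this synthetic series says nothing about the results reported for the four real series.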