PERFORMANCE COMPARISON OF TIME SERIES DATA USING PREDICTIVE DATA MINING TECHNIQUES

This paper focuses on the methodology used in applying Time Series Data Mining techniques to financial time series data for calculating currency exchange rates of US dollars to Indian Rupees. Four models, namely Multiple Regression in Excel, Multiple Linear Regression of Dedicated Time Series Analysis in Weka, Vector Autoregressive Model in R and Neural Network Model using NeuralWorks Predict, are analyzed. All the models are compared on the basis of the forecasting errors they generate. Mean Error (ME), Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Square Error (RMSE), Mean Percentage Error (MPE) and Mean Absolute Percentage Error (MAPE) are used as forecast accuracy measures. Results show that all the models accurately predict the exchange rates, but Multiple Linear Regression of Dedicated Time Series Analysis in Weka outperforms the other three models.

Keywords: Exchange Rate Prediction, Time Series Models, Regression, Predictive Data Mining, Weka, VAR, Neural Network

Advances in Information Mining ISSN: 0975-3265 & E-ISSN: 0975-9093, Volume 4, Issue 1, 2012

Introduction

One of the most enticing application areas of data mining is finance, which is becoming more amenable to data-driven modeling as large sets of financial data become available. In finance, data mining is used extensively for forecasting stock markets, pricing corporate bonds, understanding and managing financial risk, trading futures, predicting exchange rates, credit rating, etc. Monthly data for the 10 years from 2000 to 2010 are collected to predict the exchange rates of 2011 [5,14,16]. The actual rates for 2011 are available and are compared with the predicted values to calculate the accuracy of the models.
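The comparison of actual and predicted rates can be sketched as follows, using the six error measures named in the abstract. This is a minimal illustration and not the authors' implementation: the function name `forecast_errors` is hypothetical, the sample values are made up rather than real exchange rates, and the sketch assumes that the error is defined as actual minus predicted and that percentage errors are taken relative to the actual value.

```python
import math


def forecast_errors(actual, predicted):
    """Compute ME, MAE, MSE, RMSE, MPE and MAPE for two paired series.

    Assumptions (not stated in the text): error = actual - predicted,
    and percentage errors are relative to the actual value, so every
    actual value must be non-zero.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")

    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)

    mse = sum(e * e for e in errors) / n
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MPE": 100.0 * sum(e / a for e, a in zip(errors, actual)) / n,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Hypothetical values for illustration only (not real exchange rates).
actual_rates = [50.0, 40.0, 25.0, 20.0]
predicted_rates = [49.0, 42.0, 25.0, 19.0]

measures = forecast_errors(actual_rates, predicted_rates)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

ME and MPE keep the sign of each error, so over- and under-predictions can cancel out; MAE, MSE, RMSE and MAPE do not cancel and therefore reflect the overall size of the errors.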
Mean Error (ME), Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Square Error (RMSE), Mean Percentage Error (MPE) and Mean Absolute Percentage Error (MAPE) are used as forecast accuracy measures. The variables on which the exchange rate depends are CPI, Trade Balance (in million US dollars), GDP, Unemployment and Monetary Base (in billion dollars) [16]. Four models, namely Multiple Linear Regression in Excel [14], Multiple Linear Regression of Dedicated Time Series Analysis in Weka [6, 9], Vector Autoregressive Model in R [6-8, 10] and Neural Network Model [4,5,13-15] using NeuralWorks Predict, are analyzed. All the models are compared on the basis of the forecasting errors they generate.

The paper is organized as follows. Section II covers predictive data mining. Section III covers the four predictive time series models, namely Multiple Linear Regression in Excel, Multiple Linear Regression of Dedicated Time Series Analysis in Weka, Vector Autoregressive Model in R and Neural Network Model using NeuralWorks Predict. Section IV covers the datasets used for the analysis, the steps followed and the results obtained with the four models. Section V shows the performance comparison of the four models using Mean Error (ME), Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Square Error (RMSE), Mean Percentage Error (MPE) and Mean Absolute Percentage Error (MAPE). Section VI concludes the work, followed by references in the last section.

Predictive Data Mining

Predictive data mining analyzes data in order to construct one or more models and attempts to predict the behavior of new data sets. Prediction is a form of data analysis that can be used to extract models describing important data classes or to predict future data trends. Such analysis can help provide us with a better understanding of the data at large.
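The idea of constructing a model from existing data and then predicting new data can be sketched with an ordinary least-squares line. This is only an illustrative simplification: it uses a single predictor, whereas the models in this paper use several, and the function names `fit_line` and `predict` and the synthetic data are assumptions introduced here.

```python
def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x_new):
    """Apply a fitted (intercept, slope) model to new inputs."""
    intercept, slope = model
    return [intercept + slope * xi for xi in x_new]


# Synthetic training data generated from y = 1 + 2x (illustration only).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

# Step 1: construct the model from the existing data.
model = fit_line(train_x, train_y)

# Step 2: predict the behavior of new, unseen inputs.
new_x = [5.0, 6.0]
forecast = predict(model, new_x)
print(model, forecast)
```

Fitting and prediction are kept as separate steps so that the fitted model can be evaluated on data it has not seen, which mirrors the use of 2000 to 2010 data to predict 2011.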
Citation: Saigal S. and Mehrotra D. (2012) Performance Comparison of Time Series Data Using Predictive Data Mining Techniques. Advances in Information Mining, ISSN: 0975-3265 & E-ISSN: 0975-9093, Volume 4, Issue 1, pp. 57-66.

Copyright: Copyright © 2012 Saigal S. and Mehrotra D. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Prediction can also be viewed as a mapping or function, y = f(X), where X is the input (e.g., a tuple describing a loan applicant) and the output y is a continuous or ordered value (such as the predicted amount that the bank can safely loan the applicant). That is, we wish to learn a mapping or function that models the relationship between X and y. There are two issues regarding prediction. The first is preparing the data for prediction, which involves preprocessing steps such as data cleaning, relevance analysis, data transformation and data reduction. The second is comparing the different prediction models. The models are compared according to the criteria given below: