A DT-SVM Strategy for Stock Futures Prediction with Big Data

This paper presents a stock futures prediction strategy that uses a hybrid method to forecast futures price trends, which are essential for investment decisions. To deal with huge amounts of futures data, the strategy consists of two main parts: I. Raw Data Treatment and Features Extraction, and II. DT-SVM Hybrid Model Training. We use real-world transaction data of stock futures contracts for our study. The data are first stored in a distributed database and then distributed to a group of computing nodes, which extract statistical features. Finally, a hybrid method combining DT (Decision Tree) and SVM (Support Vector Machine) algorithms is applied. In the first phase, the DT algorithm filters out most of the noisy data. In the second phase, the SVM algorithm processes the large training set. Because a prediction model is trained for each stock futures contract, high-performance algorithms are necessary, so the distributed algorithms are implemented in the form of MapReduce. The experimental results show that our strategy outperforms three popular methods: Bootstrap-SVM, Bootstrap-DT and BPNN. Specifically, compared with the best average precision, best average recall and best average F1 score achieved among those three methods, our DT-SVM strategy improves them by 5%, 19% and 12%, respectively.
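The two-phase idea (a decision tree filters noise, then an SVM is trained on what remains) can be illustrated with a small single-machine sketch. It rests on several assumptions that the abstract does not confirm. Labels are binary up/down trends encoded as -1/+1. "Filtering noisy data" is interpreted as dropping the training samples that a shallow Gini-based tree misclassifies. The SVM is a linear one trained with Pegasos-style stochastic subgradient steps, a technique swapped in for illustration. All function names (`build_tree`, `train_linear_svm`, `dt_svm_fit`, and so on) are hypothetical. The distributed MapReduce feature extraction is omitted entirely.

```python
import random


def gini(labels):
    """Gini impurity for labels in {-1, +1}."""
    if not labels:
        return 0.0
    p = sum(1 for v in labels if v == 1) / len(labels)
    return 2.0 * p * (1.0 - p)


def majority(labels):
    # Ties (and empty lists) resolve to +1.
    return 1 if sum(labels) >= 0 else -1


def build_tree(X, y, max_depth=3, min_size=5, depth=0):
    """Grow a small CART-style tree. Leaves are ints (-1/+1); internal
    nodes are tuples (feature, threshold, left, right). min_size stops the
    tree from isolating single (possibly noisy) points."""
    if depth >= max_depth or len(y) < min_size or len(set(y)) == 1:
        return majority(y)
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:  # no threshold separates the rows (duplicate points)
        return majority(y)
    _, j, t, left, right = best
    children = [build_tree([X[i] for i in idx], [y[i] for i in idx],
                           max_depth, min_size, depth + 1)
                for idx in (left, right)]
    return (j, t, children[0], children[1])


def predict_tree(node, x):
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent.
    A constant 1.0 feature is appended so the last weight acts as a bias."""
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            xi = list(X[i]) + [1.0]
            margin = y[i] * sum(a * b for a, b in zip(w, xi))
            w = [(1.0 - eta * lam) * a for a in w]  # L2 shrinkage step
            if margin < 1.0:  # hinge loss active: step toward this sample
                w = [a + eta * y[i] * b for a, b in zip(w, xi)]
    return w


def svm_predict(w, x):
    score = sum(a * b for a, b in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def dt_svm_fit(X, y, max_depth=3, min_size=5):
    """Phase 1: drop samples the tree misclassifies (assumed noise filter).
    Phase 2: train the SVM on the rest. Returns (weights, n_removed)."""
    tree = build_tree(X, y, max_depth, min_size)
    keep = [i for i in range(len(X)) if predict_tree(tree, X[i]) == y[i]]
    w = train_linear_svm([X[i] for i in keep], [y[i] for i in keep])
    return w, len(X) - len(keep)


def precision_recall_f1(y_true, y_pred, positive=1):
    tp = sum(1 for a, b in zip(y_true, y_pred) if a == positive and b == positive)
    fp = sum(1 for a, b in zip(y_true, y_pred) if a != positive and b == positive)
    fn = sum(1 for a, b in zip(y_true, y_pred) if a == positive and b != positive)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For example, with `X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0], [-1.8]]` and `y = [-1, -1, -1, 1, 1, 1, 1]`, the last label is deliberately flipped. The tree splits at -1.0, misclassifies that point, and `dt_svm_fit` reports one removed sample before the SVM is trained on the six clean ones. `precision_recall_f1` computes the three evaluation measures named in the abstract for the "up" class. In a real pipeline, the per-contract `dt_svm_fit` calls are what a MapReduce job would distribute.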
