Machine Learning Models for Stock Prediction Using Real-Time Streaming Data

In recent years, stock prediction has attracted considerable attention from researchers in the financial sector. Apart from static log data, streaming data has also proven to be a continuous source for analysis. It is collected in real time as a continuous flow of information from sources such as websites, mobile phone applications, server logs, social websites, and trading floors. A model built from historical data can be refined continually to give more accurate results, since each of its predictions can be compared with the next tick of the clock. In this study, an attempt is made to develop machine learning models that predict the potential future prices of a company's stock to support financial decisions. Spark Streaming is used to process the large volumes of data, and data ingestion tools such as Node.js are used to feed data into the analysis. Earlier research has explored the same concept, but the goal of the present study is to develop a model that is scalable, fault-tolerant, and has lower latency. The model rests on a distributed computing architecture called the Lambda Architecture, which helps in attaining these goals. The historical stock values are used as labeled datasets for supervised training of the models. The analysis finds that the prediction of stock values is more accurate when support vector regression is applied.
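The idea of comparing each prediction with the next tick and then refining the model can be illustrated with a minimal predict-then-update loop. The sketch below is only an illustration under stated assumptions: it uses a toy one-weight model trained by gradient descent rather than the support vector regression or Spark Streaming pipeline of the study, and the names `OnlineTickModel` and `run_stream` are hypothetical.

```python
from typing import Iterable, List


class OnlineTickModel:
    """Toy one-weight model (an assumption, not the study's SVR):
    next_change is approximated as weight * last_change."""

    def __init__(self, learning_rate: float = 0.5) -> None:
        self.weight = 0.0
        self.learning_rate = learning_rate

    def predict(self, last_price: float, last_change: float) -> float:
        # Predicted next price = last price + predicted change.
        return last_price + self.weight * last_change

    def update(self, last_change: float, actual_change: float) -> None:
        # One gradient-descent step on the squared error of the change.
        error = self.weight * last_change - actual_change
        self.weight -= self.learning_rate * error * last_change


def run_stream(ticks: Iterable[float], model: OnlineTickModel) -> List[float]:
    """Predict each tick before seeing it, record the absolute error,
    then update the model with the observed value."""
    errors: List[float] = []
    history: List[float] = []
    for price in ticks:
        if len(history) >= 2:
            last_price = history[-1]
            last_change = history[-1] - history[-2]
            predicted = model.predict(last_price, last_change)
            errors.append(abs(predicted - price))
            model.update(last_change, price - last_price)
        history.append(price)
    return errors


if __name__ == "__main__":
    # Hypothetical, steadily rising price series used only for illustration.
    print(run_stream([100.0, 101.0, 102.0, 103.0, 104.0], OnlineTickModel()))
```

On a steadily rising series such as the one above, the recorded error shrinks with each tick, because every new observation is used to correct the model before the next prediction is made.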
