StockProF: a stock profiling framework using data mining approaches

Analysing stock financial data and drawing insight from them is not easy for many stock investors, particularly individual investors. As a result, building a good stock portfolio from a pool of stocks often requires Herculean effort. This paper proposes a stock profiling framework, StockProF, for building stock portfolios rapidly. StockProF uses two data mining approaches: (1) Local Outlier Factor (LOF) and (2) Expectation Maximization (EM). LOF first detects outliers, that is, stocks whose financial performance is markedly superior or poor. After the outliers are removed, EM clusters the remaining stocks. Investors can then profile the resulting clusters using the mean and the five-number summary. This study used the financial data of the plantation stocks listed on Bursa Malaysia. The authors used 1-year stock price movements to evaluate the performance of the outliers as well as the clusters. The results showed that StockProF is effective, in that the profiles corresponded to the average capital gain or loss of the plantation stocks.
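The three steps described above (LOF outlier detection, EM clustering of the remaining stocks, and profiling by mean and five-number summary) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the authors' implementation: the stock names, the single financial ratio (a hypothetical return on equity in percent), the neighbourhood size k = 3, the LOF threshold of 2.0 and the choice of two clusters are all assumptions made for the example, and the EM step is a plain one-dimensional Gaussian mixture.

```python
import math
import statistics

def lof_scores(values, k=3):
    """Local Outlier Factor for 1-D values (assumes no duplicate values)."""
    n = len(values)
    k_dist, neighbours = [], []
    for i in range(n):
        dists = sorted((abs(values[i] - values[j]), j) for j in range(n) if j != i)
        kd = dists[k - 1][0]  # distance to the k-th nearest neighbour
        k_dist.append(kd)
        neighbours.append([j for d, j in dists if d <= kd])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], abs(values[i] - values[j])) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]

def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_clusters(values, iters=50):
    """EM for a two-component 1-D Gaussian mixture; returns a label per value."""
    mus = [min(values), max(values)]          # assumed initialisation
    variances = [statistics.pvariance(values)] * 2
    weights = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            total = sum(p)
            resp.append([pi / total for pi in p])
        # M-step: re-estimate weights, means and variances.
        for c in range(2):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / len(values)
            mus[c] = sum(r[c] * x for r, x in zip(resp, values)) / nc
            var = sum(r[c] * (x - mus[c]) ** 2 for r, x in zip(resp, values)) / nc
            variances[c] = max(var, 1e-6)     # floor to avoid a zero variance
    return [0 if r[0] >= r[1] else 1 for r in resp]

def profile(values):
    """Mean and five-number summary (min, Q1, median, Q3, max)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"mean": statistics.mean(values),
            "five_number": (min(values), q1, median, q3, max(values))}

# Hypothetical return-on-equity figures (%) for twelve made-up stocks.
roe = {"S01": 4.0, "S02": 4.5, "S03": 5.0, "S04": 5.5, "S05": 6.0,
       "S06": 11.0, "S07": 11.5, "S08": 12.0, "S09": 12.5, "S10": 13.0,
       "S11": 30.0, "S12": -15.0}
names = list(roe)
values = [roe[name] for name in names]

# Step 1: flag stocks whose LOF exceeds the assumed threshold of 2.0.
scores = lof_scores(values, k=3)
outliers = sorted(name for name, s in zip(names, scores) if s > 2.0)

# Step 2: cluster the remaining stocks with EM.
kept = [v for v, s in zip(values, scores) if s <= 2.0]
labels = em_two_clusters(kept)

# Step 3: profile each cluster.
profiles = {c: profile([v for v, lab in zip(kept, labels) if lab == c])
            for c in (0, 1)}
print(outliers)
print(profiles)
```

In practice a stock would be described by several financial ratios rather than one, and a library implementation of LOF and of Gaussian mixture EM would be a more robust choice than this hand-written version; the sketch is meant only to make the order of the steps concrete.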
