Big data and portfolio optimization: A novel approach integrating DEA with multiple data sources

Abstract The existing literature suggests that the out-of-sample performance of traditional mean-variance portfolio strategies is not robust and can even be inferior to that of the equal weight strategy. To address this problem, this paper first clarifies that a complete investment process consists of two parts: stock selection and the formulation of investment weights. We then design a stock selection scheme that integrates Data Envelopment Analysis (DEA) with multiple data sources (historical stock trading data, technical indicators, social media data and news data) to assess the investment value of stocks in terms of historical return, asset correlation and investor sentiment. In addition, we use a Support Vector Machine (SVM) trained on the multi-source stock data to predict stock price movements, and we combine these predictions with the proposed stock selection scheme to construct the portfolio optimization model. We carry out an out-of-sample test of the proposed stock selection scheme and investment strategies, using the constituents of the CSI 300 index as test samples. The empirical results show that the proposed stock selection scheme effectively improves the out-of-sample performance of all investment strategies considered. Moreover, the proposed investment strategy achieves better out-of-sample performance than the traditional global minimum variance strategy, the tangency portfolio strategy, and the equal weight strategy. Finally, we test the robustness of these findings on an additional dataset.
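To make two of the benchmark strategies named in the abstract concrete, the following is a minimal sketch of the global minimum variance (GMV) weights, w = Σ⁻¹1 / (1ᵀΣ⁻¹1), next to the equal weight (1/N) rule. It is not the paper's implementation: the function names, the toy covariance matrix, and the choice to allow short sales (no non-negativity constraint) are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's matrix is not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gmv_weights(cov: Matrix) -> List[float]:
    """Unconstrained GMV weights: Sigma^{-1} 1, rescaled to sum to one."""
    z = solve_linear(cov, [1.0] * len(cov))
    total = sum(z)
    return [v / total for v in z]


def equal_weights(n: int) -> List[float]:
    """The 1/N rule: every selected stock gets the same weight."""
    return [1.0 / n] * n


def portfolio_variance(w: List[float], cov: Matrix) -> float:
    """Quadratic form w' Sigma w."""
    n = len(w)
    return sum(w[i] * cov[i][j] * w[j] for i in range(n) for j in range(n))


# Hypothetical two-asset covariance matrix, for illustration only.
toy_cov = [[0.04, 0.01],
           [0.01, 0.09]]
w_gmv = gmv_weights(toy_cov)
w_ew = equal_weights(len(toy_cov))
print("GMV weights:", w_gmv, "variance:", portfolio_variance(w_gmv, toy_cov))
print("1/N weights:", w_ew, "variance:", portfolio_variance(w_ew, toy_cov))
```

With the true covariance matrix, the GMV portfolio can never have higher variance than the 1/N portfolio. In practice the covariance matrix has to be estimated from historical data, and that estimation error is one reason the out-of-sample comparison described in the abstract can come out differently.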
