Fund Asset Inference Using Machine Learning Methods: What’s in That Portfolio?

Given only the historical net asset value of a large-cap mutual fund, which members of some universe of stocks are held by the fund? Discovering an exact solution is combinatorially intractable because there are, for example, C(500, 30), or about 1.4 × 10^48, possible portfolios of 30 stocks drawn from the S&P 500. The authors extend an existing linear clones approach and introduce a new sequential oscillating selection method to produce a computationally efficient inference. Such techniques could inform efforts to detect fund window dressing of disclosure statements or to adjust market positions in advance of major fund disclosure dates. The authors test the approach by tasking the algorithm with inferring the constituents of exchange-traded funds, whose components can later be examined. Depending on the details of the specific problem, the algorithm runs on consumer hardware in 8 to 15 seconds and identifies target portfolio constituents with an accuracy of 88.2% to 98.6%.

TOPICS: Big data/machine learning, statistical methods, portfolio management/multi-asset allocation
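The portfolio count quoted in the abstract can be checked directly. The following is a minimal sketch, not part of the authors' method; the function name `count_portfolios` is a hypothetical helper introduced only for illustration, and it assumes that portfolios are unordered sets of distinct stocks.

```python
import math


def count_portfolios(universe_size: int, portfolio_size: int) -> int:
    """Number of distinct unordered portfolios of a given size.

    Hypothetical helper for illustration: this is the binomial
    coefficient C(universe_size, portfolio_size).
    """
    return math.comb(universe_size, portfolio_size)


if __name__ == "__main__":
    n = count_portfolios(500, 30)
    # Print in scientific notation to compare with the figure in the abstract.
    print(f"C(500, 30) is approximately {n:.2e}")
```

Even at a billion candidate portfolios evaluated per second, enumerating a space of this size is out of reach, which is why a heuristic search is needed rather than exhaustive evaluation.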
