Constructing Equity Portfolios from SEC 13F Data Using Feature Extraction and Machine Learning

Securities and Exchange Commission (SEC) Form 13F gives the public a quarterly glimpse of the holdings of top asset managers. Here the authors propose two novel quantitative approaches for constructing portfolio-generating models based on these data, as well as guidelines for feature extraction. The first approach uses features extracted from 13F data to predict stock price movements. Stocks that are predicted to have positive returns are then used to assemble a portfolio. The second approach uses extracted features to identify top funds and then assembles their holdings into an aggregate portfolio. Both techniques outperformed the S&P 500 in historical backtesting, earning annualized returns of 15.0% and 19.8%, respectively, compared with 9.5% for the S&P 500. Overall, these results extend the statistical approaches available for creating quantitative models with SEC 13F data and offer a previously unexplored machine learning approach that outperforms the benchmark.

TOPICS: Exchanges/markets/clearinghouses, statistical methods, simulations, big data/machine learning

Key Findings

• Proper feature extraction from Securities and Exchange Commission 13F data allows for the construction of machine learning models, such as XGBoost and logistic regression, that outperform an S&P 500 benchmark in historical backtesting for equity portfolio generation.

• Traditional algorithmic strategies also outperform the S&P 500 benchmark by a wide margin. The examined strategies create an aggregate portfolio based on the holdings of top 13F-reporting asset managers, as determined by extracted features such as cash inflow and historical returns.

• Features may be extracted from 13F data that have statistically significant predictive power not only for the future returns of funds but also for the direction of future price movements of specific stocks.
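
As an illustration of the second approach (identify top funds, then pool their holdings into one aggregate portfolio), the following is a minimal Python sketch rather than the authors' implementation. The fund names, position values, and scores are made-up placeholders. The single `fund_scores` number stands in for whatever ranking the extracted features (such as cash inflow and historical returns) would produce. Equal weighting across the selected funds is an assumption, since the text above does not state a weighting scheme.

```python
from collections import defaultdict


def top_funds(fund_scores, k):
    """Return the k fund ids with the highest score (ties broken by id)."""
    return sorted(fund_scores, key=lambda f: (-fund_scores[f], f))[:k]


def aggregate_portfolio(holdings, funds):
    """Pool the holdings of the selected funds into one set of ticker weights.

    Assumption: each selected fund contributes equally, and within a fund
    each position is weighted by its share of the fund's reported value.
    """
    weights = defaultdict(float)
    for fund in funds:
        positions = holdings[fund]
        total = sum(positions.values())
        if total <= 0:
            continue  # skip funds with no reported value
        for ticker, value in positions.items():
            weights[ticker] += value / total
    # Normalize so the aggregate weights sum to 1.
    grand_total = sum(weights.values())
    if grand_total == 0:
        return {}
    return {t: w / grand_total for t, w in weights.items()}


# Hypothetical reported position values per fund (not real 13F data).
holdings = {
    "fund_a": {"AAPL": 60.0, "MSFT": 40.0},
    "fund_b": {"AAPL": 50.0, "XOM": 50.0},
    "fund_c": {"GE": 100.0},
}
# Hypothetical feature-based scores; higher means "better" fund.
fund_scores = {"fund_a": 0.9, "fund_b": 0.7, "fund_c": 0.1}

selected = top_funds(fund_scores, k=2)
portfolio = aggregate_portfolio(holdings, selected)
for ticker in sorted(portfolio):
    print(ticker, round(portfolio[ticker], 4))
```

In a backtest, the same two steps would be repeated each quarter using only the filings available at that date, so that the ranking never sees future information.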