Exploiting social media with higher-order Factorization Machines: statistical arbitrage on high-frequency data of the S&P 500

Over the past 15 years, a number of studies have used text mining to predict stock market data. Two recent publications employed support vector machines and second-order Factorization Machines, respectively, to this end. However, these approaches either neglect interactions between the features extracted from the text entirely or account only for second-order interactions. In this paper, we apply higher-order Factorization Machines, for which efficient training algorithms have only been available since 2016. As Factorization Machines require hyperparameters to be specified, we also introduce a novel adaptive-order algorithm for determining them automatically. Our study is the first to use social media data for predicting minute-by-minute stock returns, namely those of the S&P 500 constituents. We show that, unlike a trading strategy employing support vector machines, Factorization-Machine-based strategies attain positive returns after transaction costs for the years 2014 and 2015. In particular, the approach applying the adaptive-order algorithm outperforms classical approaches with respect to a multitude of criteria, and it features very favorable characteristics.
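
To illustrate the difference between a purely linear model, second-order interactions, and higher-order interactions, the following minimal sketch computes the prediction of a Factorization Machine by brute-force enumeration of feature combinations. It is not the efficient training algorithm referred to above, nor the adaptive-order algorithm of this paper; the function name `hofm_predict` and the parameter layout are hypothetical and chosen only for this example.

```python
from itertools import combinations
from math import prod


def hofm_predict(x, w0, w, factors):
    """Brute-force prediction of a (higher-order) Factorization Machine.

    x       : list of feature values
    w0      : global bias
    w       : list of linear weights, one per feature
    factors : dict mapping an interaction order m (>= 2) to a list of
              latent vectors, one per feature (hypothetical layout)
    """
    # Linear part: without any entry in `factors`, the model has no interactions.
    y = w0 + sum(wi * xi for wi, xi in zip(w, x))
    n = len(x)
    for order, V in factors.items():
        k = len(V[0])  # number of latent dimensions for this order
        for idx in combinations(range(n), order):
            # Interaction weight: sum over latent dimensions of the
            # product of the participating features' factors.
            weight = sum(prod(V[i][f] for i in idx) for f in range(k))
            y += weight * prod(x[i] for i in idx)
    return y


x = [1.0, 2.0, 3.0]
w0, w = 0.5, [0.1, 0.2, 0.3]
ones = [[1.0], [1.0], [1.0]]  # one latent dimension, all factors equal to 1

linear_only = hofm_predict(x, w0, w, {})
second_order = hofm_predict(x, w0, w, {2: ones})
third_order = hofm_predict(x, w0, w, {2: ones, 3: ones})
```

Passing an empty `factors` dict yields a linear model, a single entry for order 2 yields a second-order Factorization Machine, and adding an entry for order 3 adds three-way interactions. Which orders to include and how many latent dimensions to use per order are the kind of hyperparameters the abstract refers to.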
