Statistical Arbitrage in High Frequency Trading Based on Limit Order Book Dynamics

Classic asset pricing theory assumes that prices eventually adjust to reflect fair value, but it does not specify the route or the speed of that transition. Market microstructure studies how prices adjust to reflect new information. In recent years, rapid advances in information technology have made high frequency data widely available. With such data, it is interesting to study the roles played by informed traders and noise traders and how prices adjust to reflect the flow of information. It is also interesting to study whether returns are more predictable in the high frequency setting and whether one could exploit limit order book dynamics in trading.

Broadly speaking, the traditional approach to statistical arbitrage uses statistical methods to bet on the temporal convergence and divergence of price movements of pairs and baskets of assets. A more academic definition of statistical arbitrage is to spread risk across thousands to millions of trades with very short holding times, hoping to profit in expectation through the law of large numbers. Along this line, a model-based approach has recently been proposed by Rama Cont and coauthors [1], based on a simple birth-death Markov chain model. Once the model is calibrated to order book data, various types of odds can be computed. For example, if a trader can estimate the probability of a mid-price uptick conditional on the current order book state, and the odds are in his or her favor, the trader can submit an order to capitalize on them. When the trade is carefully executed with a judicious stop-loss, the trader should be able to make a profit in expectation.

In this project, we adopted a data-driven approach. We first built a “simulated” exchange order matching engine, which allows us to reconstruct the order book.
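To make the idea of an order matching engine concrete, the following is a minimal sketch of price-time priority matching in Python. It is not the engine built in this project: the class name OrderBook, the method names, the use of integer tick prices, and the omission of cancellations and market orders are all assumptions made for illustration.

```python
import heapq
from itertools import count


class OrderBook:
    """Toy limit order book with price-time priority (illustrative only).

    Assumptions: prices are integer ticks, only limit orders exist,
    and there are no cancellations or modifications.
    """

    def __init__(self):
        self._bids = []      # heap of (-price, seq, [remaining_qty])
        self._asks = []      # heap of (price, seq, [remaining_qty])
        self._seq = count()  # arrival counter, used as the time-priority tiebreak

    def submit(self, side, price, qty):
        """Match a limit order against the opposite side, then rest the remainder.

        Returns the list of (trade_price, trade_qty) fills the order generated.
        """
        if side == "buy":
            own, opp, sign = self._bids, self._asks, 1
        else:
            own, opp, sign = self._asks, self._bids, -1
        fills = []
        while qty > 0 and opp:
            key, _, resting = opp[0]
            resting_price = key * sign  # asks store price, bids store -price
            if sign * (price - resting_price) < 0:
                break  # best opposite quote does not cross the limit price
            traded = min(qty, resting[0])
            fills.append((resting_price, traded))  # trade at the resting price
            qty -= traded
            resting[0] -= traded
            if resting[0] == 0:
                heapq.heappop(opp)  # resting order fully filled
        if qty > 0:
            # Unfilled remainder joins the book behind earlier arrivals.
            heapq.heappush(own, (-sign * price, next(self._seq), [qty]))
        return fills

    def best_bid(self):
        return -self._bids[0][0] if self._bids else None

    def best_ask(self):
        return self._asks[0][0] if self._asks else None


book = OrderBook()
book.submit("sell", 101, 5)
book.submit("sell", 100, 3)
book.submit("buy", 99, 4)
fills = book.submit("buy", 101, 6)  # fills 3 @ 100, then 3 @ 101
```

Replaying a stream of order messages through a structure of this kind is one way to reconstruct the best bid, best ask and queue sizes at each point in time. A realistic engine would also need to handle cancellations, modifications and exchange-specific order types.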
In theory, we have therefore built an exchange system that allows us not only to back-test our trading strategies but also to evaluate the price impact of trading. We then implemented, calibrated and tested the Rama Cont model on both simulated and real data, and did the same for an extended model. Based on these models and on the order book dynamics, we explored a few high frequency trading strategies. In Section 2, we discuss the “simulated” exchange order matching engine. In Section 3, we present a few statistical observations and stylized facts about the data. In Section 4, we review the Rama Cont model of order book dynamics and extend it. By