An improved algorithm for cleaning Ultra High-Frequency data

We develop a multiple-stage algorithm for detecting outliers in Ultra High-Frequency financial market data. We show that an efficient data filter needs to address four effects: the minimum tick size, the price level, the volatility of prices and the distribution of returns. We argue that previous studies tend to address only the distribution of returns and may therefore ‘overscrub’ a data set. In this study, we address these issues in the market microstructure element of the algorithm. In the statistical element, we implement the robust median absolute deviation method to take into account the statistical properties of financial time series. The data filter is then tested against previous data-cleaning techniques and validated using a rich individual equity options transactions data set from the London International Financial Futures and Options Exchange. The paper offers insights for practitioners who use high-frequency derivatives data, for example for market analysis or for developing trading strategies.
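
As an illustration of the statistical element, the following is a minimal sketch of a median absolute deviation (MAD) outlier flag. It is not the filter developed in the paper. The function name `mad_outlier_flags`, the threshold of 3, and the use of the tick size as a floor on the dispersion estimate are assumptions made for this example; the constant 1.4826 is the usual factor that makes the MAD comparable to a standard deviation under normality.

```python
from statistics import median


def mad_outlier_flags(prices, tick_size, threshold=3.0, scale=1.4826):
    """Return one boolean per price: True if the price is flagged as an outlier.

    Hypothetical helper for illustration only; all parameters are assumptions.
    """
    if not prices:
        return []

    # Robust location estimate: the median is barely moved by a few bad ticks.
    center = median(prices)
    abs_dev = [abs(p - center) for p in prices]

    # Robust dispersion estimate: scaled median absolute deviation.
    mad = scale * median(abs_dev)

    # Assumed safeguard: floor the dispersion at one tick, so a run of
    # identical prices (MAD of zero) does not flag ordinary one-tick moves.
    spread = max(mad, tick_size)

    return [d > threshold * spread for d in abs_dev]


if __name__ == "__main__":
    sample = [10.0, 10.5, 10.0, 10.5, 10.0, 25.0, 10.5]
    print(mad_outlier_flags(sample, tick_size=0.5))
```

The tick-size floor is one simple way to show why the minimum tick size matters: without it, a quiet window with a zero MAD would mark any price change as an outlier, which is the kind of overscrubbing the abstract warns about.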
