Time series modeling of histogram-valued data: The daily histogram time series of S&P500 intradaily returns

Histogram time series (HTS) and interval time series (ITS) are examples of symbolic data sets. Methodological developments exist for the cross-sectional setting, but they remain scarce for time series. Arroyo, Gonzalez-Rivera, and Mate (2011) analyze various forecasting methods for HTS and ITS, adapting smoothing filters and nonparametric algorithms such as k-NN. Though these methods are very flexible, they may not correspond to the true underlying data generating process (DGP). We present the first step in the search for a DGP by focusing on the autocorrelation functions (ACFs) of HTS and ITS. We analyze the ACF of the daily histogram of 5-minute intradaily returns to the S&P500 index in 2007 and 2008. There are clusters of high/low activity that generate a strong, positive, and persistent autocorrelation, pointing towards some autoregressive process for HTS. Though smoothing and k-NN may not be the true DGPs, we find that they are very good approximations because they capture almost all of the original autocorrelation. However, some structure seems to remain in the data, and capturing it will require new modelling techniques. As a byproduct, we also analyze the [90%, 100%] quantile interval. By using all of the information contained in the histogram, we find that there are advantages in the estimation and prediction of a specific interval.
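
The idea of tracking one quantile interval of each daily histogram and checking its persistence can be illustrated with a minimal sketch. It is not the ACF for HTS or ITS developed in the paper: it applies the classical sample autocorrelation to the lower bound of each day's [90%, 100%] quantile interval, which is a simplification. The data are synthetic, and the function names, the 78 five-minute returns per day, and the volatility parameters are all assumptions made for illustration.

```python
import numpy as np


def daily_quantile_interval(returns, lower=0.90, upper=1.00):
    """Bounds of the [lower, upper] quantile interval of one day's returns."""
    lo, hi = np.quantile(returns, [lower, upper])
    return lo, hi


def sample_acf(x, max_lag):
    """Classical sample autocorrelation of a scalar series for lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])


# Synthetic stand-in for intradaily returns (assumption, not S&P500 data):
# a persistent AR(1) log-volatility produces clusters of high/low activity.
rng = np.random.default_rng(0)
n_days, n_intraday = 500, 78
log_vol = np.zeros(n_days)
for t in range(1, n_days):
    log_vol[t] = 0.95 * log_vol[t - 1] + 0.2 * rng.standard_normal()
returns = 0.001 * np.exp(log_vol)[:, None] * rng.standard_normal((n_days, n_intraday))

# One interval per day, then the ACF of the interval's lower bound.
intervals = np.array([daily_quantile_interval(day) for day in returns])
acf_lower = sample_acf(intervals[:, 0], max_lag=5)
print(acf_lower)
```

With volatility clustering in the simulated data, the autocorrelations of the interval bound should come out positive and slowly decaying, which is the qualitative pattern the abstract describes for the real series.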

[1] Javier Arroyo, et al. Forecasting with Interval and Histogram Data. Some Financial Applications, 2011.

[2] Kin Keung Lai, et al. Interval Time Series Analysis with an Application to the Sterling-Dollar Exchange Rate, 2008, J. Syst. Sci. Complex.

[3] P. Bertrand, et al. Descriptive Statistics for Symbolic Data, 2000.

[4] Hans-Hermann Bock, et al. Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data, 2000.

[5] Antonio Irpino, et al. Comparing Histogram Data Using a Mahalanobis–Wasserstein Distance, 2008.

[6] Aman Ullah, et al. Handbook of empirical economics and finance, 2010.

[7] Edwin Diday, et al. Symbolic Data Analysis: Conceptual Statistics and Data Mining (Wiley Series in Computational Statistics), 2007.

[8] E. Ziegel, et al. Proceedings in Computational Statistics, 1998.

[9] J. Arroyo, et al. Forecasting histogram time series with k-nearest neighbours methods, 2009.

[10] Francisco de A. T. de Carvalho, et al. Forecasting models for interval-valued time series, 2008, Neurocomputing.

[11] Cecilio Angulo, et al. Sobre núcleos, distancias y similitudes entre intervalos, 2007, Inteligencia Artif.

[12] Antonio Irpino, et al. A New Wasserstein Based Distance for the Hierarchical Clustering of Histogram Symbolic Data, 2006, Data Science and Classification.

[13] E. Ziegel. COMPSTAT: Proceedings in Computational Statistics, 1988.

[14] William Gould, et al. Rangefinder Box Plots: A Note, 1987.

[15] Hans-Hermann Bock, et al. Analysis of Symbolic Data, 2000.