Forecasting with Interval and Histogram Data. Some Financial Applications

Data sets across many disciplines are becoming increasingly large, and they bring with them the need for new methods of processing information. We introduce the analysis of interval-valued and histogram-valued data sets as an alternative to classical single-valued data sets, and we show the promise of this approach for economic and financial data. Since our current focus is the prediction problem, we explore two different avenues for producing forecasts with interval time series (ITS) and histogram time series (HTS). For ITS, we adapt classical regression methods and time series strategies for model building and prediction. For both ITS and HTS, we implement filtering techniques, such as smoothing, and nonparametric methods, such as k-NN. Computing the averages required by smoothing and k-NN calls for interval arithmetic in ITS and for the concept of a barycentric histogram in HTS. Assessing the forecast error likewise requires dissimilarity measures: a kernel-based distance for ITS, and the Wasserstein and Mallows distances for HTS. We apply the proposed methods to predict the daily interval-valued dispersion of the level of the SP500 index and the weekly cross-sectional histogram of the returns to the constituents of the SP500 index. Overall, k-NN methods perform very well.

Key Words: Interval-valued data, histogram-valued data, interval arithmetic, dissimilarity measures, exponential smoothing, k-NN, Wasserstein distance, Mallows distance.

JEL Classification: C22, C53
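As a rough illustration of two ingredients named in the abstract, the sketch below applies simple exponential smoothing to an interval time series using interval arithmetic, and approximates the Mallows (L2 Wasserstein) distance between two histograms through their quantile functions. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the smoothing initialisation (first forecast equal to the first observation), the uniform-within-bin assumption, and the midpoint-rule grid size are all illustrative choices.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)

def interval_add(a: Interval, b: Interval) -> Interval:
    """Interval addition: [a_L + b_L, a_U + b_U]."""
    return (a[0] + b[0], a[1] + b[1])

def interval_scale(c: float, a: Interval) -> Interval:
    """Multiply an interval by a scalar, keeping lower <= upper."""
    lo, hi = c * a[0], c * a[1]
    return (min(lo, hi), max(lo, hi))

def interval_exp_smoothing(series: Sequence[Interval], alpha: float) -> Interval:
    """One-step-ahead forecast after the last observation.

    Recursion (assumed form): f_{t+1} = alpha * x_t + (1 - alpha) * f_t,
    started with f_2 = x_1.
    """
    forecast = series[0]
    for x in series[1:]:
        forecast = interval_add(interval_scale(alpha, x),
                                interval_scale(1.0 - alpha, forecast))
    return forecast

def hist_quantile(edges: Sequence[float], weights: Sequence[float], t: float) -> float:
    """Quantile function of a histogram, assuming mass is uniform inside each bin."""
    cum = 0.0
    for i, w in enumerate(weights):
        if w > 0 and t <= cum + w:
            return edges[i] + (t - cum) / w * (edges[i + 1] - edges[i])
        cum += w
    return edges[-1]

def mallows_distance(h1, h2, n_grid: int = 1000) -> float:
    """Approximate sqrt( integral_0^1 (F^{-1}(t) - G^{-1}(t))^2 dt ) by a midpoint rule.

    Each histogram is a pair (edges, weights) with weights summing to 1.
    """
    total = 0.0
    for k in range(n_grid):
        t = (k + 0.5) / n_grid
        diff = hist_quantile(*h1, t) - hist_quantile(*h2, t)
        total += diff * diff
    return math.sqrt(total / n_grid)

# Hypothetical toy data: daily [low, high] intervals and two two-bin histograms.
its = [(1.0, 2.0), (3.0, 4.0)]
its_forecast = interval_exp_smoothing(its, alpha=0.5)

hist_a = ([0.0, 1.0, 2.0], [0.5, 0.5])
hist_b = ([3.0, 4.0, 5.0], [0.5, 0.5])  # hist_a shifted right by 3
dist_ab = mallows_distance(hist_a, hist_b)
```

In this toy setting, shifting a histogram by a constant moves every quantile by the same amount, so the Mallows distance between `hist_a` and `hist_b` equals the size of the shift. The same distance could serve as the dissimilarity measure when selecting neighbours in a k-NN forecast or when scoring forecast errors.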
