Association measures for interval variables

Symbolic Data Analysis (SDA) is a relatively new field of statistics that extends conventional data analysis by taking into account the intrinsic variability and structure of the data. Unlike in conventional data analysis, the features characterizing the data in SDA can be multi-valued, such as intervals or histograms. SDA has mainly been approached from a sampling perspective. In this work, we take a populational perspective and propose a model that links the micro-data and macro-data of interval-valued symbolic variables. Using this model, we derive the micro-data assumptions underlying the various definitions of symbolic covariance matrices proposed in the literature, and show that these assumptions can be too restrictive, which raises concerns about their applicability. We analyze the various definitions using worked examples and four datasets. Our results show that the presence or absence of correlations in the macro-data may not be correctly captured by the definitions of symbolic covariance matrices and that, in real data, these definitions can diverge strongly. Thus, selecting the most appropriate definition requires some knowledge of the micro-data structure.
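The relation between micro-data and macro-data can be illustrated with a small simulation. The sketch below is not the model proposed in this work, nor any of the covariance definitions from the literature: it assumes, purely for illustration, that each interval is the [min, max] of the micro-data in a group, and it uses the plain covariance of interval centers as one simple macro-level summary. The function names `aggregate_to_intervals` and `centers_covariance`, and the simulated data, are hypothetical.

```python
import numpy as np

def aggregate_to_intervals(micro, groups):
    """Build interval macro-data: per-group minimum and maximum of each variable.

    Assumption for illustration: an interval is the [min, max] of its micro-data.
    """
    labels = np.unique(groups)
    lower = np.array([micro[groups == g].min(axis=0) for g in labels])
    upper = np.array([micro[groups == g].max(axis=0) for g in labels])
    return lower, upper

def centers_covariance(lower, upper):
    """Sample covariance matrix of the interval centers (midpoints)."""
    centers = (lower + upper) / 2.0
    return np.cov(centers, rowvar=False)

# Simulated micro-data (hypothetical): group locations are drawn independently
# for the two variables, while the variation inside each group is strongly
# correlated across the two variables.
rng = np.random.default_rng(0)
n_groups, n_per_group = 50, 200
groups = np.repeat(np.arange(n_groups), n_per_group)
group_locations = rng.normal(size=(n_groups, 2))
within = rng.multivariate_normal(
    mean=[0.0, 0.0], cov=[[1.0, 0.9], [0.9, 1.0]], size=n_groups * n_per_group
)
micro = group_locations[groups] + within

lower, upper = aggregate_to_intervals(micro, groups)

# Correlation seen in the pooled micro-data versus in the interval centers.
micro_corr = np.corrcoef(micro, rowvar=False)[0, 1]
cov_centers = centers_covariance(lower, upper)
centers_corr = cov_centers[0, 1] / np.sqrt(cov_centers[0, 0] * cov_centers[1, 1])

print("correlation in pooled micro-data:", round(micro_corr, 3))
print("correlation of interval centers: ", round(centers_corr, 3))
```

Comparing the two printed values gives a feel for how much a macro-level summary can depend on what the micro-data look like inside each interval, which is the kind of dependence the abstract refers to.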
