Principal component histograms from interval-valued observations

The focus of this paper is an approach to constructing histogram values for the principal components of interval-valued observations. Le-Rademacher and Billard (J Comput Graph Stat 21:413–432, 2012) show that, for a principal component analysis on interval-valued observations, the resulting observations in principal component space are polytopes formed by the convex hulls of the linearly transformed vertices of the observed hyper-rectangles. In this paper, we propose an algorithm that translates these polytopes into histogram-valued data, providing numerical values for the principal components that can be used as input in further analyses. Other existing methods of principal component analysis for interval-valued data construct the principal components themselves as intervals, which implicitly assumes that all values within an observation are uniformly distributed along the principal component axes. However, this assumption holds only in the special case in which the variables in the dataset are mutually uncorrelated. Representing the principal components as histogram values, as proposed herein, more accurately reflects the variation in the internal structure of the observations in principal component space. As a consequence, subsequent analyses that use histogram-valued principal components as input achieve improved accuracy.
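The abstract describes the pipeline only in outline. The Python sketch below illustrates the idea under explicit assumptions and is not the authors' algorithm. The helper names `hyperrectangle_vertices`, `vertex_pca`, and `pc_histogram` are hypothetical. The sketch (1) enumerates the 2^p vertices of each observed hyper-rectangle, (2) takes eigenvectors from the ordinary covariance of all pooled vertices, which is a simplifying swap for the symbolic covariance matrix used by Le-Rademacher and Billard, and (3) approximates one observation's histogram along a principal component by Monte Carlo sampling. Step (3) assumes that values are uniform inside the hyper-rectangle, so each bin's relative frequency estimates the share of the polytope lying between two bin edges.

```python
import itertools

import numpy as np


def hyperrectangle_vertices(lower, upper):
    """Return the 2**p vertices of the box [lower_1, upper_1] x ... x [lower_p, upper_p]."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    # itertools.product picks either the lower or the upper bound in every dimension.
    return np.array(list(itertools.product(*zip(lower, upper))))


def vertex_pca(lowers, uppers):
    """PCA on the pooled vertices of all observed hyper-rectangles (p >= 2 assumed).

    Assumption: eigenvectors come from the ordinary covariance of the pooled
    vertices, not from the symbolic covariance matrix of Le-Rademacher and Billard.
    """
    verts = np.vstack([hyperrectangle_vertices(lo, up) for lo, up in zip(lowers, uppers)])
    mean = verts.mean(axis=0)
    cov = np.cov(verts, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending order for symmetric matrices
    order = np.argsort(eigvals)[::-1]       # largest variance first
    return mean, eigvals[order], eigvecs[:, order]


def pc_histogram(lower, upper, mean, direction, bin_edges, n_samples=100_000, seed=None):
    """Approximate the histogram of one observation along one principal component.

    Assumption: values are uniform inside the hyper-rectangle, so the relative
    frequencies are Monte Carlo estimates of the share of the polytope's volume
    falling between consecutive bin edges.
    """
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    samples = rng.uniform(lower, upper, size=(n_samples, lower.size))
    scores = (samples - mean) @ direction  # principal component scores
    counts, _ = np.histogram(scores, bins=bin_edges)
    return counts / n_samples


# Three hypothetical 2-D interval-valued observations with correlated variables.
lowers = [[1.0, 2.0], [3.0, 3.0], [5.0, 6.0]]
uppers = [[2.0, 4.0], [5.0, 4.0], [6.0, 8.0]]

mean, eigvals, eigvecs = vertex_pca(lowers, uppers)
pc1 = eigvecs[:, 0]  # the sign of an eigenvector is arbitrary

# The projected vertices span the interval that an interval-valued PCA would report.
proj = (hyperrectangle_vertices(lowers[0], uppers[0]) - mean) @ pc1
edges = np.linspace(proj.min(), proj.max(), 6)  # 5 equal-width bins
props = pc_histogram(lowers[0], uppers[0], mean, pc1, edges, seed=0)
print(np.round(props, 3))
```

Here the two variables are correlated, so the first principal component is not parallel to either original axis. For the first observation, the outer bins therefore receive less mass than the middle bin, giving a trapezoid-shaped histogram rather than a uniform one. An interval-valued principal component spanning the same range would implicitly treat it as uniform.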

References

[1] Pierre Cazes. Analyse factorielle d'un tableau de lois de probabilité. 2002.

[2] T. W. Anderson. Asymptotic Theory for Principal Component Analysis. 1963.

[3] Edwin Diday et al. Symbolic Data Analysis: Conceptual Statistics and Data Mining (Wiley Series in Computational Statistics). 2007.

[4] Hans-Hermann Bock et al. Analysis of Symbolic Data: Exploratory Methods for Extracting Statistical Information from Complex Data. 2000.

[5] Edwin Diday et al. Principal component analysis for interval-valued observations. Stat. Anal. Data Min., 2011.

[6] A. Irpino et al. Visualizing symbolic data by closed shapes. 2003.

[7] P. Bertrand et al. Descriptive Statistics for Symbolic Data. 2000.

[8] I. Jolliffe. Principal Component Analysis. 2002.

[9] Monique Noirhomme-Fraiture et al. Symbolic Data Analysis and the SODAS Software. 2008.

[10] Ahlame Douzal-Chouakria. Extension des méthodes d'analyse factorielles à des données de type intervalle. 1998.

[11] Francesco Palumbo et al. Principal component analysis of interval data: a symbolic data analysis approach. Comput. Stat., 2000.

[12] T. W. Anderson. An Introduction to Multivariate Statistical Analysis. 1959.

[13] Martin Schader et al. Between Data Science and Applied Data Analysis. 2003.

[14] E. Diday et al. Extension de l'analyse en composantes principales à des données de type intervalle. 1997.

[15] Allan P. Donsig et al. Real Analysis with Real Applications. 2001.

[16] Paolo Giordani et al. Principal Component Analysis of symmetric fuzzy data. Comput. Stat. Data Anal., 2004.

[17] Jacqueline J. Meulman et al. New Developments in Psychometrics. 2003.

[18] Richard A. Johnson et al. Applied Multivariate Statistical Analysis. 1983.

[19] Antonio Irpino et al. Principal Component Analysis of Symbolic Data Described by Intervals. 2008.

[20] Manabu Ichino. The quantile method for symbolic principal component analysis. Stat. Anal. Data Min., 2011.

[21] L. Billard et al. Symbolic Covariance Principal Component Analysis and Visualization for Interval-Valued Data. 2012.

[22] L. Billard et al. From the Statistics of Data to the Statistics of Knowledge. 2003.

[23] E. Diday et al. Approche géométrique et classification pour la reconnaissance de visage. 1996.

[24] F. Palumbo et al. A PCA for interval-valued data based on midpoints and radii. 2003.

[25] Carlo Lauro et al. Principal component analysis on interval data. Comput. Stat., 2006.

[26] P. Giordani et al. Component Models for Fuzzy Data. 2006.