Principal component analysis for histogram-valued data

This paper introduces a principal component methodology for analysing histogram-valued data within the symbolic data analysis framework. Currently, no comparable method exists for this type of data. The proposed method uses a symbolic covariance matrix to determine the principal component space. The observations projected onto the principal component space are represented as polytopes for visualization, and these polytopes are also summarized numerically as histogram-valued output. The algorithms needed to implement the method are included. The technique is illustrated on a weather data set.
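The abstract names the main ingredients: a symbolic covariance matrix, its eigenvectors, and observations projected as polytopes. The sketch below shows one way these pieces could fit together in Python with NumPy. It is not the paper's algorithm. It assumes the symbolic mean and covariance estimators for histogram-valued data from the Billard and Diday line of work, with values spread uniformly inside each subinterval, and it centres vertices at the symbolic means before projecting. The paper's own estimator, centring, and polytope construction may differ. The data layout and every function name (`symbolic_mean`, `symbolic_cov_matrix`, `principal_components`, `project_vertices`) are hypothetical. The histogram-valued output step is not shown.

```python
import itertools

import numpy as np

# Hypothetical data layout (assumption, for illustration only):
# data[u][j] is the histogram of variable j for observation u, given as a list of
# (a, b, p) triples: subinterval [a, b) carrying relative frequency p (p's sum to 1).
# Values are assumed to be uniformly spread within each subinterval.


def symbolic_mean(data, j):
    """Symbolic mean of variable j: (1 / 2m) * sum_u sum_k p_ujk * (a_ujk + b_ujk)."""
    m = len(data)
    return sum(p * (a + b) for obs in data for (a, b, p) in obs[j]) / (2 * m)


def symbolic_cov_matrix(data):
    """Assumed Billard-style symbolic covariance matrix; returns (S, means)."""
    m, d = len(data), len(data[0])
    means = [symbolic_mean(data, j) for j in range(d)]
    S = np.zeros((d, d))
    for j1 in range(d):
        for j2 in range(j1, d):
            total = 0.0
            for obs in data:
                if j1 == j2:
                    # Variance: (1 / 3m) * sum p * [A^2 + A*B + B^2], A = a - mean, B = b - mean.
                    # Doubled here so that both cases share the 1 / 6m scaling below.
                    for a, b, p in obs[j1]:
                        A, B = a - means[j1], b - means[j1]
                        total += 2.0 * p * (A * A + A * B + B * B)
                else:
                    # Covariance: (1 / 6m) * sum_k sum_l p * p' * [2AA' + AB' + BA' + 2BB'].
                    for a1, b1, p1 in obs[j1]:
                        A1, B1 = a1 - means[j1], b1 - means[j1]
                        for a2, b2, p2 in obs[j2]:
                            A2, B2 = a2 - means[j2], b2 - means[j2]
                            total += p1 * p2 * (2 * A1 * A2 + A1 * B2 + B1 * A2 + 2 * B1 * B2)
            S[j1, j2] = S[j2, j1] = total / (6.0 * m)
    return S, np.array(means)


def principal_components(S):
    """Eigen-decomposition of the symmetric matrix S, sorted by decreasing eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(S)  # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def project_vertices(obs, means, eigvecs):
    """Project the vertices of every sub-hyperrectangle of one observation onto the PCs.

    Choosing one subinterval per variable defines a hyperrectangle with 2^d vertices.
    Each consecutive group of 2^d returned rows spans one projected sub-polytope.
    """
    points = []
    for boxes in itertools.product(*obs):  # one (a, b, p) triple per variable
        for corner in itertools.product(*[(a, b) for a, b, _ in boxes]):
            points.append(np.asarray(corner, dtype=float) - means)
    return np.array(points) @ eigvecs


if __name__ == "__main__":
    data = [
        [[(0, 2, 1.0)], [(0, 1, 0.5), (1, 3, 0.5)]],
        [[(2, 4, 0.5), (4, 6, 0.5)], [(2, 4, 1.0)]],
    ]
    S, means = symbolic_cov_matrix(data)
    eigvals, eigvecs = principal_components(S)
    print(eigvals / eigvals.sum())  # share of total symbolic variance per component
    print(project_vertices(data[0], means, eigvecs).shape)  # (8, 2)
```

Using `numpy.linalg.eigh` rather than `numpy.linalg.eig` exploits the symmetry of the covariance matrix and returns real eigenvalues. The convex hull of each group of projected vertices (computable with, for example, `scipy.spatial.ConvexHull`) would give the polytopes drawn for visualization.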
