Far beyond the classical data models: symbolic data analysis

This paper introduces symbolic data analysis, explaining how it extends the classical data models to take into account more complete and complex information. Several examples motivate the approach, before the modeling of variables assuming new types of realizations is formally presented. Some methods for the (multivariate) analysis of symbolic data are then presented and discussed. The coverage is, however, far from exhaustive, given the rapid ongoing development of this new field of research. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 4: 157–170, 2011
