Symbolic Data Analysis: Definition and Examples

With the advent of computers, large and very large datasets have become routine. What is not so routine is how to analyse these data and how to glean useful information from within their massive confines. One approach is to summarize a large dataset so that the resulting summary dataset is of a manageable size. One consequence is that the data may no longer take the form of single values, as classical data do, but may instead be represented by lists, intervals, distributions and the like. These summarized data are examples of symbolic data. This paper looks at the concept of symbolic data in general, and then reviews the methods currently available to analyse such data. It quickly becomes clear that the range of methodologies available is analogous to the developments prior to 1900 that formed a foundation for the inferential statistics of the 1900s, methods that are largely limited to (comparatively) small datasets and to classical data formats. The scarcity of available methodologies for symbolic data also becomes clear, and draws attention to an enormous need for the development of a vast catalogue (so to speak) of new symbolic methodologies, along with rigorous mathematical foundations for these methods.
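As a rough illustration of how summarizing classical records can produce symbolic data, the following minimal Python sketch aggregates individual observations by group into an interval, a list of categories, and an empirical distribution. The records, the field names and the `summarize` helper are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical classical data: one single-valued record per individual,
# given as (group, age, blood_type). These values are invented.
records = [
    ("clinic_A", 34, "O"),
    ("clinic_A", 51, "A"),
    ("clinic_A", 47, "O"),
    ("clinic_B", 29, "B"),
    ("clinic_B", 62, "A"),
]


def summarize(rows):
    """Aggregate classical records into one symbolic description per group."""
    groups = defaultdict(list)
    for group, age, blood in rows:
        groups[group].append((age, blood))

    summary = {}
    for group, values in groups.items():
        ages = [age for age, _ in values]
        counts = Counter(blood for _, blood in values)
        n = len(values)
        summary[group] = {
            # Interval-valued variable: [min, max] of the observed ages.
            "age_interval": (min(ages), max(ages)),
            # List-valued (multi-valued) variable: categories that occur.
            "blood_types": sorted(counts),
            # Distribution-valued variable: relative frequencies.
            "blood_type_dist": {b: c / n for b, c in counts.items()},
        }
    return summary


symbolic = summarize(records)
for group, description in symbolic.items():
    print(group, description)
```

Each group is now described by one row whose entries are no longer single values, which is the kind of summary dataset the abstract refers to. Other summaries (for example histograms instead of min-max intervals) would be equally valid choices.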
