Probabilistic Modeling for Symbolic Data

Symbolic data refer to variables whose ‘values’ may be, for example, intervals, sets of categories, or even frequency distributions. Symbolic data analysis provides exploratory methods for revealing the structure of such data and typically proceeds by heuristic, if suggestive, methods that generalize criteria and algorithms from classical multivariate statistics. In contrast, this paper proposes to base the analysis of symbolic data on probability models as well, and to derive statistical tools by standard methods such as maximum likelihood. This approach is exemplified for multivariate interval data, where we consider minimum volume hypercubes, average intervals, clustering, and regression models, also with reference to previous work.
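
As a rough illustration of what a probability-based treatment of interval data can look like, the following sketch computes an "average interval" for multivariate interval data. It is not taken from the paper: it assumes each interval is recoded as a midpoint and a half-length, and that the midpoints in each dimension follow a normal distribution, in which case the maximum likelihood estimate of the mean midpoint is the sample mean. Averaging the half-lengths arithmetically is a further simplifying assumption. The function names `to_midpoint_halflength` and `average_interval` are hypothetical.

```python
import numpy as np


def to_midpoint_halflength(lower, upper):
    """Recode intervals [lower, upper] as (midpoint, half-length) arrays."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if lower.shape != upper.shape:
        raise ValueError("lower and upper must have the same shape")
    if np.any(upper < lower):
        raise ValueError("each upper bound must be >= its lower bound")
    return (lower + upper) / 2.0, (upper - lower) / 2.0


def average_interval(lower, upper):
    """Average interval per dimension (illustrative assumption, not the paper's method).

    The mean midpoint is the sample mean of the midpoints, which is the
    maximum likelihood estimate under an assumed normal model. The mean
    half-length is the plain arithmetic mean of the half-lengths.
    """
    mid, half = to_midpoint_halflength(lower, upper)
    mean_mid = mid.mean(axis=0)
    mean_half = half.mean(axis=0)
    return mean_mid - mean_half, mean_mid + mean_half


# Three observations of two-dimensional interval data (made-up numbers).
lower = np.array([[0.0, 1.0], [2.0, 3.0], [1.0, 2.0]])
upper = np.array([[2.0, 5.0], [4.0, 5.0], [5.0, 4.0]])

avg_lower, avg_upper = average_interval(lower, upper)
print("average interval, lower bounds:", avg_lower)
print("average interval, upper bounds:", avg_upper)
```

With these particular choices the average interval coincides with the componentwise mean of the lower and upper bounds. Other model assumptions would generally lead to different estimators.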
