Probabilistic clustering of interval data

This paper addresses the problem of clustering interval data using a model-based approach. To this end, we use parametric models for interval-valued variables whose variance-covariance matrix configurations take the nature of interval data directly into account. Results on both synthetic and empirical data support the soundness of the proposed approach. The method succeeds in finding parsimonious heteroscedastic models, which is a critical feature in many applications. Furthermore, the analysis of the different data sets made clear the need to explicitly consider the intrinsic variability present in interval data.
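As a rough illustration of what a parametric treatment of interval data can look like, the sketch below maps each interval to a midpoint and a log-range and computes the joint sample covariance of the two. This representation is an assumption made for illustration; the abstract does not state which parametrization the paper uses. The function names and toy data are hypothetical.

```python
import math

def to_midpoint_logrange(intervals):
    """Map each interval (lower, upper) to (midpoint, log of range)."""
    out = []
    for lower, upper in intervals:
        if upper <= lower:
            raise ValueError("each interval needs upper > lower")
        out.append(((lower + upper) / 2.0, math.log(upper - lower)))
    return out

def covariance(rows):
    """Unbiased sample covariance matrix of a list of equal-length rows."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]

# Hypothetical toy data: one interval-valued variable observed on four units.
intervals = [(1.0, 3.0), (2.0, 6.0), (0.0, 1.0), (4.0, 5.0)]
features = to_midpoint_logrange(intervals)
cov = covariance(features)  # 2x2: midpoint and log-range variances, plus their covariance
```

Under this assumed representation, a covariance "configuration" could, for example, constrain the off-diagonal entry (the midpoint/log-range covariance) to zero or leave it free, and a mixture model would estimate one such matrix per cluster in the heteroscedastic case.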
