Cluster Analysis: An Application to a Real Mixed-Type Data Set

When multivariate data are available, it is crucial to summarize them so as to extract useful information and make sound decisions. Cluster analysis meets this requirement: it partitions data into meaningful groups such that both the similarity within each group and the dissimilarity between groups are maximized. Because of this, clustering is used in a broad variety of contexts, which explains its appeal in many disciplines. Most existing clustering approaches are limited to either numerical or categorical data. However, since data sets with mixed attribute types are very common in real-life applications, clustering them is well worth the effort. In this paper we therefore stress the importance of this approach by presenting an application to a real-world mixed-type data set.
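To make the idea of comparing mixed-type records concrete, here is a minimal sketch of a Gower-style dissimilarity in Python. It is only an illustration: the abstract does not say which dissimilarity or algorithm the paper uses, and the function name, the toy records, and the attribute ranges are all assumptions. Numeric attributes contribute a range-scaled absolute difference, and categorical attributes contribute 0 on a match and 1 on a mismatch.

```python
from typing import Optional, Sequence, Union

Value = Union[float, str]


def gower_dissimilarity(
    a: Sequence[Value],
    b: Sequence[Value],
    ranges: Sequence[Optional[float]],
) -> float:
    """Average per-attribute dissimilarity between two mixed-type records.

    ranges[i] is the observed range of numeric attribute i,
    or None when attribute i is categorical (hypothetical convention).
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical attribute: simple matching (0 if equal, 1 otherwise).
            total += 0.0 if x == y else 1.0
        else:
            # Numeric attribute: absolute difference scaled to [0, 1].
            total += abs(x - y) / r if r > 0 else 0.0
    return total / len(ranges)


# Toy records (age, income, occupation); values are made up for illustration.
records = [
    (25.0, 30000.0, "student"),
    (27.0, 32000.0, "student"),
    (58.0, 90000.0, "manager"),
]
# Ranges of the two numeric attributes over the toy data; None marks the categorical one.
ranges = [58.0 - 25.0, 90000.0 - 30000.0, None]

d01 = gower_dissimilarity(records[0], records[1], ranges)
d02 = gower_dissimilarity(records[0], records[2], ranges)
print(d01 < d02)  # the two students are closer to each other than to the manager
```

A pairwise dissimilarity matrix built this way could then be passed to any clustering method that accepts precomputed dissimilarities, such as hierarchical clustering or partitioning around medoids.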
