Cluster Analysis as a Decision-Making Tool: A Methodological Review

Cluster analysis has long played an important role in a broad variety of areas, such as psychology, biology, and computer science. It has also become a valuable tool in marketing and business, thanks to its ability to support decision-making processes. Traditionally, clustering approaches handle data that are either purely numerical or purely categorical. An important area of cluster analysis deals with mixed data, composed of both numerical and categorical attributes. Clustering mixed data is not simple, because there is a wide gap between the similarity measures suited to these two kinds of data: numerical values are compared through the size of their differences, whereas nominal categorical values can only be compared as matching or not matching. In this review we provide some technical details about the kinds of distances that can be used with mixed data types. Finally, we emphasize that in most applications of cluster analysis practitioners focus on either numerical or categorical variables, which lessens the effectiveness of the method as a decision-making tool.
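To show how a single distance can span the gap between numerical and categorical attributes, here is a minimal sketch of Gower's distance, one well-known option for mixed data. It rests on a few assumptions: every attribute gets equal weight, there are no missing values, and each numeric attribute is scaled by its range over the data set. The helper names `gower_distance` and `column_ranges` and the sample customer records are hypothetical and exist only for illustration.

```python
from typing import List, Sequence, Union

Value = Union[float, str]


def column_ranges(data: Sequence[Sequence[Value]],
                  is_numeric: Sequence[bool]) -> List[float]:
    """Range (max - min) of each numeric column; 0.0 placeholder for categorical ones."""
    ranges: List[float] = []
    for k, num in enumerate(is_numeric):
        if num:
            col = [float(row[k]) for row in data]
            ranges.append(max(col) - min(col))
        else:
            ranges.append(0.0)
    return ranges


def gower_distance(x: Sequence[Value], y: Sequence[Value],
                   is_numeric: Sequence[bool],
                   ranges: Sequence[float]) -> float:
    """Equal-weight Gower distance between two mixed-type records, in [0, 1].

    Numeric attribute k contributes |x_k - y_k| / R_k (R_k = range of column k);
    categorical attributes contribute 0 on a match and 1 on a mismatch.
    The result is the mean of these per-attribute contributions.
    """
    if not (len(x) == len(y) == len(is_numeric) == len(ranges)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for xk, yk, num, rk in zip(x, y, is_numeric, ranges):
        if num:
            # Range-normalised absolute difference; a constant column adds nothing.
            total += abs(float(xk) - float(yk)) / rk if rk > 0 else 0.0
        else:
            # Simple matching dissimilarity for nominal values.
            total += 0.0 if xk == yk else 1.0
    return total / len(x)


if __name__ == "__main__":
    # Hypothetical customer records: (age, income, region, card type).
    customers = [
        (25, 30000.0, "north", "gold"),
        (40, 60000.0, "south", "silver"),
        (35, 45000.0, "north", "silver"),
    ]
    numeric = [True, True, False, False]
    r = column_ranges(customers, numeric)
    print(gower_distance(customers[0], customers[1], numeric, r))  # 1.0
    print(gower_distance(customers[0], customers[2], numeric, r))  # about 0.542 (= 13/24)
```

Dividing by the range puts every numeric term on the same 0 to 1 scale as a categorical mismatch. Without that step, a large-valued attribute such as income would swamp the categorical ones. Weighted variants of the formula let an analyst give some attributes more influence than others.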
