Paper information - Clustering techniques

Clustering techniques

Clustering a population of individuals, each described by a set of attribute variables, into groups of “similar” individuals has many applications. The clustering problem, also known as unsupervised learning, is the problem of partitioning a population into clusters (or classes). The population is a set of n elements (clients, products, shops, agencies, etc.), each described by m attributes. These attributes can be quantitative (salary), categorical (type of profession) or binary (ownership of a credit card). The goal is to construct a partition in which elements of the same cluster are “similar” and elements of different clusters are “dissimilar” with respect to the m attributes. Here we define the clustering problem and discuss the ideas behind some of the major approaches, including a relatively new method, called RDA/AREVOMS, that is based on the theory of voting.
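To make the problem statement concrete, the following is a minimal sketch of one way to score a partition of n elements described by m categorical attributes. It counts, for each pair of elements, the attributes on which they agree or disagree, and rewards agreement inside a cluster and disagreement across clusters. This pairwise counting is only in the spirit of voting-based criteria; it is an illustrative assumption and not the RDA/AREVOMS method itself. The toy population, the attribute values and the function names are hypothetical.

```python
from itertools import combinations, product

# Hypothetical toy population: n = 4 elements, m = 3 categorical attributes
# (profession, credit-card ownership, area).
population = [
    ("manager", "owner", "urban"),
    ("manager", "owner", "rural"),
    ("clerk", "non-owner", "rural"),
    ("clerk", "non-owner", "urban"),
]


def agreement(a, b):
    """Number of attributes on which two elements take the same value."""
    return sum(x == y for x, y in zip(a, b))


def partition_score(elements, labels):
    """Score a partition given as one cluster label per element.

    A pair in the same cluster contributes its number of agreeing
    attributes; a pair in different clusters contributes its number of
    disagreeing attributes. Higher is better. (Assumed criterion.)
    """
    m = len(elements[0])
    score = 0
    for i, j in combinations(range(len(elements)), 2):
        agree = agreement(elements[i], elements[j])
        if labels[i] == labels[j]:
            score += agree
        else:
            score += m - agree
    return score


def best_partition(elements):
    """Exhaustive search over all labelings; only feasible for tiny n."""
    n = len(elements)
    return max(product(range(n), repeat=n),
               key=lambda labels: partition_score(elements, labels))


best = best_partition(population)
print(best, partition_score(population, best))
```

Exhaustive search is used here only because the example is tiny; the number of partitions grows too quickly with n for this to be practical, which is why the approaches surveyed in the paper rely on heuristics or model-based estimation instead.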

Pierre Michaud
