Clustering techniques

Clustering a population of individuals, each described by a set of attribute variables, into groups of “similar” members has many applications. The clustering problem, a core task of unsupervised learning, is the problem of partitioning a population into clusters (or classes). The population is a set of n elements, such as clients, products, shops or agencies, each described by m attributes. These attributes can be quantitative (salary), categorical (type of profession) or binary (whether the individual owns a credit card). The goal is to construct a partition in which elements of the same cluster are “similar” and elements of different clusters are “dissimilar” with respect to the m attributes. Here we define the clustering problem and discuss the ideas behind some of the major approaches, including a relatively new method, called RDA/AREVOMS, that is based on the theory of voting.
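
To make the problem statement concrete, the following is a minimal Python sketch of a population with mixed attribute types and a check that a candidate grouping is a valid partition. The names `Individual`, `is_partition`, and the sample values are hypothetical, chosen only for illustration. The sketch checks the structure of a partition and does not implement any particular clustering method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    # Hypothetical record with m = 3 attributes of different types.
    salary: float          # quantitative
    profession: str        # categorical
    has_credit_card: bool  # binary


def is_partition(n, clusters):
    """Return True if `clusters` splits indices 0..n-1 into
    non-empty, pairwise disjoint groups that cover every element."""
    seen = set()
    for cluster in clusters:
        if not cluster:
            return False  # empty clusters are not allowed
        for i in cluster:
            if i in seen or not 0 <= i < n:
                return False  # element repeated or out of range
            seen.add(i)
    return len(seen) == n  # every element is assigned


# Illustrative population of n = 4 individuals (made-up values).
population = [
    Individual(32000.0, "teacher", True),
    Individual(35000.0, "teacher", True),
    Individual(90000.0, "engineer", False),
    Individual(88000.0, "engineer", False),
]

# A candidate clustering, given as sets of indices into `population`.
candidate = [{0, 1}, {2, 3}]
print(is_partition(len(population), candidate))
```

Any clustering method must return a grouping that passes such a check. Methods differ in how they measure “similar” and “dissimilar” over the m attributes, which this sketch deliberately leaves open.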