Constrained classification: The use of a priori information in cluster analysis

In many classification problems, external and/or internal information about the objects or units to be analyzed makes it appropriate to impose constraints on the set of allowable classifications and their characteristics. CONCLUS, or CONstrained CLUStering, is a new methodology devised to perform constrained classification in either an overlapping or nonoverlapping (hierarchical or nonhierarchical) manner. This paper first reviews the related classification literature and then discusses the use of constraints in clustering problems. The CONCLUS model and algorithm are described in detail, along with their flexibility for use in various applications. Monte Carlo results are presented for two synthetic data sets, and their implications are discussed. CONCLUS is then illustrated with a sales territory design problem in which the objects classified are various Forbes-500 companies. Finally, the discussion section highlights the main contribution of the paper and suggests some areas for future research.
