A CLUE for CLUster Ensembles

Cluster ensembles are collections of individual solutions to a given clustering problem, and considering them is useful or necessary in a wide range of applications. The R package clue provides an extensible computational environment for creating and analyzing cluster ensembles. It offers basic data structures for representing partitions and hierarchies, together with facilities for computing on them, including methods for measuring proximity and for obtaining consensus and "secondary" clusterings.
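To make "measuring proximity" between the members of an ensemble concrete, here is a minimal Python sketch of the Rand index, one classical agreement measure for two partitions of the same objects. It is an illustration only, not the clue package's R interface; the function name `rand_index` and the label-list representation of a partition are assumptions made for this example.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the fraction of object pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two objects
    in the same cluster, or both put them in different clusters.
    Partitions are given as equal-length sequences of cluster labels
    (a hypothetical representation chosen for this sketch).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("at least two objects are required")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


# A toy ensemble: three partitions of the same four objects.
ensemble = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same grouping as the first, labels swapped
    [0, 1, 0, 1],
]

# Pairwise agreement between all members of the ensemble.
for a, b in combinations(range(len(ensemble)), 2):
    print(a, b, rand_index(ensemble[a], ensemble[b]))
```

Because the index depends only on which objects are grouped together, relabelling the clusters of a partition leaves it unchanged, which is why the first two members of the toy ensemble agree perfectly.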
