Evidence accumulation clustering using combinations of features

Evidence accumulation clustering (EAC) is an ensemble clustering algorithm that can cluster data of arbitrary shapes and numbers of clusters. Here, we present a variant of EAC designed to better cluster data with a large number of features, many of which may be uninformative. Our method builds on the existing EAC algorithm by populating the clustering ensemble with clusterings of feature subsets, each combining fewer features than the original dataset contains. It also calls for prewhitening the recombined data and weighting the influence of each individual clustering by an estimate of its informativeness. We provide an example implementation of the algorithm in Matlab and demonstrate its effectiveness relative to ordinary evidence accumulation clustering on synthetic data.

• The clustering ensemble is built by clustering subset combinations of features from the data.
• The recombined data may be prewhitened.
• Evidence accumulation can be improved by weighting the evidence with a goodness-of-clustering measure.
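The abstract describes the pipeline only in outline, so what follows is a minimal Python/NumPy sketch under stated assumptions, not the authors' Matlab implementation. It enumerates every subset of `subset_size` features, prewhitens each recombined subset, clusters it several times with k-means at a randomly drawn number of clusters, and accumulates each clustering's co-association evidence weighted by how informative that clustering appears. Several choices are assumptions the text does not confirm: k-means as the base clusterer, ZCA as the prewhitening transform, the mean silhouette width clipped at zero as the goodness-of-clustering weight, and single-link clustering of `1 - C` cut at the longest-lived partition (the consensus step of the original EAC) to read out final labels. All function and parameter names here (`weighted_coassociation`, `consensus_partition`, `subset_size`, `k_range`, `n_repeats`) are hypothetical.

```python
import itertools

import numpy as np

# Sketch only: base clusterer, whitening, evidence weight and consensus step are assumptions.


def kmeans(X, k, rng, n_init=5, n_iter=100):
    """Lloyd's k-means with random restarts; returns the lowest-SSE labelling."""
    best_labels, best_sse = None, np.inf
    for _ in range(n_init):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            # Recompute centers; an emptied cluster keeps its old center.
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        sse = d[np.arange(len(X)), labels].sum()
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels


def zca_whiten(X, eps=1e-8):
    """Assumed prewhitening: center X and map its sample covariance to the identity."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.atleast_2d(np.cov(Xc, rowvar=False)))
    return Xc @ (vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T)


def mean_silhouette(X, labels):
    """Mean silhouette width (Rousseeuw); 0 for a single cluster."""
    ks = np.unique(labels)
    if len(ks) < 2:
        return 0.0
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singletons get silhouette 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # D[i, i] == 0 adds nothing
        b = min(D[i, labels == c].mean() for c in ks if c != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return float(s.mean())


def weighted_coassociation(X, subset_size=2, k_range=(2, 4), n_repeats=3, seed=0):
    """Weighted co-association matrix built from clusterings of feature subsets."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    C, total = np.zeros((n, n)), 0.0
    for cols in itertools.combinations(range(p), subset_size):
        Z = zca_whiten(X[:, list(cols)])  # prewhiten the recombined data
        for _ in range(n_repeats):
            k = int(rng.integers(k_range[0], k_range[1] + 1))
            labels = kmeans(Z, k, rng)
            # Assumption: clipped mean silhouette as the informativeness weight.
            w = max(mean_silhouette(Z, labels), 0.0)
            C += w * (labels[:, None] == labels[None, :])
            total += w
    return C / total if total > 0 else C


def _union_find(n, edges):
    """Join components along (dist, i, j) edges; return labels and edges used."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    used = []
    for e in edges:
        ri, rj = find(e[1]), find(e[2])
        if ri != rj:
            parent[ri] = rj
            used.append(e)
    return np.unique([find(i) for i in range(n)], return_inverse=True)[1], used


def consensus_partition(C):
    """Single-link on 1 - C, cut where the partition has the longest lifetime."""
    n = len(C)
    edges = sorted((1.0 - C[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    _, merges = _union_find(n, edges)  # minimum spanning tree = single-link merges
    heights = np.array([d for d, _, _ in merges])
    keep = int(np.argmax(np.diff(heights))) + 1  # merges before the largest gap
    labels, _ = _union_find(n, merges[:keep])
    return labels
```

For an `(n_samples, n_features)` array `X`, `consensus_partition(weighted_coassociation(X, subset_size=2))` returns one integer label per sample. Enumerating all subsets grows combinatorially with the number of features, so for wide data one would presumably sample subsets instead; swapping the silhouette for another goodness-of-clustering index only changes the line that computes `w`.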
