Unsupervised ensemble minority clustering

Cluster analysis lies at the core of most unsupervised learning tasks. However, the majority of clustering algorithms depend on the all-in assumption, in which all objects belong to some cluster, and perform poorly on minority clustering tasks, in which a small fraction of signal data stands against a majority of noise.

The approaches proposed so far for minority clustering are supervised: they require the number and distribution of the foreground and background clusters. In supervised learning and all-in clustering, combination methods have been successfully applied to obtain distribution-free learners, even from the output of weak individual algorithms.

In this work, we propose a novel ensemble minority clustering algorithm, Ewocs, suitable for weak clustering combination. Its properties have been theoretically proved under a loose set of constraints. We also propose a number of weak clustering algorithms, and an unsupervised procedure to determine the scaling parameters for the Gaussian kernels used within the task.

We have implemented a number of approaches built from the proposed components, and evaluated them on a collection of datasets. The results show that approaches based on Ewocs are competitive with, and in some cases outperform, other state-of-the-art minority clustering approaches.
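
The minority clustering setting and the idea of combining weak clusterings can be illustrated with a toy sketch. This is not Ewocs: the abstract does not specify that algorithm, so everything below is an assumption made for illustration only. That covers the function names, the random-center weak clusterer, the cluster-size score and all parameter values. The sketch draws a small dense blob of signal inside uniform noise, runs many crude random partitions, and gives each object the average relative size of the cluster it falls in.

```python
import random


def make_toy_data(n_signal=30, n_noise=270, seed=0):
    """Toy 2-D data: a tight Gaussian blob (signal) inside uniform noise."""
    rng = random.Random(seed)
    signal = [(rng.gauss(0.5, 0.02), rng.gauss(0.5, 0.02)) for _ in range(n_signal)]
    noise = [(rng.random(), rng.random()) for _ in range(n_noise)]
    return signal + noise, [1] * n_signal + [0] * n_noise


def weak_clustering(points, n_centers, rng):
    """One weak clustering (hypothetical): nearest of a few random centers
    drawn uniformly in the unit square."""
    centers = [(rng.random(), rng.random()) for _ in range(n_centers)]
    assignment = []
    for x, y in points:
        dists = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centers]
        assignment.append(dists.index(min(dists)))
    return assignment


def ensemble_scores(points, n_rounds=200, n_centers=8, seed=1):
    """Average, over many weak clusterings, the relative size of the cluster
    each object falls in. Illustrative score only, not the Ewocs score."""
    rng = random.Random(seed)
    scores = [0.0] * len(points)
    for _ in range(n_rounds):
        assignment = weak_clustering(points, n_centers, rng)
        sizes = [0] * n_centers
        for c in assignment:
            sizes[c] += 1
        for i, c in enumerate(assignment):
            scores[i] += sizes[c] / len(points)
    return [s / n_rounds for s in scores]


def mean(values):
    return sum(values) / len(values)


points, labels = make_toy_data()
scores = ensemble_scores(points)
signal_mean = mean([s for s, l in zip(scores, labels) if l == 1])
noise_mean = mean([s for s, l in zip(scores, labels) if l == 0])
print(f"mean score, signal: {signal_mean:.3f}  noise: {noise_mean:.3f}")
```

Each individual partition here is uninformative on its own, and no object is forced into a foreground cluster. Only the averaged score separates the dense region from the background, which is the intuition behind combining weak clusterings for this task.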


[37]  Rajesh N. Davé,et al.  Robust clustering methods: a unified view , 1997, IEEE Trans. Fuzzy Syst..

[38]  Joydeep Ghosh,et al.  Automated Hierarchical Density Shaving: A Robust Automated Clustering and Visualization Framework for Large Biological Data Sets , 2010, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[39]  Naftali Tishby,et al.  Document clustering using word clusters via the information bottleneck method , 2000, SIGIR '00.

[40]  Geoffrey J. McLachlan,et al.  Robust mixture modelling using the t distribution , 2000, Stat. Comput..

[41]  James C. Bezdek,et al.  Pattern Recognition with Fuzzy Objective Function Algorithms , 1981, Advanced Applications in Pattern Recognition.

[42]  Karen Sparck Jones A statistical interpretation of term specificity and its application in retrieval , 1972 .

[43]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[44]  Victoria J. Hodge,et al.  A Survey of Outlier Detection Methodologies , 2004, Artificial Intelligence Review.

[45]  Koby Crammer,et al.  A rate-distortion one-class model and its applications to clustering , 2008, ICML '08.

[46]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[47]  VARUN CHANDOLA,et al.  Anomaly detection: A survey , 2009, CSUR.

[48]  Arnon Karnieli,et al.  Linear mixture model approach for selecting fuzzy exponent value in fuzzy c-means algorithm , 2006, Ecol. Informatics.

[49]  Bernhard Schölkopf,et al.  Estimating the Support of a High-Dimensional Distribution , 2001, Neural Computation.

[50]  G. Loukidis,et al.  SIAM International Conference on Data Mining (SDM) , 2015 .

[51]  Inderjit S. Dhillon,et al.  Clustering with Bregman Divergences , 2005, J. Mach. Learn. Res..

[52]  Anil K. Jain,et al.  Clustering ensembles: models of consensus and weak partitions , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[53]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[54]  George Karypis,et al.  Empirical and Theoretical Comparisons of Selected Criterion Functions for Document Clustering , 2004, Machine Learning.

[55]  Peter Bauer,et al.  Multiple Hypothesenprüfung / Multiple Hypotheses Testing , 1988, Medizinische Informatik und Statistik.

[56]  Anil K. Jain,et al.  Combining multiple weak clusterings , 2003, Third IEEE International Conference on Data Mining.

[57]  Jian Yu,et al.  Analysis of the weighting exponent in the FCM , 2004, IEEE Trans. Syst. Man Cybern. Part B.

[58]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1997, EuroCOLT.

[59]  Gérard Govaert,et al.  Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[60]  Anil K. Jain,et al.  A Mixture Model for Clustering Ensembles , 2004, SDM.

[61]  Ossama Emam,et al.  Unsupervised Information Extraction Approach Using Graph Mutual Reinforcement , 2006, EMNLP.

[62]  Janez Demsar,et al.  Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res..

[63]  S. García,et al.  An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons , 2008 .

[64]  Jordi Turmo,et al.  Unsupervised Relation Extraction by Massive Clustering , 2009, 2009 Ninth IEEE International Conference on Data Mining.

[65]  M. M. Moya,et al.  One-class classifier networks for target recognition applications , 1993 .

[66]  Joydeep Ghosh,et al.  A Consensus Framework for Integrating Distributed Clusterings Under Limited Knowledge Sharing , 2002 .

[67]  G. Hommel,et al.  Improvements of General Multiple Test Procedures for Redundant Systems of Hypotheses , 1988 .

[68]  Michael Brady,et al.  Estimating the bias field of MR images , 1997, IEEE Transactions on Medical Imaging.

[69]  Anil K. Jain,et al.  Clustering Methodologies in Exploratory Data Analysis , 1980, Adv. Comput..

[70]  Rajesh N. Davé,et al.  Characterization and detection of noise in clustering , 1991, Pattern Recognit. Lett..

[71]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[72]  Koby Crammer,et al.  A needle in a haystack: local one-class optimization , 2004, ICML.

[73]  Joydeep Ghosh,et al.  Robust one-class clustering using hybrid global and local search , 2005, ICML.

[74]  Suguru Arimoto,et al.  An algorithm for computing the capacity of arbitrary discrete memoryless channels , 1972, IEEE Trans. Inf. Theory.

[75]  Peter W. Eklund,et al.  A study of parameter values for a Mahalanobis Distance fuzzy classifier , 2003, Fuzzy Sets Syst..