Paper information - Unsupervised ensemble minority clustering

Unsupervised ensemble minority clustering

Cluster analysis lies at the core of most unsupervised learning tasks. However, the majority of clustering algorithms depend on the all-in assumption, in which all objects belong to some cluster, and perform poorly on minority clustering tasks, in which a small fraction of signal data stands against a majority of noise. The approaches proposed so far for minority clustering are supervised: they require the number and distribution of the foreground and background clusters. In supervised learning and all-in clustering, combination methods have been successfully applied to obtain distribution-free learners, even from the output of weak individual algorithms. In this work, we propose a novel ensemble minority clustering algorithm, Ewocs, suitable for weak clustering combination. Its properties have been theoretically proved under a loose set of constraints. We also propose a number of weak clustering algorithms, and an unsupervised procedure to determine the scaling parameters for Gaussian kernels used within the task. We have implemented a number of approaches built from the proposed components, and evaluated them on a collection of datasets. The results show that approaches based on Ewocs are competitive with, and even outperform, other state-of-the-art minority clustering approaches.

Jordi Turmo | Edgar González
