Minimum Density Hyperplanes

Associating distinct groups of objects (clusters) with contiguous regions of high probability density (high-density clusters) is central to many statistical and machine learning approaches to the classification of unlabelled data. We propose a novel hyperplane classifier for clustering and semi-supervised classification that is motivated by this objective. The proposed minimum density hyperplane minimises the integral of the empirical probability density function along the hyperplane, thereby avoiding intersection with high-density clusters. We show that the minimum density and the maximum margin hyperplanes are asymptotically equivalent, thus linking this approach to maximum margin clustering and semi-supervised support vector classifiers. We propose a projection pursuit formulation of the associated optimisation problem, which allows us to find minimum density hyperplanes efficiently in practice, and evaluate its performance on a range of benchmark datasets. The proposed approach is found to be very competitive with state-of-the-art methods for clustering and semi-supervised classification.
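The core quantity can be sketched in a few lines. For an isotropic Gaussian kernel density estimate, the integral of the d-dimensional density over the hyperplane {x : v·x = b} equals the one-dimensional kernel density of the projected points v·x evaluated at b, so for a fixed direction v the problem reduces to minimising a univariate projected density over the offset b. The sketch below illustrates this reduction only; the function names are illustrative, the outer projection pursuit search over the direction v is omitted, and the quantile-based search interval is a common heuristic (used to keep the hyperplane from drifting into the empty tails of the data), not the paper's exact formulation.

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.optimize import minimize_scalar

def hyperplane_density(X, v, b):
    """Integral of an isotropic Gaussian KDE of X over the hyperplane
    {x : v.x = b}, which equals the 1-D KDE of the projections v.x at b."""
    v = v / np.linalg.norm(v)
    proj = X @ v                      # project the data onto the unit normal
    kde = gaussian_kde(proj)          # 1-D density of the projections
    return kde(np.array([b]))[0]

def best_split_along(X, v):
    """For a fixed direction v, find the offset b minimising the projected
    density, searched between interior quantiles to avoid the empty tails."""
    v = v / np.linalg.norm(v)
    proj = X @ v
    lo, hi = np.quantile(proj, [0.1, 0.9])
    res = minimize_scalar(lambda b: hyperplane_density(X, v, b),
                          bounds=(lo, hi), method="bounded")
    return res.x, res.fun             # minimising offset and its density value

# Two well-separated Gaussian clusters: the minimum density split along the
# first coordinate axis should fall in the low-density valley between them.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-3.0, 0.0], 1.0, (100, 2)),
               rng.normal([3.0, 0.0], 1.0, (100, 2))])
b, density = best_split_along(X, np.array([1.0, 0.0]))
```

In the full method this inner minimisation over b is nested inside an optimisation over the direction v, which is where the nonsmooth and projection pursuit machinery of the paper comes in.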
