Linearithmic Time Sparse and Convex Maximum Margin Clustering

Recently, a new clustering method called maximum margin clustering (MMC) was proposed and has shown promising performance. It was originally formulated as a difficult nonconvex integer problem. To make the MMC problem practical, researchers have either relaxed the original problem to convex optimization problems that are inefficient to solve, or reformulated it as nonconvex optimization problems, which sacrifice convexity for efficiency. However, no existing approach is both convex and efficient. In this paper, a new linearithmic time sparse and convex MMC algorithm, called support-vector-regression-based MMC (SVR-MMC), is proposed. It first uses the SVR as the core of the MMC. The problem is then relaxed to a convex optimization problem, which is solved iteratively by the cutting-plane algorithm. Each cutting-plane subproblem is further decomposed into a series of supervised SVR problems by a new global extended-level method (GELM). Finally, each supervised SVR problem is solved in linear time by a new sparse-kernel SVR (SKSVR) algorithm. We further extend the SVR-MMC algorithm to the multiple-kernel clustering (MKC) problem and the multiclass MMC (M3C) problem; the resulting algorithms are denoted SVR-MKC and SVR-M3C, respectively. One key point of the algorithms is the use of the SVR, which prevents the MMC and its extensions from encountering an integer matrix programming problem. Another key point is the new SKSVR, which provides a linear time interface to nonlinear kernel scenarios, so that the SVR-MMC and its extensions retain linearithmic time complexity in those scenarios. Our experimental results on various real-world data sets demonstrate the effectiveness and efficiency of the SVR-MMC and its two extensions. Moreover, the unsupervised application of the SVR-MKC to voice activity detection (VAD) shows that the SVR-MKC can achieve performance close to that of its supervised counterpart, meet the real-time demand of VAD, and require no labeling for model training.
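As a reminder of the general idea behind the cutting-plane step mentioned above, the following is a minimal sketch of Kelley's cutting-plane method on a one-dimensional convex function. It is a generic illustration only, not the SVR-MMC procedure: the function name `kelley_cutting_plane`, the toy objective, and the interval are all hypothetical choices made for this example.

```python
def kelley_cutting_plane(f, grad, lo, hi, tol=1e-6, max_iter=100):
    """Minimize a convex 1-D function on [lo, hi] with Kelley's cutting planes.

    Illustrative sketch only; this is not the algorithm from the paper.
    """
    cuts = []  # each cut (a, b) encodes the lower bound f(t) >= a * t + b
    x = lo
    best_x, best_val = x, f(x)

    def model(t):
        # Piecewise-linear lower model: the maximum over all cuts so far.
        return max(a * t + b for a, b in cuts)

    for _ in range(max_iter):
        fx, g = f(x), grad(x)
        if fx < best_val:
            best_x, best_val = x, fx
        # Add the tangent at x as a new cutting plane.
        cuts.append((g, fx - g * x))

        # The model's minimizer on [lo, hi] is an endpoint or a point
        # where two cuts intersect, so it is enough to check those.
        candidates = [lo, hi]
        for i in range(len(cuts)):
            for j in range(i + 1, len(cuts)):
                a1, b1 = cuts[i]
                a2, b2 = cuts[j]
                if a1 != a2:
                    xc = (b2 - b1) / (a1 - a2)
                    if lo <= xc <= hi:
                        candidates.append(xc)
        x = min(candidates, key=model)

        # model(x) is a lower bound on the optimum; stop when the gap closes.
        if best_val - model(x) <= tol:
            break
    return best_x, best_val


# Hypothetical toy objective: f(t) = (t - 1)^2 + 0.5 on [-3, 4].
x_star, f_star = kelley_cutting_plane(
    lambda t: (t - 1.0) ** 2 + 0.5,
    lambda t: 2.0 * (t - 1.0),
    -3.0,
    4.0,
)
```

Each iteration adds one linear lower bound and re-minimizes the resulting piecewise-linear model, so the gap between the best value found and the model's minimum shrinks until it falls below `tol`.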
