Low-Density Cut Based Tree Decomposition for Large-Scale SVM Problems

The continuing growth of data makes it inevitable that large-scale learning problems become the norm. In this paper, we propose and analyze a novel Low-density Cut based tree Decomposition method for large-scale SVM problems, called LCD-SVM. The basic idea is divide and conquer: use a decision tree to decompose the data space, then train SVMs on the decomposed regions. Specifically, we apply the low-density separation principle to devise a splitting criterion that rapidly generates a high-quality tree, thereby maximizing the benefit to SVM training. Extensive experiments on 14 real-world datasets show that our approach significantly reduces training time relative to state-of-the-art methods while keeping test accuracy comparable, especially on very large-scale datasets.
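As a rough illustration of the decomposition step, the sketch below splits a point set recursively along the axis-aligned cut that passes through the sparsest region of the data. The abstract does not give the actual LCD-SVM splitting criterion, so everything here is an assumption made for illustration: the histogram-based density estimate, the function names `low_density_cut` and `decompose`, and the parameters `bins` and `max_leaf_size` are all hypothetical. Class labels and the per-region SVM training are left out.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


def low_density_cut(values: List[float], bins: int = 10) -> Optional[Tuple[float, int]]:
    """Return (threshold, count) for the emptiest interior histogram bin.

    Assumed stand-in for a low-density criterion; not the paper's formula.
    Returns None when all values are identical (no cut is possible).
    Requires bins >= 3 so that at least one interior bin exists.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return None
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    # Only interior bins are candidates, so the threshold lies strictly
    # between the minimum and maximum and both sides stay non-empty.
    best = min(range(1, bins - 1), key=lambda i: counts[i])
    return lo + (best + 0.5) * width, counts[best]


def decompose(points: List[Point], max_leaf_size: int = 4, bins: int = 10) -> List[List[Point]]:
    """Recursively split points at low-density cuts; return the leaf regions."""
    if len(points) <= max_leaf_size:
        return [points]
    best: Optional[Tuple[int, int, float]] = None  # (count, feature, threshold)
    for f in range(len(points[0])):
        cut = low_density_cut([p[f] for p in points], bins)
        if cut is not None and (best is None or cut[1] < best[0]):
            best = (cut[1], f, cut[0])
    if best is None:  # all points identical: nothing left to split
        return [points]
    _, feature, threshold = best
    left = [p for p in points if p[feature] <= threshold]
    right = [p for p in points if p[feature] > threshold]
    return decompose(left, max_leaf_size, bins) + decompose(right, max_leaf_size, bins)
```

Calling `decompose` on two well-separated clusters yields one leaf per cluster. In a divide-and-conquer scheme of the kind the abstract describes, each leaf would then be passed, together with its labels, to an ordinary SVM solver, so that every solver call sees only a small subset of the data.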
