Distributed SVM learning and support vector reduction

Stephen Winters-Hilt

A Support Vector Machine (SVM) is a machine learning classification method with robust learning and minimal overtraining concerns. This paper examines experimental data from a nanopore detector, where 150-component feature data is gathered on individual molecules. The nanopore detector can produce SVM training data in prodigious amounts: the experiments described yield between 10,000 and 100,000 150-component feature vectors, depending on the number of molecular classes being examined for a particular application. Training sets of 10,000 or more vectors, however, cannot be managed with a single PC-based resource. For this reason, most SVM implementations must contend with some kind of chunking process that learns on parts of the data at a time. Two sets of binary SVM training results are examined. These results show that chunk aliasing and outlier accumulation may pose problems for distributed SVM learning. New methods are also presented that offer a stable learning solution to these problems at minimal cost. One of the methods extends the learning process with modified alpha-selection heuristics that enable a support-vector reduction phase. The distributed SVM learning described here was implemented using Java RMI and was developed to run on a network of multi-core computers.
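The chunking idea mentioned above can be illustrated with a minimal sketch. It is not the implementation described in the paper: it assumes a linear kernel, swaps the SMO solver for a plain dual coordinate ascent, processes chunks serially rather than over Java RMI, and omits the support-vector reduction phase. All names (`train_linear_svm`, `train_in_chunks`, `predict`) and the one-dimensional toy data are hypothetical, chosen only to show support vectors being carried from one chunk to the next.

```python
"""Toy sketch of chunked SVM training: support vectors are carried between chunks."""


def train_linear_svm(points, labels, C=1.0, epochs=200):
    """Dual coordinate ascent for a linear soft-margin SVM.

    Assumption: the bias is handled by appending a constant 1.0 feature
    (so it is regularized). This is a simplification, not an SMO solver.
    Returns the weight vector and one alpha per training point.
    """
    rows = [list(p) + [1.0] for p in points]
    w = [0.0] * len(rows[0])
    alphas = [0.0] * len(rows)
    for _ in range(epochs):
        for i, (z, y) in enumerate(zip(rows, labels)):
            q_ii = sum(v * v for v in z)  # always >= 1 because of the bias feature
            grad = y * sum(wj * zj for wj, zj in zip(w, z)) - 1.0
            new_alpha = min(max(alphas[i] - grad / q_ii, 0.0), C)
            delta = new_alpha - alphas[i]
            if delta != 0.0:
                w = [wj + delta * y * zj for wj, zj in zip(w, z)]
                alphas[i] = new_alpha
    return w, alphas


def predict(w, point):
    """Classify a point as +1 or -1 with the learned weights."""
    score = sum(wj * zj for wj, zj in zip(w, list(point) + [1.0]))
    return 1 if score >= 0 else -1


def train_in_chunks(points, labels, chunk_size, tol=1e-8):
    """Train chunk by chunk, carrying only the support vectors forward."""
    carried = []  # (point, label) pairs whose alpha stayed above tol
    w = None
    for start in range(0, len(points), chunk_size):
        chunk = list(zip(points[start:start + chunk_size],
                         labels[start:start + chunk_size]))
        working = carried + chunk
        xs = [p for p, _ in working]
        ys = [y for _, y in working]
        w, alphas = train_linear_svm(xs, ys)
        carried = [pair for pair, a in zip(working, alphas) if a > tol]
    return w, carried


# Hypothetical toy data: positives at x = 2..7, negatives mirrored at x = -2..-7.
points, labels = [], []
for m in [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]:
    points += [[m], [-m]]
    labels += [1, -1]

w, support = train_in_chunks(points, labels, chunk_size=4)
accuracy = sum(predict(w, p) == y for p, y in zip(points, labels)) / len(points)
print(len(support), "support vectors carried; training accuracy", accuracy)
```

On this easily separable toy set, discarding the non-support vectors of each chunk loses nothing. On real data, points dropped from an early chunk might have mattered later, which is one way the chunking problems named in the abstract could arise.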
