A support vector machine classifier from a bit-constrained, sparse and localized hypothesis space

According to the Structural Risk Minimization (SRM) principle, choosing an appropriate hypothesis space is of paramount importance for training effective classification models: properly selecting the complexity of the space makes it possible to optimize the performance of the learned functions. This selection is not straightforward, especially (though not solely) when few samples are available for deriving an effective model (e.g., in bioinformatics applications). In this paper, we exploit a bit-based definition of Support Vector Machine (SVM) classifiers, selected from a hypothesis space described according to sparsity and locality principles, and show how the complexity of the corresponding space of functions can be effectively tuned through the number of bits used for the function representation. Real-world datasets are used to show how the number of bits and the degree of sparsity/locality imposed on the hypothesis space affect the complexity of the space of classifiers and, consequently, the performance of the model selected from this set.
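As a rough illustration of why the number of bits and the degree of sparsity act as complexity knobs, the sketch below counts the weight vectors available to a linear classifier under two assumptions that are ours and not taken from the paper: each nonzero weight takes one of 2^bits - 1 nonzero values, and at most `max_nonzero` of the `d` weights are nonzero. It then plugs that count into the textbook finite-class bound (Hoeffding's inequality plus a union bound), which is not necessarily the complexity measure the authors use. The function names and the example values are hypothetical.

```python
from math import comb, log, sqrt


def hypothesis_count(d: int, bits: int, max_nonzero: int) -> int:
    """Number of d-dimensional weight vectors with at most `max_nonzero`
    nonzero entries, each nonzero entry chosen among 2**bits - 1 values.

    Illustrative assumption: this is a simple counting model, not the
    paper's definition of the hypothesis space.
    """
    levels = 2 ** bits - 1  # nonzero values representable per weight
    top = min(d, max_nonzero)
    return sum(comb(d, j) * levels ** j for j in range(top + 1))


def finite_class_gap(count: int, n: int, delta: float = 0.05) -> float:
    """Finite-class bound on (true error - empirical error), holding for
    all functions in the class with probability at least 1 - delta, for a
    loss bounded in [0, 1] and n i.i.d. samples."""
    return sqrt((log(count) + log(1.0 / delta)) / (2 * n))


if __name__ == "__main__":
    d, n = 2000, 60  # hypothetical: many features, few samples
    for bits in (1, 2, 4, 8):
        for max_nonzero in (5, 50):
            size = hypothesis_count(d, bits, max_nonzero)
            gap = finite_class_gap(size, n)
            print(f"bits={bits} max_nonzero={max_nonzero} gap<={gap:.3f}")
```

With no sparsity constraint (`max_nonzero = d`) the count reduces to 2^(bits * d), so the logarithm of the class size grows linearly in the number of bits; restricting the number of nonzero weights shrinks it further.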
