Deconstructing Kernel Machines

This paper studies the following problem: given an SVM (kernel)-based binary classifier C as a black-box oracle, how much can we learn about its internal workings by querying it? Specifically, we assume the feature space ℝ^d is known and the kernel machine has m support vectors with d > m (or d ≫ m); in addition, the classifier C is laconic in the sense that, for a feature vector, it provides only a predicted label (±1) without divulging other information such as margin or confidence level. We formulate the problem of understanding the inner workings of C as characterizing the decision boundary of the classifier, and we introduce the simple notion of bracketing to sample points on the decision boundary to within a prescribed accuracy. For the five most common types of kernel function (linear, quadratic, and cubic polynomial kernels, the hyperbolic tangent kernel, and the Gaussian kernel), we show that O(dm) queries suffice to determine the type of kernel function and the (kernel) subspace spanned by the support vectors. In particular, for polynomial kernels, an additional O(m^3) queries suffice to reconstruct the entire decision boundary, yielding a set of quasi-support vectors that can be used to evaluate the deconstructed classifier efficiently. We briefly speculate on potential future applications of deconstructing kernel machines and present experimental results validating the proposed method.
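The bracketing step can be pictured as a label-only bisection between a positively and a negatively classified point. The Python sketch below illustrates that idea under stated assumptions; the function name bracket_boundary_point and the interface of the label oracle classify are hypothetical and chosen for illustration, not taken from the paper.

    import numpy as np

    def bracket_boundary_point(classify, x_pos, x_neg, tol=1e-6):
        # Bisect the segment between a point labeled +1 and a point labeled -1,
        # querying only the black-box label oracle `classify`, until the two
        # endpoints are within `tol` of each other. The midpoint is then an
        # approximate point on the decision boundary.
        x_pos = np.asarray(x_pos, dtype=float)
        x_neg = np.asarray(x_neg, dtype=float)
        assert classify(x_pos) == +1 and classify(x_neg) == -1
        while np.linalg.norm(x_pos - x_neg) > tol:
            mid = 0.5 * (x_pos + x_neg)
            if classify(mid) == +1:
                x_pos = mid   # midpoint lies on the +1 side; shrink from that end
            else:
                x_neg = mid   # midpoint lies on the -1 side (or on the boundary)
        return 0.5 * (x_pos + x_neg)

    # Toy usage with a laconic linear classifier: only the sign is exposed.
    w, b = np.array([1.0, -2.0, 0.5]), 0.3
    oracle = lambda x: +1 if np.dot(w, x) + b > 0 else -1
    p = bracket_boundary_point(oracle, np.array([5.0, 0.0, 0.0]),
                               np.array([-5.0, 0.0, 0.0]))
    # p now satisfies w·p + b ≈ 0 up to the prescribed accuracy.

Each bisection halves the bracket, so locating one boundary point to accuracy tol along a segment costs O(log(1/tol)) label queries; how such boundary samples are combined into the query bounds above is the subject of the paper itself.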
