Compositional Kernel Machines

Convolutional neural networks (convnets) have achieved impressive results on recent computer vision benchmarks. While they benefit from multiple layers that encode nonlinear decision boundaries and a degree of translation invariance, training convnets is a lengthy procedure fraught with local optima. Alternatively, a kernel method that incorporates the compositionality and symmetry of convnets could learn similar nonlinear concepts yet with easier training and architecture selection. We propose compositional kernel machines (CKMs), which effectively create an exponential number of virtual training instances by composing transformed sub-regions of the original ones. Despite this, CKM discriminant functions can be computed efficiently using ideas from sum-product networks. The ability to compose virtual instances in this way gives CKMs invariance to translations and other symmetries, and combats the curse of dimensionality. Just as support vector machines (SVMs) provided a compelling alternative to multilayer perceptrons when they were introduced, CKMs could become an attractive approach for object recognition and other vision problems. In this paper we define CKMs, explore their properties, and present promising results on NORB datasets. Experiments show that CKMs can outperform SVMs and remain competitive with convnets along several dimensions, learning symmetries and compositional concepts from fewer samples and without data augmentation.
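To make the virtual-instance idea concrete, the following is a minimal numerical sketch, not the paper's exact formulation: assuming each image is split into patches and a virtual instance independently picks, at each patch position, a patch from any training image, the sum over all exponentially many virtual instances of per-patch kernel products factorizes into a product of per-position sums. That distributivity is the essence of the sum-product computation the abstract alludes to. The RBF patch kernel, the uniform (unit) weights, and the fully independent composition structure below are illustrative assumptions.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Toy setup: 3 training images, each split into 4 patches of dimension 5.
n_train, n_patches, dim = 3, 4, 5
train_patches = rng.normal(size=(n_train, n_patches, dim))
query_patches = rng.normal(size=(n_patches, dim))

def patch_kernel(a, b):
    # Illustrative RBF kernel between two patches (an assumption, not the paper's choice).
    return np.exp(-0.5 * np.sum((a - b) ** 2))

# Naive: sum over every virtual instance, i.e. every way of choosing one
# training image per patch position (n_train ** n_patches terms).
naive = 0.0
for choice in product(range(n_train), repeat=n_patches):
    term = 1.0
    for j, t in enumerate(choice):
        term *= patch_kernel(query_patches[j], train_patches[t, j])
    naive += term

# Factorized: because patch choices are independent in this toy model, the
# same sum equals a product over positions of per-position sums (sum-product idea).
factorized = 1.0
for j in range(n_patches):
    factorized *= sum(patch_kernel(query_patches[j], train_patches[t, j])
                      for t in range(n_train))

print(naive, factorized)  # the two agree up to floating-point error
```

In the actual CKM formulation the composition is weighted and constrained rather than fully independent, but the same distributivity lets the discriminant be evaluated without enumerating the exponential set of virtual instances.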
