A Mean-Field Theory for Learning the Schönberg Measure of Radial Basis Functions

We develop and analyze a projected particle Langevin optimization method to learn the distribution in the Schönberg integral representation of radial basis functions from training samples. More specifically, we characterize a distributionally robust optimization method with respect to the Wasserstein distance to optimize the distribution in the Schönberg integral representation. To provide theoretical performance guarantees, we analyze the scaling limits of a projected particle online (stochastic) optimization method in the mean-field regime. In particular, we prove that in the scaling limit, the empirical measure of the Langevin particles converges to the law of a reflected Itô drift-diffusion process. Moreover, the drift is itself a function of the law of the underlying process. Using Itô's lemma for semi-martingales and Girsanov's change of measure for Wiener processes, we then derive a McKean-Vlasov type partial differential equation (PDE) with Robin boundary conditions that describes the evolution of the empirical measure of the projected Langevin particles in the mean-field regime. In addition, we establish the existence and uniqueness of steady-state solutions of the derived PDE in the weak sense. We apply our learning approach to train radial kernels in kernel locality-sensitive hashing (LSH) functions, where the training dataset is generated via $k$-means clustering on a small subset of the database. We subsequently apply our kernel LSH with a trained kernel to an image retrieval task on the MNIST dataset and demonstrate the efficacy of our kernel learning approach. We also apply our kernel learning approach in conjunction with kernel support vector machines (SVMs) for classification of benchmark datasets.
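The following Python sketch illustrates, under simplifying assumptions, two ingredients named above. The first is the Schönberg representation k(r) = ∫ exp(-t r²) dμ(t) of a radial kernel, with the measure μ on t ≥ 0 replaced by the empirical measure of N particles. The second is a projected Langevin update that clips the particles back onto a compact interval after each noisy gradient step. The squared loss against a Gaussian target exp(-0.5 r²), the interval [0.05, 4], the step size, the temperature, and all function names (`schoenberg_kernel`, `projected_langevin_step`, `train`) are hypothetical choices made for illustration. The sketch does not implement the Wasserstein distributionally robust objective analyzed in the paper.

```python
import numpy as np


def schoenberg_kernel(r, particles):
    """Radial kernel k(r) = integral of exp(-t r^2) dmu(t), with the Schönberg
    measure mu replaced by the empirical measure (1/N) sum_i delta_{t_i}."""
    r = np.atleast_1d(np.asarray(r, dtype=float))
    return np.exp(-np.outer(r ** 2, particles)).mean(axis=1)


def projected_langevin_step(particles, grad, step, temperature, lo, hi, rng):
    """One projected Langevin update applied to every particle:
    t <- Proj_[lo, hi](t - step * grad + sqrt(2 * step * temperature) * xi),
    with xi ~ N(0, 1); the projection onto the interval is a clip."""
    noise = rng.standard_normal(particles.shape)
    moved = particles - step * grad + np.sqrt(2.0 * step * temperature) * noise
    return np.clip(moved, lo, hi)


def train(n_particles=100, n_steps=5000, step=0.02, temperature=1e-5,
          lo=0.05, hi=4.0, t_target=0.5, seed=0):
    """Hypothetical online objective: squared loss (k(r) - y)^2 against the
    target kernel y = exp(-t_target * r^2), one random distance r per step."""
    rng = np.random.default_rng(seed)
    particles = rng.uniform(1.0, 3.0, n_particles)
    for _ in range(n_steps):
        r = rng.uniform(0.0, 2.0)
        y = np.exp(-t_target * r ** 2)
        k = schoenberg_kernel(r, particles)[0]
        # Mean-field drift: derivative in t of the first variation of the loss,
        # 2 (k - y) * d/dt exp(-t r^2). It depends on the current empirical
        # measure through k, i.e. the drift is a function of the particles' law.
        grad = 2.0 * (k - y) * (-(r ** 2) * np.exp(-particles * r ** 2))
        particles = projected_langevin_step(
            particles, grad, step, temperature, lo, hi, rng)
    return particles


if __name__ == "__main__":
    grid = np.linspace(0.0, 2.0, 41)
    fitted = train()
    err = np.mean((schoenberg_kernel(grid, fitted) - np.exp(-0.5 * grid ** 2)) ** 2)
    print(f"mean squared kernel error on [0, 2]: {err:.2e}")
```

Because each particle contributes exp(-t_i r²) with t_i > 0, the fitted kernel is positive definite and non-increasing in r by construction. The clip keeps every particle inside the admissible interval, which is the discrete counterpart of the reflecting boundary mentioned above.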
