BaKer-Nets: Bayesian Random Kernel Mapping Networks

Recently, deep spectral kernel networks (DSKNs) have attracted wide attention. They consist of periodic computational elements that can be activated across the whole feature space. In theory, DSKNs have the potential to reveal input-dependent and long-range characteristics, and are thus expected to perform more competitively than prevailing networks. In practice, however, they still fail to achieve the desired effects. The structural superiority of DSKNs comes at the cost of difficult optimization: the periodicity of the computational elements produces many poor, densely distributed local minima in the loss landscape, so DSKNs are prone to getting stuck and perform worse than expected. Hence, in this paper, we propose the novel Bayesian random Kernel mapping Networks (BaKer-Nets), which achieve preferable learning processes by randomly escaping from most local minima. Specifically, BaKer-Nets consist of two core components: 1) a prior-posterior bridge is derived to endow the computational elements with well-founded uncertainty; 2) a Bayesian learning paradigm is presented to optimize the prior-posterior bridge efficiently. With well-tuned uncertainty, BaKer-Nets can not only explore more potential solutions to avoid local minima, but also exploit these ensemble solutions to strengthen their robustness. Systematic experiments demonstrate that BaKer-Nets significantly improve the learning process while preserving the structural superiority.
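To make the idea concrete, the following is a minimal PyTorch sketch of one Bayesian random kernel mapping layer, not the paper's exact construction. It assumes the periodic computational elements are random Fourier features and that the "uncertainty of computational elements" is realized as a factorized Gaussian posterior over the spectral frequencies, sampled with the reparameterization trick; the class name `BayesianRFFLayer`, the paired sin/cos feature map, and the standard-normal prior are all illustrative assumptions.

```python
import math
import torch
import torch.nn as nn


class BayesianRFFLayer(nn.Module):
    """Hypothetical Bayesian random-Fourier-feature layer (illustrative sketch).

    The spectral frequencies W carry a factorized Gaussian posterior
    q(W) = N(mu, sigma^2); each forward pass draws a fresh sample of W,
    so training explores many periodic feature maps instead of one.
    """

    def __init__(self, in_dim: int, num_features: int, prior_std: float = 1.0):
        super().__init__()
        self.mu = nn.Parameter(torch.randn(in_dim, num_features))
        self.log_sigma = nn.Parameter(torch.full((in_dim, num_features), -3.0))
        self.prior_std = prior_std
        self.num_features = num_features

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reparameterized sample of the spectral frequencies.
        eps = torch.randn_like(self.mu)
        W = self.mu + torch.exp(self.log_sigma) * eps
        proj = x @ W
        # Paired sin/cos features give the standard periodic feature map.
        phi = torch.cat([torch.cos(proj), torch.sin(proj)], dim=-1)
        return phi * math.sqrt(1.0 / self.num_features)

    def kl_divergence(self) -> torch.Tensor:
        # KL(q(W) || p(W)) against the factorized Gaussian prior N(0, prior_std^2).
        q = torch.distributions.Normal(self.mu, torch.exp(self.log_sigma))
        p = torch.distributions.Normal(
            torch.zeros_like(self.mu),
            self.prior_std * torch.ones_like(self.mu),
        )
        return torch.distributions.kl_divergence(q, p).sum()
```

Under these assumptions, training would minimize the task loss plus the summed `kl_divergence()` terms of all such layers (a standard variational objective), and prediction would average the outputs over several samples of W, which corresponds to exploiting the "ensemble solutions" described above.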
