Flexible Bayesian Nonlinear Model Configuration

Regression models are used in a wide range of applications, providing a powerful scientific tool for researchers from different fields. Linear, or simple parametric, models are often not sufficient to describe complex relationships between input variables and a response. Such relationships can be better described through flexible approaches such as neural networks, but this results in less interpretable models and potential overfitting. Alternatively, specific parametric nonlinear functions can be used, but the specification of such functions is in general complicated. In this paper, we introduce a flexible approach for the construction and selection of highly flexible nonlinear parametric regression models. Nonlinear features are generated hierarchically, similarly to deep learning, but with additional flexibility in the types of features that can be considered. This flexibility, combined with variable selection, allows us to find a small set of important features and thereby more interpretable models. Within the space of possible functions, a Bayesian approach is taken, introducing priors for functions based on their complexity. A genetically modified mode jumping Markov chain Monte Carlo (GMJMCMC) algorithm is adopted to perform Bayesian inference and estimate posterior probabilities for model averaging. In various applications, we illustrate how our approach is used to obtain meaningful nonlinear models. Additionally, we compare its predictive performance with several machine learning algorithms.
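Since the abstract only sketches the hierarchical feature-generation idea, the following is a minimal illustration of what it can look like in practice. This is a hedged sketch, not the authors' implementation: the transformation set G, the function new_generation, and all other names are illustrative assumptions. New candidate features are built from existing ones as nonlinear modifications g(F_j), multiplications F_j * F_k, or projections g(alpha_0 + sum_j alpha_j F_j), and each generation can reuse features produced in earlier generations.

```python
import numpy as np

# Pool of nonlinear transformations (an assumed example set, not the paper's).
G = {
    "sigmoid": lambda z: 1.0 / (1.0 + np.exp(-z)),
    "sin": np.sin,
    "cbrt": np.cbrt,
}

def new_generation(features, rng, n_new=5):
    """Generate candidate features from an existing pool.

    `features` is a list of (name, column) pairs. Each new feature is either
    a nonlinear modification g(F_j), a multiplication F_j * F_k, or a
    projection g(alpha_0 + sum_j alpha_j * F_j) with random weights.
    """
    out = []
    for _ in range(n_new):
        kind = rng.choice(["modification", "multiplication", "projection"])
        g_name, g = list(G.items())[rng.integers(len(G))]
        if kind == "modification":
            name, col = features[rng.integers(len(features))]
            out.append((f"{g_name}({name})", g(col)))
        elif kind == "multiplication":
            (n1, c1), (n2, c2) = (features[i] for i in rng.integers(len(features), size=2))
            out.append((f"{n1}*{n2}", c1 * c2))
        else:  # projection over a random subset of existing features
            idx = rng.choice(len(features), size=min(2, len(features)), replace=False)
            alpha = rng.normal(size=len(idx) + 1)
            z = alpha[0] + sum(a * features[i][1] for a, i in zip(alpha[1:], idx))
            name = f"{g_name}(lin({','.join(features[i][0] for i in idx)}))"
            out.append((name, g(z)))
    return out

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
pool = [(f"x{j}", X[:, j]) for j in range(X.shape[1])]
pool += new_generation(pool, rng)  # generation 1: features of the inputs
pool += new_generation(pool, rng)  # generation 2: features of features
print([name for name, _ in pool])
```

One plausible reading of the complexity-based prior mentioned above, stated here as an assumption rather than a quotation from the paper, is to penalize each included feature by a measure c(F_j) of how many operations were used to build it:

p(m) \propto \prod_{j:\, \gamma_j = 1} a^{c(F_j)}, \qquad 0 < a < 1,

where \gamma_j indicates whether feature F_j enters model m, so deeper, more elaborate features must earn their inclusion through a correspondingly better fit.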
