Fast inference in generalized linear models via expected log-likelihoods

Generalized linear models play an essential role in a wide variety of statistical applications. This paper discusses an approximation of the likelihood in these models that can greatly facilitate computation. The basic idea is to replace a sum that appears in the exact log-likelihood by an expectation over the model covariates; the resulting “expected log-likelihood” (EL) can in many cases be computed significantly faster than the exact log-likelihood. In many neuroscience experiments the distribution over model covariates is controlled by the experimenter, and the expected log-likelihood approximation becomes particularly useful; for example, estimators based on maximizing this expected log-likelihood (or a penalized version thereof) can often be obtained with orders-of-magnitude computational savings compared to the exact maximum likelihood estimators. A risk analysis establishes that these maximum EL estimators often come with little cost in accuracy (and in some cases even improved accuracy) compared to standard maximum likelihood estimates. Finally, we find that these methods can significantly decrease the computation time of marginal likelihood calculations for model selection and of Markov chain Monte Carlo methods for sampling from the posterior parameter distribution. We illustrate our results by applying these methods to a computationally challenging dataset of neural spike trains obtained via large-scale multi-electrode recordings in the primate retina.
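
To make the idea concrete, the sketch below contrasts an exact Poisson GLM log-likelihood with an EL-style approximation, assuming a canonical exponential nonlinearity and i.i.d. Gaussian covariates, under which E[exp(xᵀθ)] has a closed form. The function names, the toy data, and the specific Gaussian setup are illustrative assumptions, not code or data from the paper.

```python
import numpy as np

# Illustrative sketch (assumed setup): Poisson GLM with exponential
# nonlinearity and Gaussian covariates x ~ N(mu, Sigma).

def exact_loglik(theta, X, y):
    """Exact Poisson log-likelihood (up to a theta-independent constant).
    Each evaluation costs O(N d) because of the sum over trials."""
    eta = X @ theta
    return y @ eta - np.exp(eta).sum()

def expected_loglik(theta, sts, N, mu, Sigma):
    """EL-style approximation: the sum of exp(x_i' theta) over trials is
    replaced by N * E[exp(x' theta)], which is available in closed form for
    Gaussian covariates. sts = sum_i y_i x_i is precomputed once, so each
    evaluation costs O(d^2) rather than O(N d)."""
    return sts @ theta - N * np.exp(mu @ theta + 0.5 * theta @ Sigma @ theta)

# Toy usage with simulated data
rng = np.random.default_rng(0)
d, N = 20, 50_000
mu, Sigma = np.zeros(d), np.eye(d) / d
X = rng.multivariate_normal(mu, Sigma, size=N)
theta_true = rng.normal(scale=0.3, size=d)
y = rng.poisson(np.exp(X @ theta_true))
sts = y @ X  # spike-triggered sum, computed once

theta = np.zeros(d)
print(exact_loglik(theta, X, y), expected_loglik(theta, sts, N, mu, Sigma))
```

Once the spike-triggered sum has been accumulated, the data no longer enter each EL evaluation, which is the source of the computational savings described in the abstract.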
