Latent variable models for neural data analysis

The brain is perhaps the most complex system ever subjected to rigorous scientific investigation. The scale is staggering: over 10^11 neurons, each making an average of 10^3 synapses, with computation occurring on scales ranging from a single dendritic spine to an entire cortical area. Slowly, we are beginning to acquire experimental tools that can gather the massive amounts of data needed to characterize this system. However, understanding and interpreting these data will also require substantial advances in inferential and statistical techniques. This dissertation attempts to meet this need, extending and applying the modern tools of latent variable modeling to problems in neural data analysis. It is divided into two parts.

The first part begins with an exposition of the general techniques of latent variable modeling. It then proposes a new, extremely general optimization algorithm, called Relaxation Expectation Maximization (REM), that may be used to learn the optimal parameter values of arbitrary latent variable models. This algorithm appears to alleviate the common problem of convergence to sub-optimal local maxima of the likelihood. REM also leads to a natural framework for model size selection: in combination with standard model selection techniques, the quality of fits may be further improved, while the appropriate model size is determined automatically and efficiently. Next, a new latent variable model, the mixture of sparse hidden Markov models, is introduced, and approximate inference and learning algorithms are derived for it. This model is applied in the second part of the thesis.

The second part brings the technology of Part I to bear on two important problems in experimental neuroscience. The first is known as spike sorting: the problem of separating the spikes of different neurons embedded within an extracellular recording. The dissertation offers the first thorough statistical analysis of this problem, which in turn yields the first powerful probabilistic solution. The second problem is that of characterizing the distribution of spike trains recorded from the same neuron under identical experimental conditions. A latent variable model is proposed for this distribution, and inference and learning in this model lead to new principled algorithms for smoothing and clustering spike data.
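The abstract does not spell out how REM works. As a hedged illustration of the general idea of relaxing the EM objective with a temperature, the Python sketch below implements deterministic-annealing (tempered) EM, in the style of Ueda and Nakano, for a one-dimensional Gaussian mixture. It is not the REM algorithm itself. The function name `tempered_em_gmm_1d`, the fixed shared variance `sigma2`, the re-perturbation size, and the geometric schedule for the inverse temperature `beta` are all assumptions made for this example.

```python
import math
import random


def tempered_em_gmm_1d(data, k, sigma2=1.0, beta0=0.01, growth=1.2, iters=20, seed=0):
    """Deterministic-annealing (tempered) EM for a 1-D Gaussian mixture.

    Sketch assumptions: all components share one fixed variance sigma2, and
    only the means and mixing weights are learned. The inverse temperature
    beta = 1/T rises geometrically (growth > 1) from beta0 to 1; at beta = 1
    the updates are ordinary EM.
    """
    rng = random.Random(seed)
    n = len(data)
    centre = sum(data) / n
    scale = math.sqrt(sum((x - centre) ** 2 for x in data) / n)
    mus = [centre] * k  # every component starts at the data mean
    weights = [1.0 / k] * k
    beta = beta0
    while True:
        # Re-perturb the means at each temperature so that coincident
        # components can separate once beta passes a critical value.
        mus = [m + 1e-3 * scale * rng.gauss(0.0, 1.0) for m in mus]
        for _ in range(iters):
            # Tempered E-step: r_nj is proportional to (w_j * N(x_n | mu_j, sigma2)) ** beta.
            # The Gaussian normalizer is the same for every j, so it cancels.
            resp = []
            for x in data:
                logs = [beta * (math.log(weights[j]) - (x - mus[j]) ** 2 / (2.0 * sigma2))
                        for j in range(k)]
                top = max(logs)
                ps = [math.exp(v - top) for v in logs]
                total = sum(ps)
                resp.append([p / total for p in ps])
            # M-step: responsibility-weighted updates of weights and means.
            for j in range(k):
                nj = max(sum(r[j] for r in resp), 1e-12)
                weights[j] = nj / n
                mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
        if beta >= 1.0:
            break
        beta = min(1.0, beta * growth)
    return weights, mus
```

For this fixed-variance model, a linear stability argument places the point where two coincident means first separate near beta ≈ sigma2 / Var(x). Tracking how many distinct means exist as beta rises is one way to see how annealing can expose a model size.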
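The abstract pairs REM with "standard model selection techniques" without naming them. One common choice is Schwarz's Bayesian information criterion (BIC), which is -2 log L + p log n for a model with p free parameters fitted to n points. The sketch below is an assumption, not the dissertation's procedure. It fits plain-EM Gaussian mixtures with k = 1 to 4 components to one-dimensional data and keeps the k with the lowest BIC. The helper names `fit_gmm_1d` and `bic`, the quantile initialization, and the variance floor are hypothetical choices for illustration.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm_1d(data, k, iters=100, var_floor=1e-2):
    """Plain EM for a 1-D Gaussian mixture; returns the final log-likelihood.

    Means start at evenly spaced quantiles of the data and every variance
    starts at the overall data variance, so the fit is deterministic.
    """
    n = len(data)
    xs = sorted(data)
    centre = sum(xs) / n
    var0 = sum((x - centre) ** 2 for x in xs) / n
    mus = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    vars_ = [var0] * k
    weights = [1.0 / k] * k

    def joint_logs(x):
        # log(w_j) + log N(x | mu_j, var_j) for every component j
        return [math.log(weights[j])
                - 0.5 * (math.log(2.0 * math.pi * vars_[j]) + (x - mus[j]) ** 2 / vars_[j])
                for j in range(k)]

    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in data:
            logs = joint_logs(x)
            norm = log_sum_exp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: update weights, means and (floored) variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            vars_[j] = max(sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, data)) / nj,
                           var_floor)
    return sum(log_sum_exp(joint_logs(x)) for x in data)


def bic(data, k):
    """Schwarz's BIC = -2 log L + p log n, with p = 3k - 1 free parameters."""
    return -2.0 * fit_gmm_1d(data, k) + (3 * k - 1) * math.log(len(data))
```

On two well-separated clusters this criterion should prefer k = 2. In spike sorting the data points would instead be vectors of waveform features, which this one-dimensional sketch does not handle.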
