ALGORITHMS FOR APPROXIMATE BAYESIAN INFERENCE WITH APPLICATIONS TO ASTRONOMICAL DATA ANALYSIS

Bayesian inference is a theoretically well-founded and conceptually simple approach to data analysis. The computations in practical problems are anything but simple, however, so approximations are almost always necessary. The topic of this thesis is approximate Bayesian inference and its applications in three intertwined problem domains. Variational Bayesian learning is one type of approximate inference. Its main advantage is its computational efficiency compared with the widely used sampling-based methods. Its main disadvantage is the large amount of analytical work required to derive the necessary components of the algorithm. The first part of this thesis reports on an effort to automate variational Bayesian learning for a certain class of models. The second part is concerned with heteroscedastic modelling, which is synonymous with variance modelling. Heteroscedastic models are particularly well suited to Bayesian treatment, as many traditional estimation methods do not produce satisfactory results for them. Variance models and algorithms for estimating them are studied in two contexts: source separation and regression. Astronomical applications constitute the third part of the thesis. Two problems are posed: one concerns the separation of stellar subpopulation spectra from observed galaxy spectra; the other concerns estimating the time delays in gravitational lensing. Solutions to both problems are presented, and both rely heavily on the machinery of approximate inference.
