Lightweight Data Fusion with Conjugate Mappings

We present an approach to data fusion that combines the interpretability of structured probabilistic graphical models with the flexibility of neural networks. The proposed method, lightweight data fusion (LDF), emphasizes posterior analysis over latent variables using two types of information: primary data, which are well characterized but limited in availability, and auxiliary data, which are readily available but lack a well-characterized statistical relationship to the latent quantity of interest. The lack of a forward model for the auxiliary data precludes the use of standard data fusion approaches, while the inability to acquire observations of the latent variables severely limits direct application of most supervised learning methods. LDF addresses these issues by using neural networks as conjugate mappings of the auxiliary data: nonlinear transformations into sufficient statistics with respect to the latent variables. This facilitates efficient inference by preserving the conjugacy properties of the primary data and leads to compact representations of the latent variable posterior distributions. We demonstrate the LDF methodology on two challenging inference problems: (1) learning electrification rates in Rwanda from satellite imagery, high-level grid infrastructure, and other sources; and (2) inferring county-level homicide rates in the United States by integrating socio-economic data using a mixture model of multiple conjugate mappings.
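To make the idea of a conjugate mapping concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Beta-Binomial model for a latent rate, with primary data given as k successes in n trials, and a hypothetical one-layer map standing in for the neural network. The function names, the weights, and the feature vector are all illustrative assumptions; in particular, the weights are fixed placeholders, and how they would be learned is not shown. The point is only that the auxiliary data enter the posterior as extra pseudo-counts, so the posterior remains a Beta distribution.

```python
import math

def softplus(z):
    # Numerically stable log(1 + exp(z)); the output is always positive,
    # so it can serve as a valid Beta pseudo-count.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def conjugate_mapping(x, weights, biases):
    # Hypothetical one-layer map (a stand-in for a neural network):
    # auxiliary features -> [pseudo-successes, pseudo-failures].
    return [
        softplus(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]

def posterior_params(x, k, n, weights, biases, a0=1.0, b0=1.0):
    # Beta posterior over the latent rate after fusing the auxiliary
    # features x with the primary counts (k successes in n trials).
    s = conjugate_mapping(x, weights, biases)
    alpha = a0 + s[0] + k
    beta = b0 + s[1] + (n - k)
    return alpha, beta

# Illustrative (assumed) weights and auxiliary features.
weights = [[0.4, -0.2, 0.1], [-0.3, 0.5, 0.2]]
biases = [0.0, 0.0]
x = [0.5, -1.0, 2.0]

# Posterior from auxiliary data alone, then with primary data added.
alpha_aux, beta_aux = posterior_params(x, k=0, n=0, weights=weights, biases=biases)
alpha, beta = posterior_params(x, k=7, n=10, weights=weights, biases=biases)
posterior_mean = alpha / (alpha + beta)
```

Because the mapped auxiliary data and the primary counts both update the same two Beta parameters, the posterior is summarized by just `alpha` and `beta`, which illustrates the kind of compact posterior representation described above.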
