Embedded Restricted Boltzmann Machines for fusion of mixed data types and applications in social measurements analysis

Analysing and fusing social measurements is important for understanding what shapes public opinion and the sustainability of global development. However, modelling data collected from social responses is challenging because the data are typically complex and heterogeneous: a response may take the form of stated facts, subjective assessments, choices, preferences or any combination thereof. Model-wise, these responses are a mixture of data types including binary, categorical, multicategorical, continuous, ordinal, count and rank data. The challenge is therefore to handle mixed data effectively in a unified fusion framework in order to perform inference and analysis. To that end, this paper introduces eRBM (Embedded Restricted Boltzmann Machine), a probabilistic latent variable model that can represent mixed data using a layer of hidden variables transparent across different types of data. The proposed model supports large-scale data analysis tasks, including distribution modelling, data completion, prediction and visualisation. We demonstrate these versatile features on several moderate and large-scale publicly available social survey datasets.
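
To make the idea of one hidden layer shared across data types concrete, the following is a minimal sketch and not the eRBM formulation itself, which the abstract does not specify. It assumes a standard RBM with only two visible types, binary units and unit-variance Gaussian units, and computes the posterior P(h_k = 1 | v) of each hidden unit. The function name `hidden_posterior`, the weight layout, and the convention of marking an unobserved answer with `None` are all illustrative assumptions.

```python
import math


def sigmoid(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_posterior(binary_v, gaussian_v, w_bin, w_gauss, hidden_bias):
    """Return P(h_k = 1 | v) for every hidden unit k.

    Assumptions (illustrative, not taken from the paper):
      * binary_v holds 0/1 values, gaussian_v holds real values with unit variance;
      * w_bin[i][k] and w_gauss[j][k] connect visible unit i or j to hidden unit k;
      * a value of None marks an unobserved answer and contributes nothing.
    """
    probs = []
    for k, bias in enumerate(hidden_bias):
        activation = bias
        # Both data types feed the same hidden unit through a linear term.
        for i, v in enumerate(binary_v):
            if v is not None:
                activation += v * w_bin[i][k]
        for j, v in enumerate(gaussian_v):
            if v is not None:
                activation += v * w_gauss[j][k]
        probs.append(sigmoid(activation))
    return probs


# Hypothetical toy respondent: two yes/no answers (one missing), one continuous score.
w_bin = [[1.0, -1.0], [0.5, 0.5]]
w_gauss = [[2.0, 0.0]]
probs = hidden_posterior([1, None], [0.5], w_bin, w_gauss, [0.0, 0.0])
print(probs)
```

In this sketch every visible type enters the hidden activation in the same additive way, which is what lets a single hidden representation summarise a mixed record; the other types listed in the abstract (ordinal, count, rank and so on) would need their own type-specific terms, which are not shown here.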
