Statistical Latent Space Approach for Mixed Data Modelling and Applications

The analysis of mixed data raises persistent challenges in statistics and machine learning. One of the two most prominent challenges is to develop statistical techniques and methodologies that handle mixed data effectively by making the data less heterogeneous with minimal loss of information. The other is that such methods must scale to large tasks involving huge amounts of mixed data. To tackle these challenges, we introduce parameter-sharing and balancing extensions to our recent model, the mixed-variate restricted Boltzmann machine (MV.RBM), which transforms heterogeneous data into a homogeneous representation. We also integrate structured sparsity and distance metric learning into RBM-based models. We apply the proposed methods to several applications, including latent patient profile modelling in medical data analysis and representation learning for image retrieval. The experimental results show that the models perform better than baseline methods on medical data and outperform state-of-the-art rivals on an image dataset.
