Canonical Correlation Analysis on Data With Censoring and Error Information

We developed a probabilistic model for canonical correlation analysis (CCA) for the case in which the associated datasets are incomplete. This case arises when data entries either contain measurement errors or are censored (i.e., nonignorably missing) because of uncertainties in instrument calibration, physical limitations of devices, or experimental conditions. The aim of the model is to estimate the true correlation coefficients by eliminating the effects of measurement errors and extracting useful information from the censored data. Because exact inference is not possible for the proposed model, we developed a modified variational Expectation-Maximization (EM) algorithm in which the posteriors of the latent variables are approximated by normal distributions. In the experiments, we first demonstrated the accuracy of the modified E-step approximation empirically by comparing it with hybrid Monte Carlo (HMC) sampling. We then compared the proposed algorithm with a maximum a posteriori (MAP) solution and a Markov Chain-EM solution on synthetic datasets with different numbers of censored entries and different correlation coefficient settings. The results showed that the variational EM solution compares favorably with the MAP solution and approaches the accuracy of the Markov Chain-EM solution while remaining computationally simple. Finally, we applied the proposed algorithm to find the properties of galaxy groups that are most correlated with X-ray luminosity.
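
To make the motivation concrete, the following is a minimal sketch of the problem the abstract describes, not of the proposed model or its variational EM algorithm. It computes classical (non-probabilistic) canonical correlations on synthetic two-view data, once on clean data and once after adding measurement noise and censoring one view at a detection limit, and shows that the naive estimate is attenuated. All names, dimensions, noise levels, and the detection limit are assumptions chosen for illustration.

```python
import numpy as np


def canonical_correlations(x, y):
    """Classical CCA: singular values of the whitened cross-covariance."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1)
    cyy = y.T @ y / (n - 1)
    cxy = x.T @ y / (n - 1)
    # Whiten with Cholesky factors: K = Lx^{-1} Cxy Ly^{-T}
    lx = np.linalg.cholesky(cxx)
    ly = np.linalg.cholesky(cyy)
    k = np.linalg.solve(lx, cxy)
    k = np.linalg.solve(ly, k.T).T
    return np.linalg.svd(k, compute_uv=False)


rng = np.random.default_rng(0)
n = 5000

# Two views driven by one shared latent variable (illustrative weights).
z = rng.standard_normal((n, 1))
x_true = z @ np.array([[1.0, 0.5]]) + 0.5 * rng.standard_normal((n, 2))
y_true = z @ np.array([[0.8, -0.6, 0.4]]) + 0.5 * rng.standard_normal((n, 3))

# Observed data: additive measurement error on both views, plus
# left-censoring of y at an assumed detection limit (values below the
# limit are recorded as the limit itself).
x_obs = x_true + 1.0 * rng.standard_normal(x_true.shape)
y_noisy = y_true + 1.0 * rng.standard_normal(y_true.shape)
detection_limit = -0.5
y_obs = np.maximum(y_noisy, detection_limit)

rho_clean = canonical_correlations(x_true, y_true)
rho_observed = canonical_correlations(x_obs, y_obs)

print("leading canonical correlation, clean data:   ", rho_clean[0])
print("leading canonical correlation, observed data:", rho_observed[0])
```

In this sketch the leading canonical correlation estimated from the noisy, censored data falls well below the one estimated from the clean data. This attenuation is the effect a model that accounts for measurement errors and censoring aims to correct.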
