Covariate‐driven factorization by thresholding for multiblock data

Multi-block data, in which multiple groups of variables from different sources are observed for a common set of subjects, are routinely collected in many areas of science. Methods for joint factorization of such data are being developed to explore their potentially joint variation structure. Most existing work focuses on separating joint components, which are shared across all data blocks, from individual components, which are each relevant to only a single data block. We propose to model and estimate partially joint components that are shared across some, but not all, data blocks. If covariates, possibly with a multi-block structure of their own, are available, the components are further modeled as being driven by the covariate information. To estimate this covariate-driven, block-structured factor model, we propose an iterative thresholding-based algorithm that transforms the problem of signal segmentation into a grouped variable selection problem. Simulation studies confirm that the proposed factorization accurately estimates individual and (partially) joint structures in multi-block data. In an analysis of a real multi-block genomic dataset from The Cancer Genome Atlas project, we demonstrate that the estimated block structures are straightforward to interpret and facilitate subsequent analyses.
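
To make the link between thresholding and grouped variable selection concrete, the following is a minimal sketch of the standard group soft-thresholding operator (the proximal operator of the group lasso penalty) applied to a loading vector that is partitioned by data block. It is not the authors' algorithm, which the abstract does not specify in detail. The function names `group_soft_threshold` and `active_blocks`, the block sizes, the threshold `lam`, and the example vector are all hypothetical choices made for illustration. The point it shows is that a block whose loadings have a small norm is set exactly to zero, so a component can end up shared by some blocks but not others.

```python
import numpy as np

def group_soft_threshold(v, block_sizes, lam):
    """Standard group soft-thresholding (illustrative, not the paper's method).

    Each block of v is shrunk toward zero by the factor (1 - lam / norm);
    a block whose Euclidean norm is at most lam is set exactly to zero.
    """
    out = np.zeros_like(v, dtype=float)
    start = 0
    for size in block_sizes:
        block = v[start:start + size]
        norm = np.linalg.norm(block)
        if norm > lam:
            out[start:start + size] = (1.0 - lam / norm) * block
        start += size
    return out

def active_blocks(v, block_sizes):
    """Return one flag per block: True if the block has any nonzero loading."""
    flags = []
    start = 0
    for size in block_sizes:
        flags.append(bool(np.any(v[start:start + size] != 0)))
        start += size
    return flags

# Hypothetical loading vector spanning three data blocks of sizes 3, 2, and 2.
v = np.array([3.0, 4.0, 0.0, 0.1, -0.1, 1.0, 2.0])
block_sizes = [3, 2, 2]

shrunk = group_soft_threshold(v, block_sizes, lam=1.0)
print(active_blocks(shrunk, block_sizes))
```

In this toy example the second block is zeroed out while the first and third survive, which is the pattern of a partially joint component. An iterative scheme would alternate a step like this with re-estimation of the scores, but the details of that alternation are an assumption here and are not taken from the abstract.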
