Statistical Inference for Data Integration

In the age of big data, data integration is a critical step, especially for understanding how diverse data types behave both jointly and separately. Among data integration methods, Angle-Based Joint and Individual Variation Explained (AJIVE) is particularly attractive because it studies not only joint behavior but also individual behavior. Typically, scores indicate relationships between data objects, and the loadings determine the drivers of those relationships. A fundamental question is therefore which loadings are statistically significant. A useful approach for assessing this is the jackstraw method. In this paper, we develop the jackstraw for the loadings of an AJIVE analysis, which provides statistical inference about the drivers in both the joint and the individual feature spaces.
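To make the jackstraw idea concrete, here is a minimal sketch of the original single-block jackstraw (Chung and Storey) applied to PCA. It is not the AJIVE extension developed in this paper. The sketch assumes a data matrix `X` with features in rows and data objects in columns. It measures how strongly each feature is associated with the top-`r` scores using the regression R², which is monotone in the usual F statistic and therefore yields the same p-values. The null distribution comes from permuting `s` randomly chosen rows, recomputing the scores, and recording the statistics of the permuted rows. The function names `top_scores`, `association_stat`, and `jackstraw_pvalues` and the defaults `s=10`, `B=100` are illustrative choices, not part of any published package.

```python
import numpy as np

# Hypothetical helpers: a sketch of the single-block PCA jackstraw, not AJIVE.


def top_scores(X, r):
    """Top-r right singular vectors (n x r) of the row-centered data.

    Rows of X are features and columns are data objects, so these columns
    are (normalized) PCA scores for the data objects.
    """
    Xc = X - X.mean(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:r].T


def association_stat(X, V):
    """R^2 from regressing each row of X on the columns of V plus an intercept."""
    n = X.shape[1]
    design = np.column_stack([np.ones(n), V])
    coef, *_ = np.linalg.lstsq(design, X.T, rcond=None)
    fitted = (design @ coef).T
    rss = ((X - fitted) ** 2).sum(axis=1)
    tss = ((X - X.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    return 1.0 - rss / tss


def jackstraw_pvalues(X, r, s=10, B=100, rng=None):
    """Empirical jackstraw p-values for each feature (row) of X.

    s rows are permuted per iteration and B iterations are run,
    giving s * B null statistics in total.
    """
    rng = np.random.default_rng(rng)
    m = X.shape[0]
    observed = association_stat(X, top_scores(X, r))

    null_stats = []
    for _ in range(B):
        rows = rng.choice(m, size=s, replace=False)
        Xb = X.copy()
        for i in rows:
            # Break any association between this feature and the scores.
            Xb[i] = rng.permutation(Xb[i])
        # Recompute the scores with the permuted rows included, so the null
        # reflects each feature's own influence on the estimated scores.
        Vb = top_scores(Xb, r)
        null_stats.append(association_stat(Xb[rows], Vb))
    null = np.concatenate(null_stats)

    # p-value: fraction of null statistics at least as large as the observed one.
    return (null[None, :] >= observed[:, None]).mean(axis=1)
```

Suppose a few features share a strong common signal and the rest are noise. Calling `jackstraw_pvalues(X, r=1)` should then give near-zero p-values for the signal features and roughly uniform p-values for the others. Permuting only a few rows per iteration keeps the recomputed scores close to the observed ones. This is what lets the null statistics account for the fact that each feature also helps estimate the scores it is tested against.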
