Integrating omics datasets with the OmicsPLS package

Abstract

Background

With the exponential growth in available biomedical data, there is a need for data integration methods that can extract information about relationships between the data sets. However, these data sets might have very different characteristics. For interpretable results, data-specific variation needs to be quantified. For this task, Two-way Orthogonal Partial Least Squares (O2PLS) has been proposed. To facilitate application and development of the methodology, free and open-source software is required. However, no free and open-source implementation of O2PLS is currently available.

Results

We introduce OmicsPLS, an open-source implementation of the O2PLS method in R. It can handle both low- and high-dimensional datasets efficiently. Generic methods for inspecting and visualizing results are implemented. Both a standard cross-validation method and a faster alternative are available to determine the number of components. A simulation study shows good performance of OmicsPLS compared to alternatives, in terms of accuracy and CPU runtime. We demonstrate OmicsPLS by integrating genetic and glycomic data.

Conclusions

We propose the OmicsPLS R package: a free and open-source implementation of O2PLS for statistical data integration. OmicsPLS is available at https://cran.r-project.org/package=OmicsPLS and can be installed in R via install.packages("OmicsPLS").
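
For orientation, the decomposition behind O2PLS is usually written along the following lines. This is a sketch in commonly used notation. The symbol names are assumptions and are not taken from the abstract above: X and Y are the two data sets, T and U are joint scores with loadings W and C, the terms marked with a perpendicular sign capture data-specific variation, and E, F and H are residuals.

```latex
\begin{aligned}
X &= T W^\top + T_\perp P_\perp^\top + E, \\
Y &= U C^\top + U_\perp Q_\perp^\top + F, \\
U &= T B + H.
\end{aligned}
```

Under this reading, the "number of components" chosen by cross-validation corresponds to the column counts of the joint parts (T, U) and of the two data-specific parts.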
