Machine Learning and Integrative Analysis of Biomedical Big Data

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., the genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration both poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
