Exploiting transfer learning for the reconstruction of the human gene regulatory network

MOTIVATION The reconstruction of Gene Regulatory Networks (GRNs) from gene expression data has received increasing attention in recent years, because it helps in understanding the regulatory mechanisms involved in human diseases. Most existing methods reconstruct the network through machine learning approaches that analyze known examples of interactions. However, i) they often produce poor results when the number of labeled examples is limited or when no negative examples are available, and ii) they cannot exploit information extracted from the GRNs of other (better studied) related organisms, even when this information is available.

RESULTS In this paper we propose a novel machine learning method that overcomes these limitations by exploiting knowledge about the GRN of a source organism to reconstruct the GRN of a target organism, by means of a novel transfer learning technique. Moreover, the proposed method natively works in the Positive-Unlabeled setting, where no negative examples are available, by exploiting a (possibly large) set of unlabeled examples. In our experiments we reconstructed the human GRN by exploiting knowledge of the GRN of M. musculus. The results show that the proposed method outperforms state-of-the-art approaches and identifies previously unknown functional relationships among the analyzed genes.

AVAILABILITY http://www.di.uniba.it/~mignone/systems/biosfer/index.html.

SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
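
The Positive-Unlabeled transfer setting described above can be illustrated with a toy sketch. This is not the method proposed in the paper: it is a deliberately simple centroid-based ranking, written only to show what the inputs look like (known positive interactions from a source and a target organism, plus unlabeled target pairs, and no negatives). The feature vectors, the function names, and the `source_weight` parameter are all hypothetical.

```python
from math import sqrt


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_unlabeled(target_pos, source_pos, unlabeled, source_weight=0.5):
    """Score each unlabeled target gene pair by its similarity to the
    positive centroids of the target and source organisms.

    source_weight (hypothetical) controls how much the source organism
    contributes; 0.0 ignores it, 1.0 relies on it entirely.
    """
    c_target = centroid(target_pos)
    c_source = centroid(source_pos)
    return [
        (1.0 - source_weight) * cosine(x, c_target)
        + source_weight * cosine(x, c_source)
        for x in unlabeled
    ]


# Made-up feature vectors for gene pairs (illustrative values only).
target_pos = [[1.0, 0.9], [0.8, 1.0]]   # known interactions, target organism
source_pos = [[0.9, 1.0], [1.0, 0.8]]   # known interactions, source organism
unlabeled = [[0.9, 0.9], [1.0, -1.0]]   # target pairs with unknown status

scores = score_unlabeled(target_pos, source_pos, unlabeled)
# Indices of unlabeled pairs, most likely interaction first.
ranking = sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)
```

No negative examples appear anywhere in the inputs: unlabeled pairs are only ranked by how closely they resemble known positives, and the source organism enters as a second set of positives whose influence is tunable.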
