Genome, transcriptome and proteome: the rise of omics data and their integration in biomedical sciences

Abstract

Advances in the technologies and informatics used to generate and process large biological data sets (omics data) are promoting a critical shift in the study of the biomedical sciences. While genomics, transcriptomics and proteomics, coupled with bioinformatics and biostatistics, are gaining momentum, they are still, for the most part, assessed individually with distinct approaches, generating monothematic rather than integrated knowledge. As other areas of the biomedical sciences, including metabolomics, epigenomics and pharmacogenomics, move towards the omics scale, we are witnessing the rise of interdisciplinary data integration strategies that support a better understanding of biological systems and, eventually, the development of successful precision medicine. This review cuts across the boundaries between genomics, transcriptomics and proteomics, summarizing how omics data are generated, analysed and shared, and provides an overview of the current strengths and weaknesses of this global approach. It is intended for students and researchers seeking knowledge outside their field of expertise, and aims to foster a leap from the reductionist to the global-integrative analytical approach in research.