Defining Essentiality Score of Protein-Coding Genes and Long Noncoding RNAs

Measuring the essentiality of genes is critically important in biology and medicine. Here we propose a computational method, GIC (Gene Importance Calculator), which efficiently predicts the essentiality of both protein-coding genes and long noncoding RNAs (lncRNAs) using sequence information alone. In identifying essential protein-coding genes, GIC outperformed well-established computational scores. On an independent mouse lncRNA dataset, GIC also performed strongly (AUC = 0.918), whereas traditional computational methods are not applicable to lncRNAs. We further explored several potential applications of the GIC score. First, we revealed a correlation between a gene's GIC score and its status as a research hotspot. Second, the GIC score can be used to evaluate whether a mouse gene is representative of its human homolog by dissecting their cross-species differences; this is critical for basic medicine because many basic medical studies are performed in animal models. Finally, we showed that the GIC score can be used to identify candidate genes from a transcriptomics study. GIC is freely available at http://www.cuilab.cn/gic/.
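The AUC reported for the mouse lncRNA dataset summarizes how well a score separates essential from non-essential genes. The abstract does not describe the authors' evaluation code, so the following is only a minimal sketch under that caveat. It computes the area under the ROC curve from any per-gene score and binary essentiality labels (1 = essential, 0 = non-essential, an assumed encoding), using the Mann-Whitney interpretation: AUC is the probability that a randomly chosen essential gene scores higher than a randomly chosen non-essential one, with ties counted as one half.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    scores: one score per gene (e.g. a hypothetical GIC-like score).
    labels: 1 for essential, 0 for non-essential (assumed encoding).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    # Split the scores by class.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    # Count, over every (essential, non-essential) pair, how often the
    # essential gene is ranked higher; a tie contributes one half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5

    # Normalise by the number of pairs to obtain a value in [0, 1].
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy, made-up data: two essential and two non-essential genes.
    print(roc_auc([0.9, 0.4, 0.6, 0.3], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic in the number of genes and is chosen here for readability; for genome-scale data a rank-based computation or an established package gives the same value more efficiently. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.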
