Long non-coding RNA transcriptome of uncharacterized samples can be accurately imputed using protein-coding genes

Long non-coding RNAs (lncRNAs) play an important role in gene regulation and are increasingly recognized as crucial mediators of disease pathogenesis. However, the vast majority of published transcriptome datasets lack high-quality lncRNA profiles, even though they profile protein-coding genes (PCGs) well. Here we propose a framework that harnesses the correlative expression patterns between lncRNAs and PCGs to impute unknown lncRNA profiles. The lncRNA expression imputation (LEXI) framework enables characterization of the lncRNA transcriptome of samples that lack any lncRNA data, using only their PCG profiles. We compare various machine learning and missing value imputation algorithms for implementing LEXI and demonstrate the feasibility of this approach for imputing the lncRNA transcriptome of normal and cancer tissues. Additionally, we determine the factors that influence imputation accuracy and provide guidelines for implementing this approach.
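
The general idea of imputing lncRNA expression from PCG profiles can be illustrated with a minimal sketch. It assumes a k-nearest-neighbour imputation scheme, chosen here only as a simple representative of the missing value imputation family; it is not the LEXI implementation. The function name `knn_impute_lncrna`, the parameter `k`, and the toy data are all hypothetical. A reference set with both PCG and lncRNA measurements is used to impute lncRNA values for query samples that have PCG measurements only.

```python
import math
import random


def knn_impute_lncrna(ref_pcg, ref_lnc, query_pcg, k=3):
    """Impute lncRNA profiles for query samples from their PCG profiles.

    ref_pcg   : list of reference samples, each a list of PCG expression values
    ref_lnc   : list of lncRNA expression values for the same reference samples
    query_pcg : list of query samples with PCG values only
    k         : number of nearest reference samples to average (hypothetical default)
    """
    n_lnc = len(ref_lnc[0])
    imputed = []
    for q in query_pcg:
        # Rank reference samples by Euclidean distance in PCG expression space.
        ranked = sorted((math.dist(q, r), i) for i, r in enumerate(ref_pcg))
        nearest = [i for _, i in ranked[:k]]
        # Each imputed lncRNA value is the mean over the k nearest references.
        imputed.append(
            [sum(ref_lnc[i][j] for i in nearest) / len(nearest) for j in range(n_lnc)]
        )
    return imputed


# Synthetic toy data (not real expression values): 50 reference samples,
# 5 PCGs, and 2 lncRNAs whose expression depends on the PCGs.
random.seed(0)
ref_pcg = [[random.gauss(0, 1) for _ in range(5)] for _ in range(50)]
ref_lnc = [[p[0] + 0.5 * p[1], p[2] - p[3]] for p in ref_pcg]

# Query samples carry PCG values only; their lncRNA values are imputed.
query_pcg = [[random.gauss(0, 1) for _ in range(5)] for _ in range(3)]
imputed = knn_impute_lncrna(ref_pcg, ref_lnc, query_pcg, k=5)
```

In this sketch, accuracy depends on how closely the reference samples resemble the query samples in PCG space. A regression-based learner could be swapped in behind the same interface.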
