LONGO: an R package for interactive gene length dependent analysis for neuronal identity

Motivation: Reprogramming somatic cells into neurons holds great promise to model neuronal development and disease. The efficiency and success rate of neuronal reprogramming, however, may vary between different conversion platforms and cell types, thereby necessitating an unbiased, systematic approach to estimate neuronal identity of converted cells. Recent studies have demonstrated that long genes (>100 kb from transcription start to end) are highly enriched in neurons, which provides an opportunity to identify neurons based on the expression of these long genes.

Results: We have developed a versatile R package, LONGO, to analyze gene expression based on gene length. We propose a systematic analysis of long gene expression (LGE) with a metric termed the long gene quotient (LQ) that quantifies LGE in RNA-seq or microarray data to validate neuronal identity at the single-cell and population levels. This unique feature of neurons provides an opportunity to utilize measurements of LGE in transcriptome data to quickly and easily distinguish neurons from non-neuronal cells. By combining this conceptual advancement and statistical tool in a user-friendly and interactive software package, we intend to encourage and simplify further investigation into LGE, particularly as it applies to validating and improving neuronal differentiation and reprogramming methodologies.

Availability and implementation: LONGO is freely available for download at https://github.com/biohpc/longo.
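To make the idea of measuring long gene expression concrete, the following is a minimal sketch in Python. It is not the LQ computed by LONGO, whose definition is not given in this abstract. It only shows a simple stand-in: the ratio of mean expression in long genes (>100 kb, the threshold stated above) to mean expression in all other genes. The function name `long_gene_ratio` and the toy gene lengths and expression values are hypothetical and chosen purely for illustration.

```python
from statistics import mean

# Threshold taken from the abstract: long genes span more than 100 kb.
LONG_GENE_BP = 100_000


def long_gene_ratio(lengths_bp, expression):
    """Hypothetical proxy for long gene expression (NOT LONGO's LQ).

    Returns mean expression of genes longer than LONG_GENE_BP divided by
    mean expression of the remaining genes.
    """
    if len(lengths_bp) != len(expression):
        raise ValueError("lengths_bp and expression must have the same length")

    long_vals = [e for l, e in zip(lengths_bp, expression) if l > LONG_GENE_BP]
    short_vals = [e for l, e in zip(lengths_bp, expression) if l <= LONG_GENE_BP]
    if not long_vals or not short_vals:
        raise ValueError("need at least one long and one non-long gene")

    short_mean = mean(short_vals)
    if short_mean == 0:
        raise ValueError("mean expression of non-long genes is zero")
    return mean(long_vals) / short_mean


# Invented toy data: four genes, two of them longer than 100 kb.
lengths = [20_000, 50_000, 150_000, 400_000]
neuron_like = [10.0, 10.0, 30.0, 30.0]      # long genes highly expressed
fibroblast_like = [10.0, 10.0, 5.0, 5.0]    # long genes weakly expressed

print(long_gene_ratio(lengths, neuron_like))
print(long_gene_ratio(lengths, fibroblast_like))
```

In this toy setting a ratio above 1 means the long genes are expressed more strongly than the rest, which is the pattern the abstract associates with neurons. A real analysis would need normalized expression values and the package's own metric.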
