CellMapper: rapid and accurate inference of gene expression in difficult-to-isolate cell types

We present a sensitive approach for predicting genes that are expressed selectively in specific cell types, by searching publicly available expression data for genes whose expression profiles resemble those of known cell-specific markers. Our method, CellMapper, strongly outperforms previous computational algorithms for predicting cell type-specific expression, especially for rare and difficult-to-isolate cell types. Furthermore, CellMapper makes accurate predictions for human brain cell types that have never been isolated, and can be applied rapidly to diverse cell types from many tissues. We demonstrate a clinically relevant application: prioritizing candidate genes in disease susceptibility loci identified by genome-wide association studies (GWAS).
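
The general idea of searching expression data for genes that behave like a known marker can be illustrated with a small sketch. This is not the CellMapper algorithm itself, whose details are not given here; it assumes plain Pearson correlation as the similarity measure, and the function name `rank_by_marker_similarity`, the gene labels, and the toy matrix are all hypothetical.

```python
import numpy as np

def rank_by_marker_similarity(expr, genes, marker):
    """Rank genes by Pearson correlation with a marker gene's profile.

    expr   : 2-D array, rows = genes, columns = samples (assumed layout)
    genes  : list of gene names, one per row of expr
    marker : name of the known cell-specific marker (the query gene)
    """
    expr = np.asarray(expr, dtype=float)
    idx = genes.index(marker)
    # Center each gene's profile, then scale it to unit length so that
    # a dot product between two rows equals their Pearson correlation.
    centered = expr - expr.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1)
    norms[norms == 0] = np.inf  # constant profiles get correlation 0
    unit = centered / norms[:, None]
    corr = unit @ unit[idx]
    # Sort from most to least similar and drop the marker itself.
    order = np.argsort(-corr)
    return [(genes[i], float(corr[i])) for i in order if i != idx]

# Toy data (hypothetical): 4 genes measured across 5 samples.
genes = ["MARKER", "GENE_A", "GENE_B", "GENE_C"]
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],   # marker profile
    [2.1, 3.9, 6.2, 8.0, 9.9],   # closely tracks the marker
    [5.0, 4.0, 3.0, 2.0, 1.0],   # exactly anticorrelated with the marker
    [3.0, 3.0, 3.0, 3.0, 3.0],   # constant, uninformative
])
ranking = rank_by_marker_similarity(expr, genes, "MARKER")
```

In this toy case the gene that tracks the marker ranks first and the anticorrelated gene ranks last. A real search would run over a large compendium of public samples and would likely need a more robust similarity measure than raw correlation.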
