How many human genes can be defined as housekeeping with current expression data?

Background: Housekeeping (HK) genes are ubiquitously expressed in all tissue and cell types and constitute a basal transcriptome for the maintenance of basic cellular functions. Partitioning transcriptomes into HK and tissue-specific (TS) genes is fundamental for studying gene expression and cellular differentiation. Although many studies have aimed at a large-scale and thorough categorization of human HK genes, a meaningful consensus has yet to be reached.

Results: We collected two recent gene expression datasets (EST and microarray data) from public databases and analyzed the gene expression profiles in 18 human tissues that have been well documented by both data types. Benchmarked against a manually curated HK gene collection (HK408), we demonstrated that the present EST sampling data were far from saturated, and that this inadequacy has limited gene detectability and our understanding of TS expression. Owing to a likely over-stringent threshold, the microarray data showed a higher false negative rate than the EST data, leading to a significant underestimation of HK genes. Based on the EST data, we found that 40.0% of the currently annotated human genes were universally expressed, that is, detected in at least 16 of 18 tissues, as compared to only 5.1% specifically expressed in a single tissue. Our current EST-based estimate of the number of human HK genes ranged from 3,140 to 6,909, a ten-fold increase in comparison with previous microarray-based estimates.

Conclusion: We concluded that a significant fraction of human genes, at least in the currently annotated data repositories, was broadly expressed. Our understanding of tissue-specific expression was still preliminary and required much more large-scale, high-quality transcriptomic data in future studies. The new HK gene list categorized in this study will be useful for genome-wide analyses of the structural and functional features of HK genes.
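The breadth criterion stated in the Results (detected in at least 16 of 18 tissues versus detected in a single tissue) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `classify_by_breadth`, the label strings, the gene and tissue names, and the toy input are all hypothetical, and the sketch assumes that per-gene detection calls (for example, from EST counts) have already been made.

```python
from typing import Dict, Set

N_TISSUES = 18        # number of tissues analyzed, as stated in the abstract
BROAD_MIN_TISSUES = 16  # "at least 16 of 18 tissues", as stated in the abstract


def classify_by_breadth(
    expressed_in: Dict[str, Set[str]],
    broad_min: int = BROAD_MIN_TISSUES,
) -> Dict[str, str]:
    """Label each gene by the number of tissues in which it was detected.

    The label names are illustrative assumptions, not terms from the paper.
    """
    labels = {}
    for gene, tissues in expressed_in.items():
        breadth = len(tissues)
        if breadth >= broad_min:
            labels[gene] = "broad"            # candidate housekeeping gene
        elif breadth == 1:
            labels[gene] = "tissue-specific"  # detected in exactly one tissue
        elif breadth == 0:
            labels[gene] = "undetected"
        else:
            labels[gene] = "intermediate"
    return labels


# Hypothetical toy input: gene -> set of tissues in which it was detected.
tissues = [f"tissue_{i}" for i in range(1, N_TISSUES + 1)]
toy_calls = {
    "GENE_A": set(tissues),       # detected in all 18 tissues
    "GENE_B": {tissues[0]},       # detected in a single tissue
    "GENE_C": set(tissues[:5]),   # detected in 5 tissues
    "GENE_D": set(tissues[:16]),  # detected in exactly 16 tissues
}
labels = classify_by_breadth(toy_calls)
```

Lowering or raising `broad_min` changes how many genes fall into the broad class, which is one way to see why an estimate of the HK gene count would be reported as a range rather than a single number.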
