Basic properties and information theory of Audic-Claverie statistic for analyzing cDNA arrays

Background: The Audic-Claverie method [1] has been, and continues to be, a popular approach for detecting differentially expressed genes in the SAGE framework. The method assumes that, under the null hypothesis, tag counts of the same gene in two libraries come from the same but unknown Poisson distribution. The problem is that each SAGE library represents only a single measurement. We ask: given that tag count samples from SAGE libraries are extremely limited, how useful is the Audic-Claverie methodology in practice? We rigorously analyze the Audic-Claverie (A-C) statistic, which forms the backbone of the methodology and represents our knowledge of the underlying tag-generating process based on a single observation.

Results: We show that the A-C statistic and the underlying Poisson distribution of the tag counts share the same mode structure. Moreover, the Kullback-Leibler (K-L) divergence from the true unknown Poisson distribution to the A-C statistic is minimized when the A-C statistic is conditioned on the mode of the Poisson distribution. Most importantly, the expectation of this K-L divergence never exceeds 1/2 bit.

Conclusion: A rigorous underpinning of the Audic-Claverie methodology has been missing. Our results constitute a rigorous argument supporting the use of the Audic-Claverie method even though SAGE libraries represent very sparse samples.
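The mode-structure and K-L statements above can be explored numerically. The following is a minimal sketch, not the paper's own code. It assumes the standard equal-library-size form of the A-C statistic, p(y | x) = C(x + y, x) / 2^(x + y + 1), which the abstract itself does not write out. The function names (`ac_pmf`, `poisson_pmf`, `modes`, `kl_bits`) and the truncation point `y_max` are illustrative choices, and the direction of the divergence follows one reading of the abstract's wording.

```python
from math import comb, exp, lgamma, log, log2


def ac_pmf(y, x):
    """A-C statistic p(y | x), ASSUMING two libraries of equal size."""
    return comb(x + y, x) / 2 ** (x + y + 1)


def poisson_pmf(y, lam):
    """Poisson(lam) probability of count y, computed in log space (lam > 0)."""
    return exp(-lam + y * log(lam) - lgamma(y + 1))


def modes(pmf, support, rel_tol=1e-9):
    """Return every point of `support` whose probability ties for the maximum."""
    values = [pmf(y) for y in support]
    top = max(values)
    return [y for y, v in zip(support, values) if top - v <= rel_tol * top]


def kl_bits(lam, x, y_max=120):
    """K-L divergence in bits from Poisson(lam) to the A-C statistic given x.

    The infinite sum over y is truncated at y_max (an assumption that is
    only reasonable when lam is well below y_max).
    """
    total = 0.0
    for y in range(y_max):
        p = poisson_pmf(y, lam)
        if p > 0.0:
            total += p * log2(p / ac_pmf(y, x))
    return total


if __name__ == "__main__":
    x = 5
    print(modes(lambda y: ac_pmf(y, x), range(50)))             # [4, 5]
    print(modes(lambda y: poisson_pmf(y, float(x)), range(50)))  # [4, 5]
    for cond in (1, 4, 5, 12):
        print(cond, kl_bits(5.0, cond))
```

Under the assumed formula, the ratio p(y + 1 | x) / p(y | x) = (x + y + 1) / (2(y + 1)) is at least 1 exactly when y ≤ x - 1, so for x ≥ 1 the statistic has tied modes at x - 1 and x, matching the modes of a Poisson distribution with mean x. The loop at the end prints the divergence for a few conditioning counts so it can be compared near and away from the Poisson mode.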

[1]  C. Appledorn.  The Entropy of a Poisson Distribution, 1987.

[2]  R. Evans, et al.  The Entropy of a Poisson Distribution (C. Robert Appledorn), 1988.

[3]  J. Claverie, et al.  The significance of digital gene expression profiles, 1997, Genome research.

[4]  Rithy K. Roth, et al.  Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays, 2000, Nature Biotechnology.

[5]  D. Stekel, et al.  The comparison of gene expression from multiple cDNA libraries, 2000, Genome research.

[6]  J. Ruijter, et al.  Statistical evaluation of SAGE libraries: consequences for experimental design, 2002, Physiological genomics.

[7]  Ji Huang, et al.  [Serial analysis of gene expression], 2002, Yi chuan = Hereditas.

[8]  Nanxiang Ge, et al.  An Empirical Bayesian Significance Test of cDNA Library Data, 2004, J. Comput. Biol.

[9]  Cinzia Pizzi, et al.  A multistep bioinformatic approach detects putative regulatory elements in gene promoters, 2005, BMC Bioinformatics.

[10]  S. Tanksley, et al.  Coffee and tomato share common gene repertoires as revealed by deep sequencing of seed and cherry transcripts, 2005, Theoretical and Applied Genetics.

[11]  A. Vercesi, et al.  The plant energy-dissipating mitochondrial systems: depicting the genomic structure and the expression profiles of the gene families of uncoupling protein and alternative oxidase in monocots and dicots, 2006, Journal of experimental botany.

[12]  M. Metta, et al.  No Accelerated Rate of Protein Evolution in Male-Biased Drosophila pseudoobscura Genes, 2006, Genetics.

[13]  Yangxing Zhao, et al.  Characterization and quantification of mRNA transcripts in ejaculated spermatozoa of fertile men by serial analysis of gene expression, 2006, Human reproduction.

[14]  Ryan D. Morin, et al.  Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells, 2008, Genome research.

[15]  Hyun-Jin Kim, et al.  Pepper EST database: comprehensive in silico tool for analyzing the chili pepper (Capsicum annuum) transcriptome, 2008, BMC Plant Biology.

[16]  C. Molina, et al.  SuperSAGE: the drought stress-responsive transcriptome of chickpea roots, 2008, BMC Genomics.

[17]  C. V. Van Tassell, et al.  Comparative transcriptome analysis of in vivo‐ and in vitro‐produced porcine blastocysts by small amplified RNA‐Serial analysis of gene expression (SAR‐SAGE), 2008, Molecular reproduction and development.

[18]  G. Cervigni, et al.  Gene expression in diplosporous and sexual Eragrostis curvula genotypes with differing ploidy levels, 2008, Plant Molecular Biology.

[19]  L. Varuzza, et al.  Significance tests for comparing digital gene expression profiles, 2008, 0806.3274.