Pushing for the Extreme: Estimation of Poisson Distribution from Low Count Unreplicated Data - How Close Can We Get?

Studies of learning algorithms typically concentrate on situations where a potentially ever-growing training sample is available. Yet there are situations (e.g., detection of differentially expressed genes on unreplicated data, or estimation of time delay in non-stationary gravitationally lensed photon streams) where inference must rely on extremely small samples. On unreplicated data, inference has to be performed on the smallest sample possible: a sample of size 1. We study whether anything useful can be learnt in such extreme situations by concentrating on a Bayesian approach that can account for possible prior information on expected counts. We perform a detailed information-theoretic study of such Bayesian estimation and quantify the effect of Bayesian averaging on its first two moments. Finally, to analyze the potential benefits of the Bayesian approach, we also consider Maximum Likelihood (ML) estimation as a baseline. We show both theoretically and empirically that Bayesian model averaging can be beneficial.
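
To make the contrast between the two estimators concrete, the following is a minimal sketch, not the paper's implementation. It assumes a flat (improper) prior on the Poisson mean, which is an illustrative choice; the abstract does not state which priors the study uses. Under that assumption the posterior after a single count x is Gamma(x + 1, 1), and averaging the Poisson model over it gives the closed-form predictive P(y | x) = (x + y)! / (x! y! 2^(x + y + 1)). The function names are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """P(y | lam) for a Poisson distribution with mean lam."""
    if lam == 0:
        # Degenerate case: all mass sits on y = 0.
        return 1.0 if y == 0 else 0.0
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def ml_predictive(y, x):
    """ML baseline: with one observation x, the ML estimate of the mean is x."""
    return poisson_pmf(y, x)


def bayes_predictive(y, x):
    """Bayesian model average given one count x.

    ASSUMPTION: flat prior on the mean, so the posterior is Gamma(x + 1, 1)
    and P(y | x) = (x + y)! / (x! y! 2^(x + y + 1)).
    """
    log_p = (
        math.lgamma(x + y + 1)
        - math.lgamma(x + 1)
        - math.lgamma(y + 1)
        - (x + y + 1) * math.log(2)
    )
    return math.exp(log_p)


if __name__ == "__main__":
    x = 0  # a single observed zero count
    for y in range(4):
        print(y, ml_predictive(y, x), bayes_predictive(y, x))
```

With an observed count of x = 0, the ML baseline puts all probability on y = 0, whereas the averaged model under this assumed prior assigns it only 1/2 and spreads the rest over larger counts. This is the kind of low-count behaviour in which averaging over the unknown mean may help.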
