Superposition of Transcriptional Behaviors Determines Gene State

We introduce a novel technique for determining the expression state of a gene from quantitative measurements of its expression. Adopting a productive abstraction from current thinking in molecular biology, we consider two expression states for a gene: Up or Down. We determine this state with a statistical model that assumes the data behave as a combination of two biological distributions. Given a cohort of hybridizations, our algorithm predicts, for each single reading, the probability that the gene is in an Up or a Down state in that hybridization. Using a series of publicly available gene expression data sets, we demonstrate that our algorithm outperforms the prevalent algorithm. We also show that our algorithm can be used in conjunction with expression adjustment techniques to produce a more biologically sound gene-state call. The technique presented here enables a routine update in which continuously evolving expression-level adjustments feed into gene-state calculations. It can be applied in almost any multi-sample gene expression experiment and holds equal promise for protein abundance experiments.
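The idea of calling a gene state from a two-distribution model can be illustrated with a minimal sketch. This is not the algorithm of the paper: the abstract does not name the component distributions or the fitting procedure, so the sketch assumes two Gaussian components fitted by expectation-maximization to the log-scale readings of one gene across a cohort of hybridizations. The function names `normal_pdf` and `fit_two_state_mixture`, and the example readings, are hypothetical.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_two_state_mixture(values, iterations=200, min_std=1e-3):
    """Fit a two-component 1-D Gaussian mixture by EM (an assumed model).

    Component 0 plays the role of the Down state, component 1 the Up state.
    Returns (p_up, means), where p_up[i] is the posterior probability that
    values[i] came from the Up component. Requires at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    # Initialise Down from the lower half of the readings, Up from the upper half.
    means = [sum(ordered[:half]) / half,
             sum(ordered[half:]) / (len(ordered) - half)]
    spread = max((ordered[-1] - ordered[0]) / 4.0, min_std)
    stds = [spread, spread]
    weights = [0.5, 0.5]
    p_up = [0.5] * len(values)

    for _ in range(iterations):
        # E-step: posterior probability of the Up component for each reading.
        p_up = []
        for x in values:
            down = weights[0] * normal_pdf(x, means[0], stds[0])
            up = weights[1] * normal_pdf(x, means[1], stds[1])
            total = down + up
            p_up.append(up / total if total > 0.0 else 0.5)

        # M-step: re-estimate weight, mean and spread of each component.
        for k in (0, 1):
            resp = [p if k == 1 else 1.0 - p for p in p_up]
            mass = sum(resp)
            if mass < 1e-12:
                continue  # component is empty; keep its previous parameters
            means[k] = sum(r * x for r, x in zip(resp, values)) / mass
            var = sum(r * (x - means[k]) ** 2 for r, x in zip(resp, values)) / mass
            stds[k] = max(math.sqrt(var), min_std)
            weights[k] = mass / len(values)

    return p_up, means


# Hypothetical log2 readings of one gene across eight hybridizations.
readings = [5.1, 5.3, 4.9, 5.0, 9.8, 10.1, 10.3, 9.9]
p_up, means = fit_two_state_mixture(readings)
for reading, prob in zip(readings, p_up):
    print(f"reading={reading:5.1f}  P(Up)={prob:.3f}")
```

Each reading receives a probability rather than a hard call, which matches the per-hybridization Up/Down probabilities described above. A different component family or an upstream expression adjustment step could be substituted without changing this overall structure.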
