Inference on Low-Rank Data Matrices with Applications to Microarray Data

Probe-level microarray data are usually stored in matrices, where the rows and columns correspond to arrays and probes, respectively. Scientists routinely summarize each array by a single index, taken as the expression level of the probe-set (gene). We examine whether such a uni-dimensional summary adequately characterizes the data matrix of each probe-set. To do so, we propose a low-rank matrix model for the probe-level intensities and develop a framework for testing the adequacy of uni-dimensionality against targeted alternatives. The statistical problem is nonstandard because inference has to be based on a single data matrix whose entries are not i.i.d. We analyze the asymptotic properties of the proposed test statistics and use Monte Carlo simulations to assess their small-sample performance. Applications of the proposed tests to GeneChip data show that evidence against a uni-dimensional model often points to practically relevant features of a probe-set.
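One way to read the uni-dimensional summary is as a rank-one model for each probe-set matrix, Y_ij ≈ θ_i φ_j, where θ_i is an array-level expression index and φ_j is a probe affinity. The sketch below assumes this multiplicative form with additive Gaussian noise. It measures how much of the total sum of squares the best rank-one fit leaves unexplained, which by the Eckart-Young theorem is 1 - σ₁²/‖Y‖²_F. It is only a descriptive illustration and does not reproduce the test statistics proposed here. The names `rank_one_fit`, `residual_fraction`, `simulate`, `theta`, `phi`, `eta` and `psi` are hypothetical, and so are the matrix sizes and noise level. No null distribution is attached to the residual share. Calibrating such a quantity for a single matrix with non-i.i.d. entries is exactly the inferential problem described above.

```python
import math
import random


def rank_one_fit(y, iters=200):
    """Best rank-one approximation y ~ sigma * u v^T by power iteration.

    y is a list of rows (arrays) by columns (probes). Returns (sigma, u, v)
    with u, v unit vectors and sigma the largest singular value.
    """
    n, p = len(y), len(y[0])
    v = [1.0 / math.sqrt(p)] * p  # start from the normalized all-ones vector
    sigma = 0.0
    for _ in range(iters):
        u = [sum(y[i][j] * v[j] for j in range(p)) for i in range(n)]  # u ∝ Y v
        nu = math.sqrt(sum(x * x for x in u))
        u = [x / nu for x in u]
        v = [sum(y[i][j] * u[i] for i in range(n)) for j in range(p)]  # v ∝ Y^T u
        sigma = math.sqrt(sum(x * x for x in v))  # ||Y^T u|| -> sigma_1 at convergence
        v = [x / sigma for x in v]
    return sigma, u, v


def residual_fraction(y):
    """Share of the total sum of squares not captured by the best rank-one fit.

    By the Eckart-Young theorem this equals 1 - sigma_1^2 / ||Y||_F^2.
    """
    sigma, _, _ = rank_one_fit(y)
    total = sum(x * x for row in y for x in row)
    return 1.0 - sigma ** 2 / total


def simulate(n_arrays=20, n_probes=11, second_component=False, noise=0.1, seed=1):
    """Hypothetical probe-set matrix: array effect theta_i times probe affinity phi_j, plus noise."""
    rng = random.Random(seed)
    theta = [rng.uniform(5.0, 10.0) for _ in range(n_arrays)]
    phi = [rng.uniform(0.5, 1.5) for _ in range(n_probes)]
    # Optional second component (illustrative assumption): half of the probes also
    # track another signal eta_i, e.g. probes hitting a different transcript.
    eta = [rng.uniform(-2.0, 2.0) for _ in range(n_arrays)]
    psi = [1.0 if (second_component and j < n_probes // 2) else 0.0 for j in range(n_probes)]
    return [
        [theta[i] * phi[j] + eta[i] * psi[j] + rng.gauss(0.0, noise) for j in range(n_probes)]
        for i in range(n_arrays)
    ]


# Same seed, so both matrices share theta, phi, eta and the noise draws;
# they differ only in whether the second component is switched on.
one_dim = residual_fraction(simulate(second_component=False))
two_dim = residual_fraction(simulate(second_component=True))
print(f"rank-one data: {one_dim:.5f}  two-component data: {two_dim:.5f}")
```

The residual share is smaller for the rank-one matrix than for the matrix with a second probe-specific component. Judging whether a given share is "too large" would still require a calibrated test, which this sketch does not provide.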
