M-quantile Regression Analysis of Temporal Gene Expression Data

In this paper, we explore the use of M-quantile regression and M-quantile coefficients to detect statistical differences between temporal curves that belong to different experimental conditions. In particular, we consider the application to temporal gene expression data, where the aim is to detect genes whose temporal expression differs significantly across a number of biological conditions. We present a new method for this problem. First, the temporal profiles of the genes are modelled by a parametric M-quantile regression model. This model is particularly appealing for small-sample gene expression data, as it is robust against outliers and makes no assumption about the error distribution. Second, we further increase the robustness of the method by summarising the M-quantile regression models over a large range of quantile values into an M-quantile coefficient. Finally, we fit a polynomial M-quantile regression model to the M-quantile coefficients over time and employ a Hotelling T²-test to detect significant differences between the temporal M-quantile coefficient profiles across conditions. Extensive simulations show the increased power and robustness of M-quantile regression methods over standard regression methods and over some previously published methods. We conclude by applying the method to detect differentially expressed genes in time-course microarray data on muscular dystrophy.
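To make the first step concrete, the following is a minimal sketch of a linear M-quantile regression fit. It is not the authors' implementation. It assumes a Huber influence function with tuning constant c = 1.345, a MAD-based residual scale, and an iteratively reweighted least squares (IRLS) solver; none of these choices is stated in the abstract. The function name `mquantile_fit` and the simulated time course are hypothetical, and the M-quantile coefficient and Hotelling T²-test steps are not covered.

```python
import numpy as np


def mquantile_fit(X, y, q=0.5, c=1.345, max_iter=200, tol=1e-8):
    """Fit a linear M-quantile regression of order q by IRLS.

    Assumptions (not taken from the paper): Huber influence function with
    tuning constant c, and a MAD estimate of the residual scale.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Ordinary least squares as the starting value.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(max_iter):
        r = y - X @ beta
        # Robust scale: median absolute deviation, rescaled for normal errors.
        s = np.median(np.abs(r - np.median(r))) / 0.6745
        if s < 1e-12:
            s = 1.0
        u = r / s
        # Huber weights psi(u)/u, set to 1 where u is numerically zero.
        near_zero = np.abs(u) < 1e-12
        safe_u = np.where(near_zero, 1.0, u)
        w = np.where(near_zero, 1.0, np.clip(u, -c, c) / safe_u)
        # Asymmetric tilt: positive residuals get weight q, the rest 1 - q.
        w = 2.0 * np.where(r > 0, q, 1.0 - q) * w
        # Weighted least squares update.
        beta_new = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * y))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Hypothetical time course: a linear trend with noise and one gross outlier.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 30)
y = 1.0 + 2.0 * t + 0.1 * rng.standard_normal(t.size)
y[5] += 10.0
X = np.column_stack([np.ones_like(t), t])

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_med = mquantile_fit(X, y, q=0.5)
beta_low = mquantile_fit(X, y, q=0.1)
beta_high = mquantile_fit(X, y, q=0.9)
```

With q = 0.5 the fit reduces to a Huber M-regression, which bounds the influence of the outlier on the fitted line. Varying q traces out a family of fitted curves, which is the family the abstract describes summarising into an M-quantile coefficient.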
