A model selection approach to discover age-dependent gene expression patterns using quantile regression models

Background
It has long been a biological challenge to understand the molecular regulatory mechanisms behind mammalian ageing. Taking advantage of the many available ageing microarray datasets, a number of studies have shown that it is possible to identify genes with age-dependent differential expression (DE) or differential variability (DV) patterns. Most of these studies identify "interesting" genes with a linear regression approach, which is known to perform poorly in the presence of outliers or when the underlying age-dependent pattern is non-linear. A more robust and flexible approach is therefore needed to identify genes with diverse age-dependent expression patterns.

Results
Here we present a novel model selection approach for discovering genes with linear or non-linear age-dependent expression patterns in microarray data. To identify DE genes, our method fits three quantile regression models (constant, linear and piecewise linear) to the expression profile of each gene and selects the least complex model that best fits the available data. Similarly, DV genes are identified by fitting two quantile regression models (a non-DV model and a DV model) to the expression profile of each gene and comparing them. We show that our approach is much more robust than the standard linear regression approach in discovering age-dependent patterns. We also applied the approach to two human brain ageing datasets and found many biologically interesting expression patterns, including some particularly interesting DV patterns, that were overlooked in the original studies. Furthermore, we propose that, by considering different quantile regression models, the model selection approach can be extended to discover DE and DV genes in microarray datasets with discrete class labels.

Conclusion
In this paper, we present a novel application of quantile regression models to identify genes with interesting linear or non-linear age-dependent expression patterns. One important contribution of this paper is a model selection approach to DE and DV gene identification, a problem most commonly tackled with null hypothesis testing. We show that our approach is robust when analyzing both real and simulated datasets, and we believe it is applicable to many ageing and time-series data analysis tasks.
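The DE step in the Results (fit constant, linear and piecewise linear quantile regression models, then keep the least complex adequate one) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. It assumes median regression (tau = 0.5), a continuous "bent line" as the piecewise linear model with its breakpoint searched over interior observed ages, and an AIC-style score n·log(V/n) + k (V = summed check loss, k = number of parameters, with the breakpoint counted as one); the paper's actual fitting procedure and selection criterion may differ. The helper names (`check_loss`, `fit_quantile`, `select_de_model`) are hypothetical. Fits are computed exactly by enumerating elemental subsets, relying on the property that some optimal linear quantile regression fit passes through p observations; this is only practical for small samples, and a linear-programming solver (for example R's `quantreg::rq` or statsmodels' `QuantReg`) would normally be used instead.

```python
import itertools

import numpy as np


def check_loss(residuals, tau):
    """Summed check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    residuals = np.asarray(residuals, dtype=float)
    return float(np.sum(residuals * (tau - (residuals < 0).astype(float))))


def fit_quantile(X, y, tau):
    """Exact linear quantile regression by enumerating elemental subsets.

    Some optimal fit passes exactly through p observations (p = columns of X),
    so trying every p-subset finds the minimum loss. Only practical for small n.
    """
    n, p = X.shape
    best_beta, best_loss = None, np.inf
    for idx in itertools.combinations(range(n), p):
        rows = list(idx)
        A = X[rows]
        if abs(np.linalg.det(A)) < 1e-10:
            continue  # these p rows do not determine a unique fit
        beta = np.linalg.solve(A, y[rows])
        loss = check_loss(y - X @ beta, tau)
        if loss < best_loss:
            best_beta, best_loss = beta, loss
    return best_beta, best_loss


def select_de_model(age, expr, tau=0.5):
    """Hypothetical helper: return 'constant', 'linear' or 'piecewise' for one gene."""
    age = np.asarray(age, dtype=float)
    expr = np.asarray(expr, dtype=float)
    n = len(age)
    ones = np.ones(n)

    def score(loss, k):
        # Assumed AIC-style criterion for quantile regression: n*log(V/n) + k.
        return n * np.log(max(loss, 1e-12) / n) + k

    _, loss_const = fit_quantile(ones[:, None], expr, tau)
    _, loss_lin = fit_quantile(np.column_stack([ones, age]), expr, tau)

    # Continuous bent line; the breakpoint c is searched over interior
    # observed ages and counted as a fourth parameter (assumption).
    loss_pw = np.inf
    for c in np.unique(age)[2:-2]:
        X = np.column_stack([ones, age, np.maximum(age - c, 0.0)])
        loss_pw = min(loss_pw, fit_quantile(X, expr, tau)[1])

    # Candidates ordered from least to most complex; the strict '<' keeps
    # the simpler model whenever two scores tie.
    candidates = [("constant", score(loss_const, 1)),
                  ("linear", score(loss_lin, 2)),
                  ("piecewise", score(loss_pw, 4))]
    best_name, best_score = candidates[0]
    for name, s in candidates[1:]:
        if s < best_score:
            best_name, best_score = name, s
    return best_name


if __name__ == "__main__":
    ages = np.arange(20, 60, 2)  # 20 hypothetical samples aged 20..58
    flat = np.full(ages.shape, 5.0)
    flat[10] += 5.0  # one gross outlier at age 40
    rising = 3.0 + 0.05 * ages
    rising[10] += 5.0  # same outlier
    bent = 5.0 + 0.1 * np.maximum(ages - 40, 0)  # flat, then rising after 40
    # expected: constant linear piecewise
    print(select_de_model(ages, flat), select_de_model(ages, rising),
          select_de_model(ages, bent))
```

The single outlier barely moves the median fits, so the flat and rising profiles keep their simple models; an ordinary least-squares slope, by contrast, would be pulled toward the outlier.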
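The abstract does not spell out the non-DV and DV models. One plausible reading, assumed here and not taken from the paper, is that both models fit a lower (tau = 0.25) and an upper (tau = 0.75) quantile line against age: the non-DV model forces the two lines to share a slope, so the spread is constant with age, while the DV model gives each line its own slope. The quantile levels, the AIC-style score on the pooled check loss, and the helper names (`check_loss`, `fit_quantile`, `select_dv_model`) are all hypothetical. The shared-slope model is fitted by stacking the data twice with a per-row tau, which turns it into a single quantile regression; fits are again exact elemental-subset searches suitable only for small examples.

```python
import itertools

import numpy as np


def check_loss(residuals, tau):
    """Summed check loss; tau may be a scalar or one value per row."""
    residuals = np.asarray(residuals, dtype=float)
    return float(np.sum(residuals * (tau - (residuals < 0).astype(float))))


def fit_quantile(X, y, tau):
    """Minimum check loss of a linear quantile fit, by elemental-subset search."""
    n, p = X.shape
    best = np.inf
    for idx in itertools.combinations(range(n), p):
        rows = list(idx)
        if abs(np.linalg.det(X[rows])) < 1e-10:
            continue  # singular subset: no unique interpolating fit
        beta = np.linalg.solve(X[rows], y[rows])
        best = min(best, check_loss(y - X @ beta, tau))
    return best


def select_dv_model(age, expr, lower=0.25, upper=0.75):
    """Hypothetical helper: return 'DV' or 'non-DV' for one gene."""
    age = np.asarray(age, dtype=float)
    expr = np.asarray(expr, dtype=float)
    n = len(age)
    ones, zeros = np.ones(n), np.zeros(n)

    # DV model (assumed): lower and upper quantile lines each get their own
    # intercept and slope, so the spread may change with age (4 parameters).
    X = np.column_stack([ones, age])
    loss_dv = fit_quantile(X, expr, lower) + fit_quantile(X, expr, upper)

    # Non-DV model (assumed): the two lines share one slope (3 parameters).
    # Columns are [lower intercept, upper intercept, common slope].
    X_stack = np.vstack([np.column_stack([ones, zeros, age]),
                         np.column_stack([zeros, ones, age])])
    y_stack = np.concatenate([expr, expr])
    tau_stack = np.concatenate([np.full(n, lower), np.full(n, upper)])
    loss_nondv = fit_quantile(X_stack, y_stack, tau_stack)

    def score(loss, k):
        # Assumed heuristic: AIC-style score on the pooled check loss.
        return n * np.log(max(loss, 1e-12) / n) + k

    # Ties go to the simpler non-DV model.
    return "DV" if score(loss_dv, 4) < score(loss_nondv, 3) else "non-DV"


if __name__ == "__main__":
    ages = np.repeat([20.0, 30.0, 40.0, 50.0, 60.0], 4)  # 4 samples per age
    offsets = np.tile([-1.5, -0.5, 0.5, 1.5], 5)
    widening = 5.0 + (0.1 + 0.0475 * (ages - 20)) * offsets  # spread grows
    steady = 5.0 + 0.02 * ages + 0.5 * offsets  # trend, constant spread
    # expected: DV non-DV
    print(select_dv_model(ages, widening), select_dv_model(ages, steady))
```

Because the DV model contains the non-DV model as a special case, its pooled loss can never be larger; the penalty term is what stops every gene from being labelled DV.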