Characterizing Cell Types through Differentially Expressed Gene Clusters Using a Model-Based Approach

Genome-wide expression profiles can give more insight into the biological basis of observed phenotypes and can help identify marker genes for use in clinical practice. With the advent of high-throughput DNA microarrays, profiling the expression state of cells on a whole-genome scale became feasible. Here, we propose a method based on model-based clustering that detects the clusters of marker genes that are most important for classifying different cell types. Using acute lymphoblastic leukemia (ALL) as an example, we show that these gene modules capture the expression state of the different sample classes and give more biological insight into the different cell types than individual marker genes alone. Additionally, our method suggests groups of genes that can serve as clinically relevant markers.
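The abstract does not spell out the mixture model or the rule used to score clusters. As a rough illustration of the general idea only, and not the authors' method, the sketch below clusters genes with a spherical Gaussian mixture fitted by EM and then ranks each cluster by how well its mean expression profile separates two sample classes, using Welch's t-statistic. The spherical mixture, the farthest-point initialisation, the t-statistic scoring, the function names `fit_spherical_gmm` and `rank_clusters`, and the synthetic data are all assumptions made for this example.

```python
# Illustrative sketch only: the mixture model and the cluster-scoring rule are
# assumptions for this example, not the method described in the paper.
import math
import random
from statistics import mean, variance


def fit_spherical_gmm(X, k, n_iter=50):
    """Cluster the rows of X (one row per gene, one column per sample) with a
    k-component spherical Gaussian mixture fitted by EM; return hard labels."""
    n, d = len(X), len(X[0])

    def sqdist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    # Farthest-point initialisation: start at gene 0, then repeatedly add the
    # gene farthest from all chosen centres (avoids two centres in one group).
    means = [list(X[0])]
    while len(means) < k:
        far = max(X, key=lambda x: min(sqdist(x, m) for m in means))
        means.append(list(far))
    var, weights = [1.0] * k, [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior responsibilities, normalised via log-sum-exp.
        resp = []
        for x in X:
            logp = [math.log(weights[j])
                    - 0.5 * d * math.log(2 * math.pi * var[j])
                    - sqdist(x, means[j]) / (2 * var[j]) for j in range(k)]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: re-estimate mixing weights, means and per-component variance.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[c] for r, x in zip(resp, X)) / nj
                        for c in range(d)]
            ss = sum(r[j] * sqdist(x, means[j]) for r, x in zip(resp, X))
            var[j] = max(ss / (nj * d), 1e-6)

    # Hard assignment: each gene goes to its most probable component.
    return [max(range(k), key=lambda j: r[j]) for r in resp]


def welch_t(a, b):
    """Welch's two-sample t-statistic."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def rank_clusters(X, labels, classes):
    """Rank gene clusters by |t| of their mean expression profile between the
    two sample classes (1 vs 0); highest-scoring clusters come first."""
    scores = []
    for c in sorted(set(labels)):
        members = [x for x, lab in zip(X, labels) if lab == c]
        profile = [mean(col) for col in zip(*members)]  # per-sample cluster mean
        a = [v for v, y in zip(profile, classes) if y == 1]
        b = [v for v, y in zip(profile, classes) if y == 0]
        scores.append((c, abs(welch_t(a, b))))
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Synthetic demo: 10 samples (5 of class 1, 5 of class 0) and 60 genes in three
# hypothetical groups: up in class 1, down in class 1, and uninformative.
rng = random.Random(1)
classes = [1] * 5 + [0] * 5
shifts = [3.0] * 20 + [-3.0] * 20 + [0.0] * 20
X = [[s * y + rng.gauss(0, 0.3) for y in classes] for s in shifts]

labels = fit_spherical_gmm(X, k=3)
for cluster, score in rank_clusters(X, labels, classes):
    print(f"cluster {cluster}: |t| = {score:.1f}")
```

On this synthetic data the two informative gene groups should receive large |t| scores and the uninformative group should rank last. A real analysis would also have to choose the number of clusters (for example with a criterion such as BIC) and handle more than two cell types.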
