Analyzing microarray data using cluster analysis.

As pharmacogenetics researchers gather more detailed and complex data on gene polymorphisms that affect drug-metabolizing enzymes, drug target receptors, and drug transporters, they will need access to advanced statistical tools to mine those data. These tools include approaches from classical biostatistics, such as logistic regression or linear discriminant analysis, and supervised learning methods from computer science, such as support vector machines and artificial neural networks. In this review, we present an overview of another class of models, cluster analysis, which will likely be less familiar to pharmacogenetics researchers. Cluster analysis is used to analyze data that are not known a priori to contain any specific subgroups. The goal is to use the data themselves to identify meaningful or informative subgroups. Specifically, we focus on demonstrating the use of distance-based methods of hierarchical clustering to analyze gene expression data.
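As a rough illustration of what distance-based hierarchical clustering involves, the following minimal sketch groups a few gene expression profiles by agglomerative clustering. It is not taken from the review: the choice of correlation distance (1 minus the Pearson correlation) and average linkage is an assumption, and the gene names and expression values are hypothetical. The distance function also assumes that no profile is constant, since a constant profile has zero variance.

```python
import math

def correlation_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)  # assumes neither profile is constant

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    names = list(profiles)
    # Pairwise distances between individual genes, computed once.
    dist = {(a, b): correlation_distance(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]  # start with one gene per cluster

    def cluster_distance(c1, c2):
        # Average linkage: mean distance over all cross-cluster gene pairs.
        return sum(dist[(a, b)] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs,
                   key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)

# Hypothetical expression profiles (e.g. log ratios across four arrays).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.1, 2.2, 0.8],
}
clusters = average_linkage(profiles, n_clusters=2)
print(clusters)
```

In this toy example the two rising profiles and the two falling profiles end up in separate clusters, without any subgroup labels being supplied in advance. Other distance measures (for example Euclidean distance) or linkage rules (single or complete linkage) could be substituted and may give different groupings.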
