Clustering approaches to identifying gene expression patterns from DNA microarray data.

Microarray experiments generate large amounts of gene expression data, and computational analysis is essential for interpreting them. In this review we focus on clustering techniques. The biological rationale for this approach is that many co-expressed genes are co-regulated, so identifying co-expressed genes can aid in the functional annotation of novel genes, the de novo identification of transcription factor binding sites and the elucidation of complex biological pathways. Co-expressed genes are usually identified in microarray experiments by clustering. Many such methods exist, and the results obtained even for the same dataset may vary considerably depending on the algorithm and dissimilarity measure used, as well as on user-selectable parameters such as the desired number of clusters and the initial values. Biologists who want to interpret microarray data should therefore be aware of the weaknesses and strengths of the clustering methods used. In this review, we survey the basic principles of clustering DNA microarray data, from crisp clustering algorithms such as hierarchical clustering, K-means and self-organizing maps to more complex algorithms such as fuzzy clustering.
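The dependence of results on the dissimilarity measure can be illustrated with a minimal sketch. The three toy expression profiles and the helper names (`euclidean`, `pearson_distance`, `nearest`) are assumptions made purely for illustration and do not come from the review; the sketch only shows that the same gene can have a different nearest neighbour under Euclidean distance than under a correlation-based distance.

```python
import math

# Hypothetical toy expression profiles over four conditions (illustrative only).
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],      # rising trend, low magnitude
    "g2": [10.0, 20.0, 30.0, 40.0],  # same rising shape, much higher magnitude
    "g3": [4.0, 3.0, 2.0, 1.0],      # falling trend, magnitude similar to g1
}

def euclidean(a, b):
    """Euclidean distance: sensitive to absolute expression level."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b):
    """1 - Pearson correlation: sensitive to profile shape, not magnitude."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)

def nearest(name, dist):
    """Return the name of the profile closest to `name` under `dist`."""
    others = (other for other in profiles if other != name)
    return min(others, key=lambda other: dist(profiles[name], profiles[other]))

if __name__ == "__main__":
    print("Euclidean nearest to g1:", nearest("g1", euclidean))
    print("Pearson nearest to g1:  ", nearest("g1", pearson_distance))
```

Under these assumptions, Euclidean distance pairs g1 with g3 (similar magnitude, opposite trend), whereas the correlation-based distance pairs g1 with g2 (same trend, different magnitude), so any clustering built on these measures would group the genes differently.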
