Microarray Gene Expression Data Mining: Clustering Analysis Review

Since the advent of genome sequencing, DNA microarray analysis has become the most widely used functional genomics approach in bioinformatics. Biologists are overwhelmed by the unprecedented volume of genome-wide data that DNA microarray experiments produce. Clustering is the process of grouping data objects into a set of disjoint classes, called clusters, so that objects within a class are highly similar to one another and dissimilar to objects in other classes. It is currently by far the most widely used method for gene expression analysis, providing a divide-and-conquer strategy for extracting meaningful information from expression profiles. This paper reviews recent developments in microarray clustering techniques. It first highlights the procedures of clustering analysis and then describes the different categories of gene expression data clustering, along with some conventional approaches, to provide a framework for a better general understanding of related methods and for their further development.
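As a minimal illustration of the definition of clustering given above, the following sketch groups a handful of expression profiles into disjoint clusters. It is an assumption-laden toy example, not a method taken from this review: the gene names, the expression values, the choice of 1 minus Pearson correlation as the dissimilarity, the average-linkage merge rule, and the helper names `pearson_distance` and `average_linkage_clusters` are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(a, b):
    """Return 1 - Pearson correlation of two equal-length expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # flat profile: treat as uncorrelated (an assumption)
    return 1.0 - cov / (norm_a * norm_b)


def average_linkage_clusters(profiles, k):
    """Merge the two closest clusters until only k disjoint clusters remain."""
    clusters = [[name] for name in profiles]

    def linkage(pair):
        # Average pairwise distance between the members of two clusters.
        left, right = pair
        dists = [pearson_distance(profiles[g], profiles[h])
                 for g in left for h in right]
        return sum(dists) / len(dists)

    while len(clusters) > k:
        left, right = min(combinations(clusters, 2), key=linkage)
        clusters.remove(left)
        clusters.remove(right)
        clusters.append(left + right)
    return [sorted(cluster) for cluster in clusters]


# Hypothetical toy data: rows are genes, columns are experimental conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],  # rising profile
    "geneB": [2.1, 3.9, 6.2, 8.0],  # rising profile, larger amplitude
    "geneC": [4.0, 3.0, 2.0, 1.0],  # falling profile
    "geneD": [8.2, 6.1, 3.8, 2.0],  # falling profile, larger amplitude
}

print(average_linkage_clusters(profiles, 2))
```

A correlation-based dissimilarity is used here because it compares the shape of two profiles rather than their absolute magnitude, so genes that rise or fall together across conditions land in the same cluster even when their amplitudes differ. Other dissimilarities and merge rules would be equally valid choices for such a sketch.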
