Mining Gene Expression Data Using Enhanced Intelligence Clustering Technique

With the advent of microarray technology, there is a growing need to reliably extract biologically significant information from massive gene expression data. Clustering is one of the key steps in analyzing gene expression data: it identifies groups of genes that manifest similar expression patterns. Elucidating the patterns hidden in gene expression data offers a tremendous opportunity for an enhanced understanding of functional genomics. However, the large number of genes and the complexity of their measurement make the data hard to comprehend and interpret, and progress on cluster validation and on identifying the number of clusters has been limited. In this paper, an intelligence-based clustering algorithm is integrated with validation techniques to assess the predictive power of the clusters. In experimental evaluation, this approach outperforms the other clustering methods compared in terms of clustering quality, efficiency and automation. The resulting clusters offer potential insight into gene function, molecular biological processes and regulatory mechanisms.
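The idea of coupling clustering with validation to pick the number of clusters can be illustrated with a minimal sketch. It is not the algorithm proposed in this paper: average-linkage agglomerative clustering and the mean silhouette width are stand-ins chosen for brevity, and the function names (`pearson_distance`, `average_linkage`, `silhouette`, `choose_clustering`) and the toy expression profiles are assumptions made only for illustration.

```python
import math


def pearson_distance(a, b):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # a flat profile is treated as uncorrelated
    return 1.0 - cov / math.sqrt(var_a * var_b)


def average_linkage(dist, k):
    """Merge the closest pair of clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(dist[p][q] for p in clusters[i] for q in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def silhouette(dist, clusters):
    """Mean silhouette width; genes in singleton clusters score 0."""
    total, n = 0.0, 0
    for ci, members in enumerate(clusters):
        for p in members:
            n += 1
            if len(members) == 1:
                continue
            # a: mean distance to the gene's own cluster
            a = sum(dist[p][q] for q in members if q != p) / (len(members) - 1)
            # b: mean distance to the nearest other cluster
            b = min(
                sum(dist[p][q] for q in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            if max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / n


def choose_clustering(profiles, k_values):
    """Cluster for each candidate k (k >= 2) and keep the best-validated result."""
    dist = [[pearson_distance(a, b) for b in profiles] for a in profiles]
    scored = []
    for k in k_values:
        clusters = average_linkage(dist, k)
        scored.append((silhouette(dist, clusters), clusters))
    return max(scored, key=lambda item: item[0])


# Toy data (hypothetical): two rising, two falling and two peaked profiles.
profiles = [
    [1, 2, 3, 4], [2, 4, 6, 8.5],
    [4, 3, 2, 1], [8, 6, 4.5, 2],
    [1, 4, 4, 1], [2, 5, 5.5, 2],
]
score, clusters = choose_clustering(profiles, range(2, 6))
print(len(clusters), round(score, 3), clusters)
```

On this toy input the validation score selects three clusters, one per expression pattern, without the number of clusters being supplied in advance. A correlation-based distance is used here because genes with the same expression shape but different magnitudes are usually meant to group together; other distance measures or validation indices could be substituted.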
