Projection Based Clustering of Gene Expression Data

DNA microarray technologies have given researchers the ability to examine, discover, and monitor thousands of genes in a single experiment. However, the large volume of data obtained from microarray studies presents a challenge for data analysis, mainly because of its very high dimensionality. A particular class of clustering algorithms, which exploit information derived from Principal Component Analysis, has been very successful in dealing with such data. In this paper, we investigate the application of recently proposed projection-based hierarchical clustering algorithms to gene expression microarray data. Besides identifying the clusters present in a data set, these algorithms also determine the number of clusters and thus require no special knowledge about the data.
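
To make the idea of projection-based divisive clustering concrete, the following is a minimal sketch of a single splitting step in the spirit of principal direction divisive partitioning: the data are centered, projected onto the first principal direction, and divided by the sign of the projection. It is an illustration under stated assumptions, not the specific algorithms studied in the paper, which also determine the number of clusters. The function name `pddp_split` and the synthetic data are hypothetical.

```python
import numpy as np


def pddp_split(X):
    """Split the rows of X into two groups by the sign of their projection
    onto the first principal direction of the centered data.

    Illustrative helper (hypothetical name), not the paper's implementation.
    """
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    projections = centered @ vt[0]
    return projections > 0, projections


# Synthetic stand-in for expression data: 40 samples, 50 features,
# drawn from two well-separated groups (assumed for illustration only).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=5.0, scale=1.0, size=(20, 50))
group_b = rng.normal(loc=-5.0, scale=1.0, size=(20, 50))
X = np.vstack([group_a, group_b])

mask, projections = pddp_split(X)
print("samples on the positive side:", int(mask.sum()))
print("samples on the negative side:", int((~mask).sum()))
```

Applying the same step recursively to each resulting group yields a binary hierarchy; a stopping rule is what would let such a procedure decide the number of clusters, and this sketch does not include one.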
