Hybrid Algorithm for Clustering of Microarray Data

Clustering is a crucial step in the analysis of gene expression data. Its goal is to identify the natural clusters in a data set and to provide a reliable estimate of their number. In this paper we propose a new hybrid algorithm for clustering microarray data based on spectral clustering and k-means. The algorithm consists of four steps. The first is a preprocessing (filtering) step. The second finds the optimal number of clusters using two different clustering methods, one hierarchical and one partition-based. The third clusters the data with spectral clustering based on similarity/dissimilarity metrics. In the final step, we select centroid genes based on the k-means results. The proposed method was tested on six data sets from the GEMS microarray database. Compared with existing single clustering methods and combinations of methods, our results indicate an improvement of about 10% in the selection of representative genes.
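The abstract does not give implementation details, so the following is only a minimal sketch of how such a pipeline could be wired together, not the authors' method. It assumes a variance filter for preprocessing, a Gaussian affinity with a normalized spectral embedding in the style of reference [24], and a plain k-means on the embedded genes. The step that estimates the number of clusters is skipped: `k` is simply supplied by the caller. All function names and the parameters `keep` and `sigma` are hypothetical, as is the choice to pick each representative gene as the cluster member closest to its cluster's mean expression profile.

```python
import numpy as np

def filter_by_variance(X, keep):
    """Step 1 (assumed filter): indices of the `keep` highest-variance genes (rows)."""
    order = np.argsort(X.var(axis=1))[::-1][:keep]
    return np.sort(order)

def spectral_embedding(X, k, sigma):
    """Step 3a: embed genes using the top-k eigenvectors of the normalized affinity."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-sq / (2.0 * sigma ** 2))       # Gaussian similarity between genes
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                # eigenvalues in ascending order
    U = vecs[:, -k:]                           # eigenvectors of the k largest
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(P, k, iters=100):
    """Step 3b: Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [P[0]]
    for _ in range(1, k):
        d = np.min([((P - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(P[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        dist = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([P[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    dist = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def representative_genes(X, labels, k):
    """Step 4 (assumed rule): per cluster, the member closest to the cluster mean."""
    reps = []
    for j in range(k):
        members = np.where(labels == j)[0]
        mean = X[members].mean(axis=0)
        reps.append(members[((X[members] - mean) ** 2).sum(axis=1).argmin()])
    return reps

def hybrid_cluster(X, k, keep, sigma):
    """Run the sketch end to end; `k` is given rather than estimated."""
    kept = filter_by_variance(X, keep)
    Xf = X[kept]
    labels = kmeans(spectral_embedding(Xf, k, sigma), k)
    reps = [int(kept[r]) for r in representative_genes(Xf, labels, k)]
    return kept, labels, reps

# Synthetic data: two groups of 10 genes with opposite profiles, plus 5 flat genes.
rng = np.random.default_rng(0)
up = np.array([0, 0, 0, 0, 5, 5, 5, 5], dtype=float)
X = np.vstack([
    up + rng.normal(0, 0.1, (10, 8)),
    up[::-1] + rng.normal(0, 0.1, (10, 8)),
    2.0 + rng.normal(0, 0.01, (5, 8)),
])
kept, labels, reps = hybrid_cluster(X, k=2, keep=20, sigma=2.0)
print(labels, reps)
```

On this toy matrix the filter discards the five flat genes, the two expression patterns end up in separate clusters, and one representative gene is returned per cluster. Real microarray data would need a data-driven choice of `sigma` and of the number of clusters, which this sketch leaves out.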
