FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data

Background
Data clustering analysis has been extensively applied to extract information from gene expression profiles obtained with DNA microarrays. To this end, existing clustering approaches, mainly developed in computer science, have been adapted to microarray data analysis. However, previous studies revealed that microarray datasets have very diverse structures, some of which may not be correctly captured by current clustering methods. We therefore approached the problem from a new starting point and developed a clustering algorithm designed to capture dataset-specific structures at the beginning of the process.

Results
The clustering algorithm is named Fuzzy clustering by Local Approximation of MEmbership (FLAME). Its distinctive elements are: (i) definition of the neighborhood of each object (gene or sample) and identification of objects with "archetypal" features, named Cluster Supporting Objects, around which the clusters are constructed; (ii) assignment to each object of a fuzzy membership vector approximated from the memberships of its neighboring objects, through an iterative, converging process in which membership spreads from the Cluster Supporting Objects through their neighbors. Comparative analysis with K-means, hierarchical clustering, fuzzy C-means and fuzzy self-organizing maps (SOM) showed that the data partitions generated by FLAME are not superimposable on those of the other methods and that, although different types of datasets are better partitioned by different algorithms, FLAME displays the best overall performance.
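The two distinctive elements of FLAME can be illustrated with a minimal Python sketch. It is not the GEDAS implementation: the density estimate (inverse mean distance to the k nearest neighbors), the inverse-distance neighbor weights, the z-score-style outlier cut-off, and all function and parameter names (`k_nearest`, `flame_memberships`, `k`, `outlier_z`, `max_iter`, `tol`) are assumptions chosen for illustration. Objects denser than all their neighbors play the role of Cluster Supporting Objects and keep full membership in their own cluster; objects sparser than all their neighbors and below the cut-off are fixed in an extra outlier group; every other object's membership vector is repeatedly replaced by the weighted average of its neighbors' vectors, so membership spreads outward from the supporting objects until the changes fall below a tolerance.

```python
import math
from typing import List, Sequence, Tuple


def k_nearest(data: Sequence[Sequence[float]], k: int) -> List[List[Tuple[int, float]]]:
    """For every object, return its k nearest neighbors as (index, Euclidean distance)."""
    result = []
    for i, x in enumerate(data):
        dists = [(j, math.dist(x, y)) for j, y in enumerate(data) if j != i]
        dists.sort(key=lambda pair: pair[1])  # stable sort: ties keep index order
        result.append(dists[:k])
    return result


def flame_memberships(data, k=3, outlier_z=-2.0, max_iter=500, tol=1e-6):
    """Hypothetical FLAME-style sketch; returns (memberships, csos, outliers).

    Each membership vector has one entry per Cluster Supporting Object (CSO),
    in index order, plus a final entry for the outlier group.
    """
    n = len(data)
    nbrs = k_nearest(data, k)

    # Step (i): density as inverse mean neighbor distance (assumed definition).
    density = [1.0 / (sum(d for _, d in nb) / len(nb) + 1e-12) for nb in nbrs]
    mean = sum(density) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in density) / n)
    cutoff = mean + outlier_z * sd  # assumed z-score-style outlier cut-off

    csos, outliers = [], []
    for i, nb in enumerate(nbrs):
        if all(density[i] > density[j] for j, _ in nb):
            csos.append(i)  # denser than every neighbor
        elif all(density[i] < density[j] for j, _ in nb) and density[i] < cutoff:
            outliers.append(i)  # sparser than every neighbor and below the cut-off

    # Neighbor weights: inverse distance, normalized to sum to 1 (assumed choice).
    weights = []
    for nb in nbrs:
        raw = [1.0 / (d + 1e-12) for _, d in nb]
        total = sum(raw)
        weights.append([r / total for r in raw])

    # Step (ii): CSOs and outliers get fixed, full memberships; others start uniform.
    dim = len(csos) + 1
    mem = [[1.0 / dim] * dim for _ in range(n)]
    fixed = {i: c for c, i in enumerate(csos)}
    fixed.update({i: dim - 1 for i in outliers})
    for i, slot in fixed.items():
        mem[i] = [1.0 if m == slot else 0.0 for m in range(dim)]

    # Replace each free vector by the weighted average of its neighbors' vectors
    # (Jacobi-style sweep) until the total change drops below tol.
    for _ in range(max_iter):
        new = [row[:] for row in mem]
        change = 0.0
        for i in range(n):
            if i in fixed:
                continue
            new[i] = [sum(w * mem[j][m] for w, (j, _) in zip(weights[i], nbrs[i]))
                      for m in range(dim)]
            change += sum(abs(a - b) for a, b in zip(new[i], mem[i]))
        mem = new
        if change < tol:
            break
    return mem, csos, outliers


if __name__ == "__main__":
    points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),       # group A
              (10, 10), (11, 10), (9, 10), (10, 11), (10, 9),  # group B
              (5, 20)]                                        # isolated point
    mem, csos, outliers = flame_memberships(points, k=3)
    print("CSOs:", csos, "outliers:", outliers)  # prints: CSOs: [0, 5] outliers: [10]
```

On this toy input the center of each group is denser than its neighbors and becomes a supporting object, the isolated point is fixed in the outlier group, and the remaining points converge to memberships close to 1 in their own group's cluster. Because weights and memberships both sum to 1, every averaged vector stays a valid fuzzy membership.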
FLAME is implemented, together with all the above-mentioned algorithms, in Gene Expression Data Analysis Studio (GEDAS), a C++ program with a graphical interface for Linux and Windows that can handle very large datasets and is freely available under the GNU General Public License.

Conclusion
The FLAME algorithm has intrinsic advantages, such as the ability to capture non-linear relationships and non-globular clusters, the automated definition of the number of clusters, and the identification of cluster outliers, i.e. genes that are not assigned to any cluster. As a result, clusters are more internally homogeneous and more distinct from one another, and they provide a better partitioning of biological functions. The clustering algorithm can easily be extended to applications other than gene expression analysis.
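The automated number of clusters and the explicit handling of cluster outliers can be seen in how a final partition might be read off the fuzzy memberships. The helpers below are a hypothetical sketch, not GEDAS functions: they assume each object's membership vector holds one entry per Cluster Supporting Object followed by a final entry for the outlier group, and the names `one_to_one`, `one_to_multiple`, `OUTLIER` and the `threshold` parameter are illustrative assumptions.

```python
from typing import List, Sequence

OUTLIER = -1  # hypothetical label for objects assigned to no cluster


def one_to_one(memberships: Sequence[Sequence[float]]) -> List[int]:
    """Label each object with its highest-membership cluster, or OUTLIER.

    The last entry of every vector is assumed to be the outlier group, so the
    number of clusters (len(vector) - 1) comes from the data, one per
    Cluster Supporting Object, rather than being chosen in advance.
    """
    labels = []
    for vec in memberships:
        best = max(range(len(vec)), key=lambda m: vec[m])
        labels.append(OUTLIER if best == len(vec) - 1 else best)
    return labels


def one_to_multiple(memberships: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """List every cluster (outlier entry excluded) whose membership reaches threshold."""
    return [[m for m, v in enumerate(vec[:-1]) if v >= threshold] for vec in memberships]


if __name__ == "__main__":
    mem = [[0.90, 0.05, 0.05],   # clearly in cluster 0
           [0.40, 0.50, 0.10],   # shared between clusters 0 and 1
           [0.10, 0.20, 0.70]]   # dominated by the outlier group
    print(one_to_one(mem))            # prints [0, 1, -1]
    print(one_to_multiple(mem, 0.3))  # prints [[0], [0, 1], []]
```

The one-to-one rule yields a hard partition in which outliers are left unassigned, while the threshold-based rule lets an object with split memberships belong to several clusters, or to none.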
