BIOINFORMATICS ORIGINAL PAPER

MOTIVATION: Fuzzy c-means clustering is widely used to identify cluster structures in high-dimensional datasets, such as those obtained in DNA microarray and quantitative proteomics experiments. One of its main limitations is the lack of a computationally fast method for setting optimal values of the algorithm parameters. Wrong parameter values may either cause purely random fluctuations to be included in the results or cause potentially important data to be ignored. The optimal parameter values are those for which the clustering yields no results on a purely random dataset but detects cluster formation with maximum resolution at the edge of randomness.

RESULTS: The optimal parameter values are estimated by evaluating the results of the clustering procedure applied to randomized datasets. In this case, the optimal value of the fuzzifier follows common rules that depend only on the main properties of the dataset. Taking the dimension of the dataset and the number of objects as input values, instead of evaluating the entire dataset, allows us to propose a functional relationship that determines the fuzzifier directly. This result speaks strongly against using a predefined fuzzifier, as has typically been done in many previous studies. Validation indices are generally used to estimate the optimal number of clusters. A comparison shows that the minimum distance between the centroids provides results that are equivalent to or better than those obtained with other, computationally more expensive indices.
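The parameter-estimation idea described above can be illustrated with a minimal, stdlib-only Python sketch. It is written under stated assumptions and is not the paper's implementation. `fuzzy_c_means` is a textbook fuzzy c-means with fuzzifier `m` (the exponent applied to the memberships), and `min_centroid_distance` is the validity index named in the abstract. The other helpers (`randomize`, `smallest_collapsing_m`, `centroid_distance_curve`) are hypothetical. Two choices are assumptions of this sketch: the randomized dataset is produced by permuting each dimension independently across objects, and "no results on a random dataset" is read as all centroids collapsing onto one another, meaning the minimum centroid distance falls below `rel_tol` times the mean distance of the objects from their mean. The paper's functional relationship for the fuzzifier is not reproduced here.

```python
import math
import random


def fuzzy_c_means(data, c, m, n_iter=300, tol=1e-8, seed=0):
    """Textbook fuzzy c-means; returns (centroids, memberships).

    data: list of equal-length lists of floats; c: number of clusters; m > 1: fuzzifier.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships; each object's row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centroids = []
    for _ in range(n_iter):
        # Centroid update: mean of the data weighted by u_ik ** m.
        centroids = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            sw = sum(w)
            centroids.append([sum(w[i] * data[i][d] for i in range(n)) / sw for d in range(dim)])
        # Membership update: u_ik proportional to d_ik ** (-2 / (m - 1)),
        # computed in log space so that m close to 1 does not overflow.
        max_change = 0.0
        for i in range(n):
            logs = [-2.0 / (m - 1.0) * math.log(max(math.dist(data[i], v), 1e-12)) for v in centroids]
            top = max(logs)
            e = [math.exp(x - top) for x in logs]
            s = sum(e)
            new_row = [x / s for x in e]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if max_change < tol:
            break
    return centroids, u


def min_centroid_distance(centroids):
    """Smallest pairwise Euclidean distance between cluster centroids (needs c >= 2)."""
    return min(math.dist(a, b) for i, a in enumerate(centroids) for b in centroids[i + 1:])


def randomize(data, seed=1):
    """Hypothetical randomization: permute each dimension independently across objects."""
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*data)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def smallest_collapsing_m(data, c, candidates, rel_tol=1e-3):
    """Smallest candidate m for which FCM on randomized data finds no structure.

    Assumption: 'no structure' means the centroids collapse, i.e. the minimum
    centroid distance is below rel_tol times the mean distance to the data mean.
    """
    shuffled = randomize(data)
    mean = [sum(col) / len(shuffled) for col in zip(*shuffled)]
    spread = sum(math.dist(x, mean) for x in shuffled) / len(shuffled)
    for m in sorted(candidates):
        centroids, _ = fuzzy_c_means(shuffled, c, m)
        if min_centroid_distance(centroids) < rel_tol * spread:
            return m
    return None


def centroid_distance_curve(data, m, c_values):
    """Minimum centroid distance for each candidate number of clusters (each c >= 2)."""
    return {c: min_centroid_distance(fuzzy_c_means(data, c, m)[0]) for c in c_values}
```

In use, one would call `smallest_collapsing_m` on the real data matrix to pick the smallest candidate fuzzifier that finds no structure once the data are randomized, then inspect `centroid_distance_curve` over a range of cluster numbers with that fuzzifier. How to read a particular number of clusters off that curve is left open here. The membership update is done in log space because the exponent 2/(m - 1) grows large as m approaches 1.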
