Unsupervised and Supervised Learning Approaches Together for Microarray Analysis

In this article, a novel concept is introduced by using both unsupervised and supervised learning. For unsupervised learning, the problem of fuzzy clustering in microarray data as a multiobjective optimization is used, which simultaneously optimizes two internal fuzzy cluster validity indices to yield a set of Pareto-optimal clustering solutions. In this regards, a new multiobjective differential evolution based fuzzy clustering technique has been proposed. Subsequently, for supervised learning, a fuzzy majority voting scheme along with support vector machine is used to integrate the clustering information from all the solutions in the resultant Pareto-optimal set. The performances of the proposed clustering techniques have been demonstrated on five publicly available benchmark microarray data sets. A detail comparison has been carried out with multiobjective genetic algorithm based fuzzy clustering, multiobjective differential evolution based fuzzy clustering, single objective versions of differential evolution and genetic algorithm based fuzzy clustering as well as well known fuzzy c-means algorithm. While using support vector machine, comparative studies of the use of four different kernel functions are also reported. Statistical significance test has been done to establish the statistical superiority of the proposed multiobjective clustering approach. Finally, biological significance test has been carried out using a web based gene annotation tool to show that the proposed integrated technique is able to produce biologically relevant clusters of coexpressed genes.

[1]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[2]  P. Rousseeuw Silhouettes: a graphical aid to the interpretation and validation of cluster analysis , 1987 .

[3]  G. Church,et al.  Systematic determination of genetic network architecture , 1999, Nature Genetics.

[4]  R. Storn,et al.  Differential Evolution - A simple and efficient adaptive scheme for global optimization over continuous spaces , 2004 .

[5]  J. Barker,et al.  Large-scale temporal gene expression mapping of central nervous system development. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[6]  Ronald W. Davis,et al.  A genome-wide transcriptional analysis of the mitotic cell cycle. , 1998, Molecular cell.

[7]  John A. Hartigan,et al.  Clustering Algorithms , 1975 .

[8]  Ujjwal Maulik,et al.  Modified differential evolution based fuzzy clustering for pixel classification in remote sensing imagery , 2009, Pattern Recognit..

[9]  Kalyanmoy Deb,et al.  Multi-objective optimization using evolutionary algorithms , 2001, Wiley-Interscience series in systems and optimization.

[10]  Ujjwal Maulik,et al.  Multiobjective Genetic Clustering for Pixel Classification in Remote Sensing Imagery , 2007, IEEE Transactions on Geoscience and Remote Sensing.

[11]  Ron Shamir,et al.  CLICK and EXPANDER: a system for clustering and visualizing gene expression data , 2003, Bioinform..

[12]  James C. Bezdek,et al.  On cluster validity for the fuzzy c-means model , 1995, IEEE Trans. Fuzzy Syst..

[13]  Kalyanmoy Deb,et al.  A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..

[14]  Ash A. Alizadeh,et al.  Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling , 2000, Nature.

[15]  D. Botstein,et al.  The transcriptional program of sporulation in budding yeast. , 1998, Science.

[16]  Chih-Jen Lin,et al.  A comparison of methods for multiclass support vector machines , 2002, IEEE Trans. Neural Networks.

[17]  P. Reymond,et al.  Differential Gene Expression in Response to Mechanical Wounding and Insect Feeding in Arabidopsis , 2000, Plant Cell.

[18]  Ujjwal Maulik,et al.  Performance Evaluation of Some Clustering Algorithms and Validity Indices , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[19]  R. Storn,et al.  Differential Evolution: A Practical Approach to Global Optimization (Natural Computing Series) , 2005 .

[20]  Sanghamitra Bandyopadhyay,et al.  Analysis of Biological Data: A Soft Computing Approach , 2007, Science, Engineering, and Biology Informatics.

[21]  James C. Bezdek,et al.  Pattern Recognition with Fuzzy Objective Function Algorithms , 1981, Advanced Applications in Pattern Recognition.

[22]  Ujjwal Maulik,et al.  Integrating Clustering and Supervised Learning for Categorical Data Analysis , 2010, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans.

[23]  Ujjwal Maulik,et al.  A Simulated Annealing-Based Multiobjective Optimization Algorithm: AMOSA , 2008, IEEE Transactions on Evolutionary Computation.

[24]  D. Botstein,et al.  Cluster analysis and display of genome-wide expression patterns. , 1998, Proceedings of the National Academy of Sciences of the United States of America.

[25]  Ujjwal Maulik,et al.  Genetic algorithm-based clustering technique , 2000, Pattern Recognit..

[26]  Kalyanmoy Deb,et al.  A Fast Elitist Non-dominated Sorting Genetic Algorithm for Multi-objective Optimisation: NSGA-II , 2000, PPSN.

[27]  Christopher J. C. Burges,et al.  A Tutorial on Support Vector Machines for Pattern Recognition , 1998, Data Mining and Knowledge Discovery.

[28]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[29]  Gerardo Beni,et al.  A Validity Measure for Fuzzy Clustering , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[30]  Ujjwal Maulik,et al.  Automatic Fuzzy Clustering Using Modified Differential Evolution for Image Classification , 2010, IEEE Transactions on Geoscience and Remote Sensing.

[31]  Rainer Storn,et al.  Differential Evolution – A Simple and Efficient Heuristic for global Optimization over Continuous Spaces , 1997, J. Glob. Optim..

[32]  Carlos A. Coello Coello,et al.  A Comprehensive Survey of Evolutionary-Based Multiobjective Optimization Techniques , 1999, Knowledge and Information Systems.

[33]  Anil K. Jain,et al.  Algorithms for Clustering Data , 1988 .

[34]  Y Xu,et al.  Minimum spanning trees for gene expression data clustering. , 2001, Genome informatics. International Conference on Genome Informatics.