Automatic generation of biclusters from gene expression data using multi-objective simulated annealing approach

The invention of microarray technology has made it possible to monitor gene expression patterns. Biclustering is a method that identifies groups of co-regulated genes over a subset of conditions. Our aim is to detect all non-trivial biclusters that have a low mean squared residue (MSR) and a high row variance. In this paper, we propose a multi-objective simulated annealing based framework for solving the biclustering problem on gene expression data sets. Two objective functions, MSR and row variance, which capture two important properties of biclusters, are optimized simultaneously using the search capability of AMOSA, a multi-objective simulated annealing based optimization technique. A new encoding strategy and several search operators are defined to speed up the convergence of the algorithm. We run experiments on two real-life data sets and quantify the results using several cluster validity indices. We also compare our results with those of some state-of-the-art biclustering techniques.
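To make the two objectives concrete, the following is a minimal sketch of how MSR and row variance could be computed for one candidate bicluster. It assumes the commonly used Cheng and Church style definitions (residue of each cell relative to its row mean, column mean, and overall mean), which the abstract does not spell out. The function name `bicluster_scores` and the toy matrix are illustrative assumptions, not part of the proposed framework.

```python
def bicluster_scores(data, rows, cols):
    """Return (msr, row_variance) for the submatrix data[rows][cols].

    Assumed definitions (not stated in the abstract):
      residue_ij   = a_ij - row_mean_i - col_mean_j + overall_mean
      MSR          = mean of residue_ij ** 2 over the bicluster
      row variance = mean of (a_ij - row_mean_i) ** 2 over the bicluster
    """
    sub = [[data[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    size = n_rows * n_cols

    row_means = [sum(row) / n_cols for row in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(sum(row) for row in sub) / size

    msr = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall_mean) ** 2
        for i in range(n_rows) for j in range(n_cols)
    ) / size
    row_variance = sum(
        (sub[i][j] - row_means[i]) ** 2
        for i in range(n_rows) for j in range(n_cols)
    ) / size
    return msr, row_variance


# Toy expression matrix (genes x conditions), made up for illustration.
# The first three columns follow a perfectly additive pattern; the
# fourth column breaks it.
toy = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 0.0],
    [4.0, 5.0, 6.0, 5.0],
]

coherent = bicluster_scores(toy, rows=[0, 1, 2], cols=[0, 1, 2])
noisy = bicluster_scores(toy, rows=[0, 1, 2], cols=[0, 1, 2, 3])
```

Under these assumed definitions, the coherent submatrix has an MSR of zero (up to floating-point error) while still showing nonzero row variance, which is the kind of non-trivial bicluster the two objectives jointly favor. Including the fourth column raises the MSR.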
