Automatic evolution of bi-clusters from microarray data using self-organized multi-objective evolutionary algorithm

This paper proposes BiClustSMEA, a novel approach to bi-clustering gene expression data that fuses a differential evolution framework with a self-organizing map (SOM). Each solution in the population encodes a variable number of gene and condition cluster centers, so the number of bi-clusters in a dataset is determined automatically. The SOM is used to design new genetic operators for both gene and condition clusters so that the search reaches the optimal solution faster. To measure the goodness of a bi-clustering solution, three bi-cluster quality measures (mean squared error, row variance, and bi-cluster size) are optimized simultaneously, with differential evolution as the underlying optimization strategy. Polynomial mutation is incorporated into the framework to generate highly diverse solutions, which in turn helps the search converge faster. The proposed approach is applied to two real-life microarray gene expression datasets, and the results are compared with those of various state-of-the-art techniques. The results show that our approach extracts higher-quality bi-clusters than the other methods and converges much faster than its competitors. The results are further validated using a statistical significance test and a biological significance test.
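As a minimal sketch of the three quality measures named above, the following Python function scores one candidate bi-cluster, given as a set of row (gene) indices and column (condition) indices. It assumes that "mean squared error" refers to the mean squared residue in the sense of Cheng and Church, and that row variance is the mean squared deviation of each entry from its row mean; the abstract does not give the exact formulas, and the function name `bicluster_objectives` and the toy matrix are hypothetical.

```python
from statistics import mean


def bicluster_objectives(data, rows, cols):
    """Return (msr, row_variance, size) for the submatrix data[rows][cols].

    Assumed definitions (not confirmed by the abstract):
      msr          = mean of (a_ij - rowmean_i - colmean_j + overall_mean)^2
      row_variance = mean of (a_ij - rowmean_i)^2
      size         = number of rows * number of columns
    """
    # Extract the submatrix selected by the bi-cluster.
    sub = [[data[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)
    size = n_rows * n_cols

    row_means = [mean(r) for r in sub]
    col_means = [mean(sub[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # All rows have equal length, so the mean of row means is the overall mean.
    overall = mean(row_means)

    msr = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    ) / size

    row_variance = sum(
        (sub[i][j] - row_means[i]) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    ) / size

    return msr, row_variance, size


# Toy expression matrix (hypothetical values): the first three rows follow
# the same additive pattern, the last row does not.
toy = [
    [1, 2, 3],
    [2, 3, 4],
    [5, 6, 7],
    [9, 0, 4],
]

if __name__ == "__main__":
    print(bicluster_objectives(toy, rows=[0, 1, 2], cols=[0, 1, 2]))
    print(bicluster_objectives(toy, rows=[0, 1, 2, 3], cols=[0, 1, 2]))
```

Under these assumed definitions, a coherent bi-cluster has a low residue, while a large row variance and a large size indicate a non-trivial, informative bi-cluster. These goals conflict, which is why a multi-objective search is a natural fit.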
