Automatic clustering of gene ontology by genetic algorithm

Abstract: The Gene Ontology is widely used by researchers for biological data mining and information retrieval, integration of biological databases, gene finding, and gene clustering that incorporates the knowledge it encodes. However, the growth of the Gene Ontology has made it difficult to maintain and process. One way to keep it accessible is to cluster it into smaller groups. Clustering the Gene Ontology is a difficult combinatorial problem that can be modeled as a graph partitioning problem. Additionally, the appropriate number k of clusters is not obvious in advance, and determining it is itself a hard algorithmic problem. Therefore, an approach to automatic clustering of the Gene Ontology is proposed that incorporates a cohesion-and-coupling metric into a hybrid algorithm consisting of a genetic algorithm and a split-and-merge algorithm. Experimental results and an example of a modularized Gene Ontology in RDF/XML format are given to illustrate the effectiveness of the algorithm.
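
The abstract does not define the cohesion-and-coupling metric, so the following is only a minimal sketch of how such a metric could score a candidate partition of an ontology graph. It assumes an undirected graph given as an edge list, defines cohesion as the edge density inside each cluster, and defines coupling as the edge density between each pair of clusters. The function name `cohesion_coupling_score`, these definitions, and the toy graph are illustrative assumptions, not the formulation used in the paper.

```python
from itertools import combinations


def cohesion_coupling_score(edges, assignment):
    """Return mean intra-cluster density minus mean inter-cluster density.

    edges:      iterable of (u, v) pairs of an undirected graph (assumed input form)
    assignment: non-empty dict mapping each node to a cluster label
    This scoring rule is an illustrative assumption, not the paper's metric.
    """
    # Group nodes by cluster label.
    clusters = {}
    for node, label in assignment.items():
        clusters.setdefault(label, set()).add(node)

    # Count edges inside each cluster and between each pair of clusters.
    intra = {label: 0 for label in clusters}
    inter = {}
    for u, v in edges:
        cu, cv = assignment[u], assignment[v]
        if cu == cv:
            intra[cu] += 1
        else:
            key = frozenset((cu, cv))
            inter[key] = inter.get(key, 0) + 1

    # Cohesion: fraction of possible intra-cluster edges that exist.
    cohesion = []
    for label, members in clusters.items():
        n = len(members)
        possible = n * (n - 1) // 2
        cohesion.append(intra[label] / possible if possible else 0.0)
    mean_cohesion = sum(cohesion) / len(cohesion)

    # Coupling: fraction of possible edges between two clusters that exist.
    pairs = list(combinations(clusters, 2))
    if not pairs:
        return mean_cohesion  # a single cluster has no coupling term
    coupling = [
        inter.get(frozenset((a, b)), 0) / (len(clusters[a]) * len(clusters[b]))
        for a, b in pairs
    ]
    return mean_cohesion - sum(coupling) / len(coupling)


# Toy graph (hypothetical): two triangles joined by a single edge.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_triangles = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in "abcdef"}
mixed = {"a": 0, "d": 0, "b": 1, "c": 1, "e": 1, "f": 1}

score_two = cohesion_coupling_score(EDGES, two_triangles)
score_one = cohesion_coupling_score(EDGES, one_cluster)
score_mixed = cohesion_coupling_score(EDGES, mixed)
```

Because the score is computed from the cluster labels alone, the number of clusters k does not need to be fixed beforehand. Under these assumptions, a genetic algorithm could use the score as its fitness function while split and merge operators change how many distinct labels an individual carries.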
