Knowledge Discovery in Bioinformatics

Biomedical research is progressing rapidly, particularly in genomic and postgenomic research, and the large amounts of data it generates raise many challenges for biostatistics and bioinformatics. After presenting some of these challenges, this chapter presents evolutionary combinatorial optimization approaches proposed for knowledge discovery in bioinformatics. It focuses on three main data mining tasks widely encountered in bioinformatics applications: association rules, feature selection, and clustering. For each task, a description is given along with information about its uses in bioinformatics. Some evolutionary approaches proposed to cope with the task are then presented and discussed.
