Evolutionary approaches for feature selection in biological data

Copyright and access declaration; Acknowledgements; List of Publications; Table of Figures; Table of Tables; Table of Abbreviations

[1]  E. Petricoin,et al.  Use of proteomic patterns in serum to identify ovarian cancer , 2002 .

[2]  Isabelle Guyon,et al.  An Introduction to Feature Extraction , 2006, Feature Extraction.

[3]  Daniel Tuyttens,et al.  A Random-Key Genetic Algorithm for Printing Problems , 2015 .

[4]  Madhu Yedla,et al.  Enhancing K-means Clustering Algorithm with Improved Initial Center , 2010 .

[5]  Li Li,et al.  A robust hybrid between genetic algorithm and support vector machine for extracting an optimal feature gene subset. , 2005, Genomics.

[6]  John J. Grefenstette,et al.  Optimization of Control Parameters for Genetic Algorithms , 1986, IEEE Transactions on Systems, Man, and Cybernetics.

[7]  Hongbin Zhang,et al.  Feature selection using tabu search method , 2002, Pattern Recognit..

[8]  Mark A. Hall,et al.  Correlation-based Feature Selection for Machine Learning , 2003 .

[9]  Huub C. J. Hoefsloot Statistical Analysis and validation , 2013 .

[10]  P. Mahalanobis On the generalized distance in statistics , 1936 .

[11]  K. Isobe,et al.  Nonspecific crossreacting antigen (NCA) is a major member of the carcinoembryonic antigen (CEA)-related gene family expressed in lung cancer. , 1993, British Journal of Cancer.

[12]  Huan Liu,et al.  Feature Selection for Classification , 1997, Intell. Data Anal..

[13]  Jin-Kao Hao,et al.  A Hybrid GA/SVM Approach for Gene Selection and Classification of Microarray Data , 2006, EvoWorkshops.

[14]  Frank Jochen Dieterle,et al.  Multianalyte Quantifications by Means of Integration of Artificial Neural Networks, Genetic Algorithms and Chemometrics for Time-Resolved Analytical Data , 2003 .

[15]  Jerzy W. Bala,et al.  Hybrid Learning Using Genetic Algorithms and Decision Trees for Pattern Classification , 1995, IJCAI.

[16]  Ji Zhu,et al.  Improved centroids estimation for the nearest shrunken centroid classifier , 2007, Bioinform..

[17]  Glenn Fung,et al.  SVM feature selection for classification of SPECT images of Alzheimer's disease using spatial information , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[18]  L. M. Patnaik,et al.  Adaptive Probabilities of Crossover and Mutation in Genetic Algorithms , 1994 .

[19]  Thomas Bäck,et al.  A Survey of Evolution Strategies , 1991, ICGA.

[20]  Ali Ridho Barakbah,et al.  Hierarchical K-means: an algorithm for centroids initialization for K-means , 2007 .

[21]  David E. Goldberg,et al.  Genetic Algorithms in Search, Optimization and Machine Learning , 1988 .

[22]  Jill P. Mesirov,et al.  Support Vector Machine Classification of Microarray Data , 2001 .

[23]  Sanghamitra Bandyopadhyay,et al.  Unsupervised Classification: Similarity Measures, Classical and Metaheuristic Approaches, and Applications , 2012 .

[24]  Kalyanmoy Deb,et al.  Multiobjective Optimization Using Nondominated Sorting in Genetic Algorithms , 1994, Evolutionary Computation.

[25]  Nachol Chaiyaratana,et al.  Effects of diversity control in single-objective and multi-objective genetic algorithms , 2007, J. Heuristics.

[26]  Sushmita Mitra,et al.  Evolutionary Rough Feature Selection in Gene Expression Data , 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[27]  Pedro Larrañaga,et al.  Filter versus wrapper gene selection approaches in DNA microarray domains , 2004, Artif. Intell. Medicine.

[28]  David M. Lin,et al.  Effective similarity measures for expression profiles , 2006, Bioinform..

[29]  Chris H. Q. Ding,et al.  Minimum redundancy feature selection from microarray gene expression data , 2003, Computational Systems Bioinformatics. CSB2003. Proceedings of the 2003 IEEE Bioinformatics Conference. CSB2003.

[30]  Thomas A. Darden,et al.  Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method , 2001, Bioinform..

[31]  Richard Baumgartner,et al.  Class prediction and discovery using gene microarray and proteomics mass spectroscopy data: curses, caveats, cautions , 2003, Bioinform..

[32]  Roger E Bumgarner,et al.  Correction: Multiclass classification of microarray data with repeated measurements: application to cancer , 2006, Genome Biology.

[33]  K. De Jong,et al.  An analysis of the behavior of a class of genetic adaptive systems , 1975 .

[34]  Dong Wang,et al.  Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases , 2010, Bioinform..

[35]  Chiou-Peng Lam,et al.  NSC-NSGA2: Optimal search for finding multiple thresholds for nearest shrunken centroid , 2013, 2013 IEEE International Conference on Bioinformatics and Biomedicine.

[36]  Kalyanmoy Deb,et al.  A Comparative Analysis of Selection Schemes Used in Genetic Algorithms , 1990, FOGA.

[37]  Kalyanmoy Deb,et al.  A fast and elitist multiobjective genetic algorithm: NSGA-II , 2002, IEEE Trans. Evol. Comput..

[38]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques, 3rd Edition , 1999 .

[39]  Masao Fukushima,et al.  Genetic algorithm with automatic termination and search space rotation , 2011, Memetic Comput..

[40]  Saiful Islam,et al.  Mahalanobis Distance , 2009, Encyclopedia of Biometrics.

[41]  J. Galletly An Overview of Genetic Algorithms , 1992 .

[42]  Graham R. Ball,et al.  Identification of gene transcript signatures predictive for estrogen receptor and lymph node status using a stepwise forward selection artificial neural network modelling approach , 2008, Artif. Intell. Medicine.

[43]  S. Mitra,et al.  Bioinformatics with soft computing , 2006, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[44]  Huan Liu,et al.  Efficient Feature Selection via Analysis of Relevance and Redundancy , 2004, J. Mach. Learn. Res..

[45]  Lawrence V. Snyder,et al.  A random-key genetic algorithm for the generalized traveling salesman problem , 2006, Eur. J. Oper. Res..

[46]  Chang Wook Ahn,et al.  A diversity preserving selection in multiobjective evolutionary algorithms , 2010, Applied Intelligence.

[47]  H. P. Lee,et al.  Saliency Analysis of Support Vector Machines for Gene Selection in Tissue Classification , 2003, Neural Computing & Applications.

[48]  James Smith,et al.  A tutorial for competent memetic algorithms: model, taxonomy, and design issues , 2005, IEEE Transactions on Evolutionary Computation.

[49]  Qinghua Hu,et al.  Neighborhood rough set based heterogeneous feature subset selection , 2008, Inf. Sci..

[50]  Ning Zhong,et al.  Probabilistic Rough Induction: The GDT-RS Methodology and Algorithms , 1999, ISMIS.

[51]  A. E. Eiben,et al.  Costs and Benefits of Tuning Parameters of Evolutionary Algorithms , 2008, PPSN.

[52]  S. Ramaswamy,et al.  Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. , 2002, Cancer research.

[53]  S Shibahara,et al.  Evidence for the presence of two amino-terminal isoforms of neurofibromin, a gene product responsible for neurofibromatosis type 1. , 1995, The Tohoku journal of experimental medicine.

[54]  Boris Kovatchev,et al.  Evaluating the clinical accuracy of two continuous glucose sensors using continuous glucose-error grid analysis. , 2005, Diabetes care.

[55]  Gordon Broderick,et al.  Gene Expression Correlates of Unexplained Fatigue , 2006, Pharmacogenomics.

[56]  Kenneth A. De Jong,et al.  Genetic algorithms as a tool for feature selection in machine learning , 1992, Proceedings Fourth International Conference on Tools with Artificial Intelligence TAI '92.

[57]  Duong Tuan Anh,et al.  A Framework for Memetic Algorithms , 2009 .

[58]  Wei Pan,et al.  Incorporating prior knowledge of gene functional groups into regularized discriminant analysis of microarray data , 2007, Bioinform..

[59]  Jian Huang,et al.  Penalized feature selection and classification in bioinformatics , 2008, Briefings Bioinform..

[60]  Qin Yan,et al.  Comparison of Different Chromosome Representations Used by Genetic Algorithms for Scheduling Engineering Missions , 1999 .

[61]  Simon Lovestone,et al.  Alzheimer’s Disease, Diagnosis and the Need for Biomarkers , 2008, Biomarker insights.

[62]  Jianqing Fan,et al.  High Dimensional Classification Using Features Annealed Independence Rules. , 2007, Annals of statistics.

[63]  Hyeoncheol Kim,et al.  An MLP-based feature subset selection for HIV-1 protease cleavage site analysis , 2010, Artif. Intell. Medicine.

[64]  E. Lander,et al.  Gene expression correlates of clinical prostate cancer behavior. , 2002, Cancer cell.

[65]  Walter Maetzler,et al.  Biomarkers of Alzheimer's and Parkinson's Disease , 2010 .

[66]  T. Ushio,et al.  Rough sets-based machine learning using a binary discernibility matrix , 1999, Proceedings of the Second International Conference on Intelligent Processing and Manufacturing of Materials. IPMM'99 (Cat. No.99EX296).

[67]  Trevor Hastie,et al.  Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays , 2003 .

[68]  Jessica Andrea Carballido,et al.  On Stopping Criteria for Genetic Algorithms , 2004, SBIA.

[69]  Mary Mittelman,et al.  World Alzheimer Report 2012 , 2012 .

[70]  Jan Komorowski,et al.  Learning Rough Set Classifiers from Gene Expressions and Clinical Data , 2002, Fundam. Informaticae.

[71]  Gordon Fraser,et al.  Evolutionary Generation of Whole Test Suites , 2011, 2011 11th International Conference on Quality Software.

[72]  Ash A. Alizadeh,et al.  Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling , 2000, Nature.

[73]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[74]  Constantin F. Aliferis,et al.  Challenges in the Analysis of Mass-Throughput Data: A Technical Commentary from the Statistical Machine Learning Perspective , 2006, Cancer informatics.

[75]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[76]  Paul S. Bradley,et al.  Refining Initial Points for K-Means Clustering , 1998, ICML.

[77]  Li Bai,et al.  Genetic algorithm based feature selection for mass spectrometry data , 2008, 2008 8th IEEE International Conference on BioInformatics and BioEngineering.

[78]  Pablo Moscato,et al.  Identification of a 5-Protein Biomarker Molecular Signature for Predicting Alzheimer's Disease , 2008, PloS one.

[79]  Pablo M. Granitto,et al.  A novel clustering approach for biological data using a new distance based on Gene Ontology , 2013 .

[80]  Alden H. Wright,et al.  Genetic Algorithms for Real Parameter Optimization , 1990, FOGA.

[81]  David E. Goldberg,et al.  Genetic Algorithms, Tournament Selection, and the Effects of Noise , 1995, Complex Syst..

[82]  Huan Liu,et al.  Consistency-based search in feature selection , 2003, Artif. Intell..

[83]  R. Tibshirani,et al.  Classification and prediction of clinical Alzheimer's diagnosis based on plasma signaling proteins , 2007, Nature Medicine.

[84]  Dominik Slezak,et al.  Roughfication of Numeric Decision Tables: The Case Study of Gene Expression Data , 2007, RSKT.

[85]  Anshul Mittal,et al.  A Genetic Algorithm , 2010 .

[86]  Ingrid Russell,et al.  An introduction to the WEKA data mining system , 2006, ITICSE '06.

[87]  T. Shows,et al.  Isolation and Characterization of a Novel Zinc-finger Protein with Transcriptional Repressor Activity , 1995, The Journal of Biological Chemistry.

[88]  Jihoon Yang,et al.  Feature Subset Selection Using a Genetic Algorithm , 1998, IEEE Intell. Syst..

[89]  T. Marwala,et al.  Microarray data feature selection using hybrid genetic algorithm simulated annealing , 2012, 2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel.

[90]  E. Talbi,et al.  A Genetic Algorithm for Feature Selection in Data-Mining for Genetics , 2001 .

[91]  Bernd Freisleben,et al.  Fitness landscapes and memetic algorithm design , 1999 .

[92]  Fernando G. Lobo,et al.  A parameter-less genetic algorithm , 1999, GECCO.

[93]  J. Stuart Aitken,et al.  Feature selection and classification for microarray data analysis: Evolutionary methods for identifying predictive genes , 2005, BMC Bioinformatics.

[94]  Enrique Alba,et al.  Gene selection in cancer classification using PSO/SVM and GA/SVM hybrid algorithms , 2007, 2007 IEEE Congress on Evolutionary Computation.

[95]  Qi Tian,et al.  Adaptive discriminant analysis for microarray-based classification , 2008, TKDD.

[96]  Glenn Fung,et al.  SVM Feature Selection for Classification of SPECT Images of Alzheimer's Disease Using Spatial Information , 2005, ICDM.

[97]  A. Olshen,et al.  Insights into extramedullary tumour cell growth revealed by expression profiling of human plasmacytomas and multiple myeloma , 2003, British journal of haematology.

[98]  Gary B. Lamont,et al.  Applications Of Multi-Objective Evolutionary Algorithms , 2004 .

[99]  V. K. Koumousis,et al.  A saw-tooth genetic algorithm combining the effects of variable population size and reinitialization to enhance performance , 2006, IEEE Transactions on Evolutionary Computation.

[100]  Grey Giddins,et al.  Statistics , 2016, The Journal of hand surgery, European volume.

[101]  Wei Du,et al.  Molecular classification of cancer types from microarray data using the combination of genetic algorithms and support vector machines , 2003, FEBS letters.

[102]  Adrian E. Raftery,et al.  Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data , 2005, Bioinform..

[103]  Pedro Larrañaga,et al.  A review of feature selection techniques in bioinformatics , 2007, Bioinform..

[104]  Mohammed Yeasin,et al.  A unified framework for finding differentially expressed genes from microarray experiments , 2007, BMC Bioinformatics.

[105]  Andrew Foss High-Dimensional Data Mining: Subspace Clustering, Outlier Detection and applications to classification , 2010 .

[106]  Siti Mariyam Shamsuddin,et al.  A predictive model construction applying rough set methodology for Malaysian stock market returns , 2009 .

[107]  B. Santhi,et al.  Decision tree classifiers for mass classification , 2015 .

[108]  Rajeev Kumar,et al.  Improved Sampling of the Pareto-Front in Multiobjective Genetic Optimizations by Steady-State Evolution: A Pareto Converging Genetic Algorithm , 2002, Evolutionary Computation.

[109]  Chiou-Peng Lam,et al.  NSC-GA: Search for optimal shrinkage thresholds for nearest shrunken centroid , 2013, 2013 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB).

[110]  Kenneth A. De Jong,et al.  Learning Concept Classification Rules Using Genetic Algorithms , 1991, IJCAI.

[111]  Goldberg,et al.  Genetic algorithms , 1993, Robust Control Systems with Genetic Algorithms.

[112]  R. Tibshirani,et al.  Diagnosis of multiple cancer types by shrunken centroids of gene expression , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[113]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[114]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[115]  P. Moscato,et al.  Differences in Abundances of Cell-Signalling Proteins in Blood Reveal Novel Biomarkers for Early Detection Of Clinical Alzheimer's Disease , 2011, PloS one.

[116]  Peter K. Sharpe,et al.  Efficient GA Based Techniques for Classification , 1999, Applied Intelligence.

[117]  N. Zhong,et al.  Data Mining: A Probabilistic Rough Set Approach , 1998 .

[118]  W. Thies,et al.  2012 Alzheimer’s disease facts and figures , 2012, Alzheimer's & Dementia.

[119]  J.T. Alander,et al.  On optimal population size of genetic algorithms , 1992, CompEuro 1992 Proceedings Computer Systems and Software Engineering.

[120]  E. Gehan,et al.  The properties of high-dimensional data spaces: implications for exploring gene and protein expression data , 2008, Nature Reviews Cancer.

[121]  Peter J. Fleming,et al.  An Overview of Evolutionary Algorithms in Multiobjective Optimization , 1995, Evolutionary Computation.

[122]  Hans-Peter Kriegel,et al.  Optimal Distance Bounds for the Mahalanobis Distance , 2013, SISAP.

[123]  A. K. Pujari,et al.  Data Mining Techniques , 2006 .

[124]  Je Milton,et al.  Analysis and improvement of genetic algorithms using concepts from information theory , 2009 .

[125]  Sushmita Mitra,et al.  Multi-objective evolutionary biclustering of gene expression data , 2006, Pattern Recognit..

[126]  Jongwoo Kim,et al.  Automatic Recognition of Alzheimer's Disease Using Genetic Algorithms and Neural Network , 2003, International Conference on Computational Science.

[127]  Andreas Zell,et al.  Wrapper- and Ensemble-Based Feature Subset Selection Methods for Biomarker Discovery in Targeted Metabolomics , 2011, PRIB.

[128]  David Beasley,et al.  An overview of genetic algorithms: Part 1 , 1993 .

[129]  Matthias Wölfel,et al.  Feature weighted mahalanobis distance: Improved robustness for Gaussian classifiers , 2005, 2005 13th European Signal Processing Conference.

[130]  Donald E. Grierson,et al.  Comparison among five evolutionary-based optimization algorithms , 2005, Adv. Eng. Informatics.

[131]  Blaise Hanczar,et al.  Improving classification of microarray data using prototype-based feature selection , 2003, SKDD.

[132]  N. Ramaraj,et al.  A novel hybrid feature selection via Symmetrical Uncertainty ranking based local memetic search algorithm , 2010, Knowl. Based Syst..

[133]  Kerrie L. Mengersen,et al.  Classification based upon gene expression data: bias and precision of error rates , 2007, Bioinform..

[134]  Zexuan Zhu,et al.  Wrapper–Filter Feature Selection Algorithm Using a Memetic Framework , 2007, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[135]  Myungsook Klassen,et al.  Nearest Shrunken Centroid as Feature Selection of Microarray Data , 2009, CATA.

[136]  K. Pearson VII. Note on regression and inheritance in the case of two parents , 1895, Proceedings of the Royal Society of London.

[137]  Leandro dos Santos Coelho,et al.  A Multiobjective Genetic Algorithm Applied to Multivariable Control Optimization , 2008 .

[138]  J. MacQueen Some methods for classification and analysis of multivariate observations , 1967 .

[139]  Pradipta Maji,et al.  A New Rough-Fuzzy Clustering Algorithm and its Applications , 2012, SocProS.

[140]  Chiou-Peng Lam,et al.  Incorporating genetic algorithm into rough feature selection for high dimensional biomedical data , 2011, 2011 IEEE International Symposium on IT in Medicine and Education.

[141]  Gene Hunting , 2001 .

[142]  B Johansson,et al.  Altered expression of TGFB receptors and mitogenic effects of TGFB in pancreatic carcinomas. , 2001, International journal of oncology.

[143]  Ilya Levner,et al.  Feature selection and nearest centroid classification for protein mass spectrometry , 2005, BMC Bioinformatics.

[144]  Daniel T. Larose,et al.  Discovering Knowledge in Data: An Introduction to Data Mining , 2005 .

[145]  K. Deb,et al.  Reliable classification of two-class cancer data using evolutionary algorithms. , 2003, Bio Systems.

[146]  Ashish Tiwari,et al.  A greedy genetic algorithm for the quadratic assignment problem , 2000, Comput. Oper. Res..

[147]  E.J. Delp,et al.  A Comparison of Feature Selection Methods for the Detection of Breast Cancers in Mammograms: Adaptive Sequential Floating Search vs. Genetic Algorithm , 2005, 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference.

[148]  Z. Pawlak Rough set approach to knowledge-based decision support , 1997 .

[149]  A. E. Eiben,et al.  Introduction to Evolutionary Computing , 2003, Natural Computing Series.

[150]  Kazuhiro Nagai,et al.  Identifying progression‐associated genes in adult T‐cell leukemia/lymphoma by using oligonucleotide microarrays , 2004, International journal of cancer.

[151]  Tom V. Mathew Genetic Algorithm , 2022 .

[152]  Joshua Zhexue Huang,et al.  Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values , 1998, Data Mining and Knowledge Discovery.

[153]  Kien A. Hua,et al.  Decision tree classifier for network intrusion detection with GA-based feature selection , 2005, ACM Southeast Regional Conference.

[154]  R. Tibshirani,et al.  Semi-Supervised Methods to Predict Patient Survival from Gene Expression Data , 2004, PLoS biology.

[155]  Jin-Kao Hao,et al.  Advances in metaheuristics for gene selection and classification of microarray data , 2010, Briefings Bioinform..

[156]  Chad L. Myers,et al.  Comparison of Profile Similarity Measures for Genetic Interaction Networks , 2013, PloS one.

[157]  Joni-Kristian Kämäräinen,et al.  Differential Evolution Training Algorithm for Feed-Forward Neural Networks , 2003, Neural Processing Letters.

[158]  Masaaki Adachi,et al.  Induction of nonspecific cross-reacting antigen mRNA by interferon-γ and anti-fibronectin receptor antibody in colon cancer cells , 1997, Journal of Gastroenterology.

[159]  U. Alon,et al.  Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. , 1999, Proceedings of the National Academy of Sciences of the United States of America.

[160]  Shengxiang Yang,et al.  A memetic algorithm with adaptive hill climbing strategy for dynamic optimization problems , 2009, Soft Comput..

[161]  Geoffrey J McLachlan,et al.  Selection bias in gene extraction on the basis of microarray gene-expression data , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[162]  M. P. Sebastian,et al.  Improving the Accuracy and Efficiency of the k-means Clustering Algorithm , 2009 .

[163]  Susmita Datta,et al.  Comparisons and validation of statistical clustering techniques for microarray gene expression data , 2003, Bioinform..

[164]  Nicole Comtesse,et al.  Toward a more complete recognition of immunoreactive antigens in squamous cell lung carcinoma , 2002, International journal of cancer.

[165]  Thomas Roß,et al.  Feature selection for optimized skin tumor recognition using genetic algorithms , 1999, Artif. Intell. Medicine.

[166]  Ning Zhong,et al.  Using Rough Sets with Heuristics for Feature Selection , 1999, RSFDGrC.

[167]  Hitoshi Iba,et al.  Selecting informative genes using a multiobjective evolutionary algorithm , 2002, Proceedings of the 2002 Congress on Evolutionary Computation. CEC'02 (Cat. No.02TH8600).