Identification of disease critical genes causing Duchenne muscular dystrophy (DMD) using computational intelligence

Identifying the genes responsible for a disease by mining gene expression data is known as the gene selection problem. Here, gene selection is applied to a gene expression dataset containing both disease-affected and normal samples, where a sample is the set of expression levels of all genes of one person. The disease studied is Duchenne muscular dystrophy (DMD), a genetic disease that causes progressive muscle degeneration. To identify the disease-causing genes, two meta-heuristic algorithms, differential evolution (DE) and simulated annealing (SA), are employed. A k-nearest neighbor (kNN) classifier is embedded in each algorithm to classify samples as normal or diseased using only the expression levels of the selected genes. The fitness of a candidate solution is the number of correctly classified samples. Both algorithms attained the highest fitness (22), and DE required less execution time than SA. The genes identified as disease-critical by the algorithms are reported.
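The fitness described above (number of correctly classified samples under a kNN classifier restricted to the selected genes) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not the authors' implementation: leave-one-out evaluation, Euclidean distance, k = 3, a fixed subset size, a geometric cooling schedule, and the synthetic data. The names `knn_fitness`, `anneal`, and `make_toy_data` are hypothetical. Only the SA side is sketched; a DE variant could reuse the same fitness function.

```python
import math
import random


def knn_fitness(samples, labels, genes, k=3):
    """Count samples classified correctly by leave-one-out kNN on `genes`.

    Assumption: leave-one-out evaluation and squared Euclidean distance;
    the abstract does not state how the classifier is evaluated.
    """
    correct = 0
    for i, x in enumerate(samples):
        # Distance from sample i to every other sample, using only `genes`.
        dists = sorted(
            (sum((x[g] - y[g]) ** 2 for g in genes), labels[j])
            for j, y in enumerate(samples)
            if j != i
        )
        votes = [label for _, label in dists[:k]]
        predicted = max(set(votes), key=votes.count)  # majority vote
        correct += predicted == labels[i]
    return correct


def anneal(samples, labels, n_genes, subset_size=3, steps=200,
           t0=1.0, cooling=0.98, seed=0):
    """Simulated annealing over fixed-size gene subsets (illustrative only)."""
    rng = random.Random(seed)
    current = rng.sample(range(n_genes), subset_size)
    cur_fit = knn_fitness(samples, labels, current)
    best, best_fit = list(current), cur_fit
    temp = t0
    for _ in range(steps):
        # Neighbor move: replace one selected gene with an unselected one.
        candidate = list(current)
        outside = [g for g in range(n_genes) if g not in current]
        candidate[rng.randrange(subset_size)] = rng.choice(outside)
        cand_fit = knn_fitness(samples, labels, candidate)
        delta = cand_fit - cur_fit
        # Accept improvements always, worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_fit = candidate, cand_fit
        if cur_fit > best_fit:
            best, best_fit = list(current), cur_fit
        temp *= cooling  # geometric cooling (assumed schedule)
    return best, best_fit


def make_toy_data(n_per_class=10, n_genes=20, informative=(2, 7), seed=1):
    """Synthetic expression data; NOT the DMD dataset from the paper."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label in (0, 1):  # 0 = normal, 1 = diseased
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            for g in informative:
                row[g] += 4.0 * label  # shift informative genes when diseased
            samples.append(row)
            labels.append(label)
    return samples, labels


if __name__ == "__main__":
    data, target = make_toy_data()
    genes, fitness = anneal(data, target, n_genes=20)
    print("selected genes:", genes, "fitness:", fitness, "of", len(target))
```

Because the fitness is a count of correctly classified samples, its maximum equals the number of samples, which makes plateaus common; accepting equal-fitness moves (`delta >= 0`) lets the search drift across them rather than stall.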
