Facioscapulohumeral Muscular Dystrophy Diagnosis Using Hierarchical Clustering Algorithm and K-Nearest Neighbor Based Methodology

The genetic diagnosis of neuromuscular disorders is an active area of research. Microarrays are used to detect changes in gene expression for accurate diagnosis. However, gene expression data contain far more genes than samples, so the number of genes must be reduced to obtain a correct diagnosis. In this paper, the authors propose an intelligent integrated model for clustering and diagnosing neuromuscular diseases. The Wilcoxon signed-rank test is used to preselect genes. K-means and hierarchical clustering algorithms with different distance metrics are employed to cluster the genes. Three classifiers, namely linear discriminant analysis, quadratic discriminant analysis and k-nearest neighbor, are used. The integrated techniques are evaluated on a balanced facioscapulohumeral muscular dystrophy (FSHD) dataset. A comparative analysis of these integrated algorithms shows that combining hierarchical clustering using the cosine distance metric with the k-nearest neighbor classifier gives the best performance measures.
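The abstract names the stages of the best-performing combination but not how they are wired together. The following Python sketch (NumPy and SciPy) shows one plausible wiring of Wilcoxon preselection, cosine-distance hierarchical clustering of genes, and k-nearest-neighbor classification, run on hypothetical synthetic data. Several details are assumptions, not taken from the paper: FSHD and control samples are paired by position so that the signed-rank test applies, average linkage is used, each gene cluster is summarised by its mean expression per sample, and k-NN uses Euclidean distance with k = 3. All helper names (`preselect_genes`, `cluster_genes`, `cluster_means`, `knn_predict`, `make_data`) are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import cdist
from scipy.stats import wilcoxon


def preselect_genes(X, y, alpha=0.05):
    """Return indices of genes whose FSHD-vs-control difference is significant.

    ASSUMPTION: the dataset is balanced and the i-th FSHD sample is paired with
    the i-th control sample, so the Wilcoxon signed-rank test applies.
    """
    X_fshd, X_ctrl = X[y == 1], X[y == 0]
    keep = []
    for g in range(X.shape[1]):
        diff = X_fshd[:, g] - X_ctrl[:, g]
        if np.all(diff == 0):
            continue  # the signed-rank test is undefined when all differences are zero
        if wilcoxon(X_fshd[:, g], X_ctrl[:, g]).pvalue < alpha:
            keep.append(g)
    return np.array(keep, dtype=int)


def cluster_genes(X, n_clusters):
    """Agglomerative clustering of genes (columns of X) using cosine distance."""
    Z = linkage(X.T, method="average", metric="cosine")  # average linkage is an assumption
    return fcluster(Z, t=n_clusters, criterion="maxclust")


def cluster_means(X, labels):
    """ASSUMPTION: summarise each gene cluster by its mean expression per sample."""
    return np.column_stack([X[:, labels == c].mean(axis=1) for c in np.unique(labels)])


def knn_predict(F_train, y_train, F_test, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance)."""
    nearest = np.argsort(cdist(F_test, F_train), axis=1)[:, :k]
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])


def make_data(rng, n_per_class=10, n_genes=100, n_informative=20, shift=5.0):
    """Hypothetical synthetic data: only the first n_informative genes differ."""
    X = rng.normal(size=(2 * n_per_class, n_genes))
    y = np.array([1] * n_per_class + [0] * n_per_class)  # 1 = FSHD, 0 = control
    X[y == 1, :n_informative] += shift
    return X, y


rng = np.random.default_rng(0)
X_train, y_train = make_data(rng)
X_test, y_test = make_data(rng)

# Gene selection and clustering are fitted on the training samples only.
genes = preselect_genes(X_train, y_train)
labels = cluster_genes(X_train[:, genes], n_clusters=3)
F_train = cluster_means(X_train[:, genes], labels)
F_test = cluster_means(X_test[:, genes], labels)

accuracy = np.mean(knn_predict(F_train, y_train, F_test) == y_test)
print(f"{len(genes)} genes kept, {len(np.unique(labels))} clusters, accuracy {accuracy:.2f}")
```

Reusing the training-set gene clusters to build the test features keeps test samples out of feature construction. Replacing `knn_predict` with scikit-learn's `LinearDiscriminantAnalysis` or `QuadraticDiscriminantAnalysis` would give the other two classifiers in the comparison.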