Multicategory classification of 11 neuromuscular diseases based on microarray data using support vector machine

We applied multicategory machine learning methods to classify 11 neuromuscular disease groups and one control group based on microarray data. To develop multicategory classification models with optimal parameters and features, we systematically evaluated three machine learning algorithms and four feature selection methods using three-fold cross-validation and a grid search. The study included 114 subjects with 11 neuromuscular diseases and 31 control subjects, using microarray data with 22,283 probe sets obtained from the National Center for Biotechnology Information (NCBI). We obtained an accuracy of 100%, a relative classifier information (RCI) of 1.0, and a kappa index of 1.0 with three models: support vector machine one-versus-one (SVM-OVO), SVM one-versus-rest (SVM-OVR), and directed acyclic graph SVM (DAGSVM), each combined with the feature selection method based on the ratio of between-category to within-category sums of squares (BW) of each gene. Each of these three models needed only four features to categorize the 12 groups, which makes for a time-saving and cost-effective strategy for diagnosing neuromuscular diseases. In addition, the BW method selected SPP1 as the top-ranked gene, and a previous study supports a relationship between SPP1 and Duchenne muscular dystrophy (DMD). Used as clinical support tools, our models could allow neuromuscular diseases to be classified quickly and accurately by computer.
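As an illustration of the BW feature selection step, the following is a minimal sketch and not the authors' implementation. It assumes the commonly used definition of the BW score for a gene: the sum of squared deviations of the class means from the overall mean (weighted by class size), divided by the sum of squared deviations of the samples from their own class mean. The function names, the toy expression values, and the labels are all hypothetical.

```python
from collections import defaultdict


def bw_ratio(values, labels):
    """Between- to within-category sum-of-squares ratio for one gene."""
    overall_mean = sum(values) / len(values)

    # Group the expression values of this gene by class label.
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    between = 0.0
    within = 0.0
    for members in groups.values():
        class_mean = sum(members) / len(members)
        # Each sample contributes the squared distance of its class mean
        # from the overall mean, hence the class-size weight.
        between += len(members) * (class_mean - overall_mean) ** 2
        within += sum((v - class_mean) ** 2 for v in members)

    if within == 0:
        # No spread inside any class: perfectly separating gene, or a
        # constant gene that carries no information.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_genes_by_bw(expression, labels, top_k):
    """Return the top_k gene names with the largest BW ratio.

    expression maps a gene name to a list of values, one per sample.
    """
    scores = {gene: bw_ratio(vals, labels) for gene, vals in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical toy data: six samples, three classes, three genes.
labels = ["control", "control", "disease_a", "disease_a", "disease_b", "disease_b"]
expression = {
    "gene_1": [1.0, 1.2, 5.0, 5.1, 9.0, 9.2],  # separates all three classes
    "gene_2": [3.0, 7.0, 4.0, 6.0, 5.0, 5.5],  # noisy, uninformative
    "gene_3": [2.0, 2.1, 2.0, 2.2, 6.0, 6.1],  # separates disease_b only
}

top_genes = rank_genes_by_bw(expression, labels, top_k=2)
print(top_genes)
```

In a workflow like the one described above, the top-ranked genes would then be passed to a multicategory SVM, with the number of retained genes and the SVM parameters chosen by grid search inside the cross-validation folds.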
