Optimizing feature selection to improve medical diagnosis

In this paper, we propose a new optimization framework, the Support Feature Machine (SFM), for improving feature selection in medical data classification. SFM selects the group of features that shows the strongest separability between two classes, where separability is measured in terms of inter-class and intra-class distances. The objective of the SFM optimization model is to maximize the number of correctly classified samples in the training set, that is, samples whose intra-class distances are smaller than their inter-class distances. This concept can be combined with a modified nearest neighbor rule for unbalanced data. A variation of SFM that provides feature weights (prioritization) is also presented. The proposed SFM framework and its extensions were tested on five real medical datasets related to the diagnosis of epilepsy, breast cancer, heart disease, diabetes, and liver disorders. The classification performance of SFM is compared with that of support vector machine (SVM) classification and Logical Analysis of Data (LAD), which is also an optimization-based feature selection technique. SFM gives very good classification results while using far fewer features to make the decision than SVM and LAD. This result has significant implications for diagnostic practice: the outcome of this study suggests that the SFM framework can be used as a quick decision-making tool in real clinical settings.
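
The selection criterion described above can be illustrated with a minimal Python sketch. It is not the authors' optimization model: the abstract does not give the formulation, so this sketch assumes that per-feature distances are mean absolute differences, that distances are summed over the selected features, and that a brute-force search over subsets stands in for the optimization step. The function names `per_feature_distances` and `select_features` are hypothetical.

```python
from itertools import combinations

import numpy as np


def per_feature_distances(X, y):
    """Return (intra, inter), each of shape (n_samples, n_features).

    intra[i, j] is the mean absolute difference in feature j between
    sample i and the other samples of its own class; inter[i, j] is the
    same quantity measured against samples of the opposite class.
    Assumes two classes with at least two samples each.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, m = X.shape
    intra = np.zeros((n, m))
    inter = np.zeros((n, m))
    for i in range(n):
        diff = np.abs(X - X[i])   # per-feature distances to every sample
        same = y == y[i]
        same[i] = False           # a sample is not its own neighbor
        other = y != y[i]
        intra[i] = diff[same].mean(axis=0)
        inter[i] = diff[other].mean(axis=0)
    return intra, inter


def select_features(X, y, max_size=None):
    """Exhaustively pick the feature subset that maximizes the number of
    samples whose summed intra-class distance is smaller than their
    summed inter-class distance. Ties go to the first (smallest) subset.
    """
    intra, inter = per_feature_distances(X, y)
    m = intra.shape[1]
    max_size = m if max_size is None else max_size
    best_subset, best_count = (), -1
    for k in range(1, max_size + 1):
        for subset in combinations(range(m), k):
            cols = list(subset)
            correct = int(np.sum(intra[:, cols].sum(axis=1)
                                 < inter[:, cols].sum(axis=1)))
            if correct > best_count:
                best_subset, best_count = subset, correct
    return best_subset, best_count


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is noise.
    X = [[0.0, 5], [0.1, -5], [0.2, 5], [10.0, -5], [10.1, 5], [10.2, -5]]
    y = [0, 0, 0, 1, 1, 1]
    print(select_features(X, y))
```

Exhaustive enumeration grows exponentially with the number of features, so it is only practical for small feature sets; it is used here purely to make the objective concrete.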
