Service-Oriented Medical System for Supporting Decisions With Missing and Imbalanced Data

In this paper, we propose a service-oriented support decision system (SOSDS) for diagnostic problems that is robust to two issues widely observed in the medical domain: imbalanced data and missing attribute values. The system is composed of distributed Web services that implement machine-learning methods for constructing decision models directly from datasets with a high percentage of missing attribute values and an imbalanced class distribution. Imbalanced data are handled with a cost-sensitive support vector machine, and missing attribute values are handled with a novel ensemble-based approach that splits the incomplete data space into complete subspaces, which are then used to construct base learners. We evaluate the quality of the SOSDS components using three ontological datasets.
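The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the subspaces are chosen, so grouping by missingness pattern is an assumption, as are the inverse-frequency class costs and the helper names `complete_subspaces`, `class_costs`, and `cost_sensitive_objective`. Missing values are assumed to be encoded as `None`, and labels as +1 and -1.

```python
from collections import Counter


def complete_subspaces(rows):
    """Map each observed-feature pattern to the rows that are complete on it.

    Assumption: one subspace per distinct missingness pattern in the data.
    A row belongs to every subspace whose features it fully observes.
    """
    patterns = {
        tuple(j for j, value in enumerate(row) if value is not None)
        for row in rows
    }
    subspaces = {}
    for features in patterns:
        if not features:
            continue  # a row with no observed features defines no subspace
        subspaces[features] = [
            i for i, row in enumerate(rows)
            if all(row[j] is not None for j in features)
        ]
    return subspaces


def class_costs(labels):
    """Inverse-frequency misclassification costs (an assumed heuristic)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}


def cost_sensitive_objective(w, b, X, y, costs):
    """Linear SVM objective with a per-class cost on the hinge loss.

    0.5 * ||w||^2 + sum_i costs[y_i] * max(0, 1 - y_i * (w . x_i + b))
    """
    regularizer = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        loss += costs[yi] * max(0.0, 1.0 - margin)
    return regularizer + loss


if __name__ == "__main__":
    rows = [[1.0, 2.0], [1.5, None], [None, 0.5], [3.0, 1.0]]
    labels = [1, 1, 1, -1]
    print(complete_subspaces(rows))
    print(class_costs(labels))
```

In this sketch, a base learner would be trained on each subspace using only the listed rows and features, with errors on the minority class weighted more heavily in the objective. How the base learners' outputs are combined is left open here, since the abstract does not specify it.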
