Using Machine Learning Classifiers to Assist Healthcare-Related Decisions: Classification of Electronic Patient Records

Surveillance Levels (SLs) are categories used in Brazil to group medical patients according to the type of medical recommendation they require. SLs are defined according to risk factors and the medical and developmental history of patients, and each SL is associated with specific educational and clinical measures. This paper proposes and verifies a computer-aided approach for the automatic recommendation of SLs, based on the classification of information from electronic patient records. For this purpose, a three-layer software architecture was developed. Its classification layer includes a linguistic module and machine learning classification modules, and it allows different classification methods to be used, optionally on preprocessed, normalized language data produced by the linguistic module. We report the verification and validation of the software architecture in a Brazilian pediatric healthcare institution. The results indicate that attribute selection can have a large effect on the performance of the system. They also indicate that, when the linguistic module is applied prior to classification, the automatic recommendation of SLs can still benefit from improvements in the processing procedures. The approach can be applied to other types of medical systems. Healthcare and governmental institutions may use the output of systems built on the framework presented in this paper to improve healthcare services, for example by establishing preventive measures and alerting authorities to the possibility of an epidemic.
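As a rough illustration of the idea of a linguistic module feeding a classification module, the sketch below normalizes free-text record entries and then trains a small multinomial naive Bayes classifier on them. It is not the architecture or the classifiers evaluated in the paper: the stopword list, the toy records, the labels `SL-high` and `SL-routine`, and the names `normalize` and `NaiveBayesSL` are all assumptions made for this example.

```python
import math
import re
import unicodedata
from collections import Counter, defaultdict

# Hypothetical stopword list (assumption; the paper's list is not given here).
STOPWORDS = {"a", "the", "of", "and", "with", "in"}


def normalize(text):
    """Toy linguistic step: lowercase, strip accents, tokenize, drop stopwords."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = re.findall(r"[a-z]+", plain)
    return [t for t in tokens if t not in STOPWORDS]


class NaiveBayesSL:
    """Multinomial naive Bayes with add-one smoothing over normalized tokens."""

    def __init__(self):
        self.class_counts = Counter()             # records seen per label
        self.token_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()

    def fit(self, records, labels):
        for text, label in zip(records, labels):
            tokens = normalize(text)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = normalize(text)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_records in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_records / total)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical records and labels, invented for illustration only.
records = [
    "Premature birth with low weight",
    "Família with history of asthma and recurrent infection",
    "Healthy child, normal development",
    "Normal weight and regular vaccination",
]
labels = ["SL-high", "SL-high", "SL-routine", "SL-routine"]

model = NaiveBayesSL().fit(records, labels)
suggested = model.predict("low weight premature")
```

Keeping normalization in its own function mirrors the separation the abstract describes: the classifier can be swapped for another method, or run on raw tokens, without touching the linguistic step.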
