Correlation-based Feature Selection using Rank Aggregation for an Improved Prediction of Potentially Preventable Events

This paper presents a methodology for developing a novel feature selection model that enables more accurate and robust prediction of patients at risk of Potentially Preventable Events (PPEs). PPEs are admissions, readmissions, complications, and emergency department visits that could have been avoided had the patient received appropriate interventions. Various clinical factors and patient health conditions can affect a patient's risk of a PPE. We propose a robust Correlation-based feature selection method using Rank Aggregation (CRA) that identifies the key factors contributing to the prediction of PPEs. Existing feature selection techniques each evaluate features using a distinct statistical property of the data, which introduces bias; CRA reduces this bias through rank aggregation. The results indicate that the proposed technique is more robust across a wide range of classifiers and achieves higher accuracy than other traditional methods.
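The abstract does not specify which feature-evaluation criteria CRA combines or which aggregation rule it uses, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes three common filter criteria (absolute Pearson correlation with the label, Fisher score, and information gain after a median split), converts each criterion's scores into per-feature ranks, and combines them by mean rank (a Borda-style count). A greedy pass then skips any feature whose absolute correlation with an already-selected feature exceeds a hypothetical `max_redundancy` threshold of 0.9. All function names, the choice of criteria, and the threshold are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import mean, median, pvariance

# Illustrative sketch only: the criteria, names, and max_redundancy threshold are
# assumptions, not the CRA authors' exact method. Assumes a binary 0/1 label with
# both classes present.


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def abs_pearson(x, y):
    """Criterion 1: strength of linear association with the 0/1 label."""
    return abs(pearson(x, y))


def fisher_score(x, y):
    """Criterion 2: squared gap between class means over summed within-class variances."""
    pos = [a for a, b in zip(x, y) if b == 1]
    neg = [a for a, b in zip(x, y) if b == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        return math.inf if gap > 0 else 0.0
    return gap / spread


def entropy(labels):
    """Shannon entropy (bits) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def info_gain(x, y):
    """Criterion 3: information gain of the label after splitting the feature at its median."""
    thr = median(x)
    left = [b for a, b in zip(x, y) if a <= thr]
    right = [b for a, b in zip(x, y) if a > thr]
    n = len(y)
    return entropy(y) - (len(left) / n * entropy(left) + len(right) / n * entropy(right))


def to_ranks(scores):
    """Convert scores to ranks (1 = best); ties are broken by feature index."""
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    ranks = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def rank_aggregate_select(X, y, k, max_redundancy=0.9,
                          criteria=(abs_pearson, fisher_score, info_gain)):
    """Mean-rank (Borda-style) aggregation of several filter criteria, then a greedy
    pass that skips features highly correlated with features already selected."""
    columns = [list(col) for col in zip(*X)]  # feature-major view of the rows in X
    rank_lists = [to_ranks([crit(col, y) for col in columns]) for crit in criteria]
    mean_rank = [mean(r[j] for r in rank_lists) for j in range(len(columns))]
    selected = []
    for j in sorted(range(len(columns)), key=lambda j: (mean_rank[j], j)):
        if len(selected) == k:
            break
        if all(abs(pearson(columns[j], columns[s])) <= max_redundancy for s in selected):
            selected.append(j)
    return selected, mean_rank


# Toy data: feature 0 separates the classes, feature 1 is noise,
# feature 2 is a near-copy of feature 0 (and therefore redundant).
X = [[1, 5, 2], [2, 1, 4], [1, 5, 2], [2, 1, 5],
     [8, 5, 16], [9, 1, 18], [8, 5, 16], [9, 1, 18]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
selected, mean_rank = rank_aggregate_select(X, y, k=2)
print(selected)  # → [0, 1]
```

On the toy data, feature 2 ranks second on every criterion but is dropped because its correlation with feature 0 is about 0.999. Averaging ranks rather than raw scores keeps criteria measured on very different scales (a Fisher score can be unbounded, while |r| lies in [0, 1]) from dominating one another, which is the general motivation for rank aggregation.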
