A knowledge discovery pipeline for medical decision support using clustering ensemble and neural network ensemble

It is widely recognized that knowledge discovery and data mining in the health domain are two techniques in which scientists and researchers continually seek improvements in predictive accuracy. In this paper, we present a multi-tier knowledge acquisition, amalgamation and learning info-structure for learning rules generated from medical datasets that comprise both annotated and unannotated attributes. We propose a hybrid approach to rule learning, realized as an enhanced Knowledge Discovery Pipeline that features data cleansing, a novel data clustering ensemble mechanism via boosting, data discretization, rule generation via rough sets, rule filtering and, finally, a neural network ensemble via bagging. In addition to generating decision rules, the pipeline produces a neural knowledge base that can be considered an abstraction of the knowledge present in the dataset.
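The cleansing, discretization, rough-set rule generation and rule filtering stages named above can be illustrated with a short sketch. It is not the authors' implementation, and it rests on several assumptions: cleansing simply drops records with missing values, equal-width binning stands in for whatever discretization method the pipeline actually uses, and rules are read directly off consistent indiscernibility classes (the rough-set lower approximation) rather than computed from reducts. The patient records, the `risk` decision attribute, the `min_support` threshold and all function names are hypothetical. The clustering ensemble and bagged neural network stages are omitted.

```python
from collections import defaultdict

def cleanse(rows):
    """Drop records with any missing (None) attribute value; one simple cleansing policy."""
    return [r for r in rows if all(v is not None for v in r.values())]

def equal_width_bins(values, k):
    """Map each numeric value to a bin index 0..k-1 using k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k or 1.0  # fall back to width 1 when all values are equal
    return [min(int((v - lo) / width), k - 1) for v in values]

def discretize(rows, numeric_cols, k):
    """Return copies of rows with each numeric column replaced by a symbolic bin label."""
    out = [dict(r) for r in rows]
    for col in numeric_cols:
        bins = equal_width_bins([r[col] for r in rows], k)
        for r, b in zip(out, bins):
            r[col] = f"bin{b}"
    return out

def rough_set_rules(rows, conditions, decision):
    """Group rows into indiscernibility classes over the condition attributes.
    A class whose rows all share one decision value lies in the lower approximation
    of that decision class and yields a certain rule (conditions, decision, support)."""
    classes = defaultdict(list)
    for r in rows:
        classes[tuple(r[c] for c in conditions)].append(r[decision])
    rules = []
    for key, outcomes in classes.items():
        if len(set(outcomes)) == 1:  # consistent class; inconsistent ones form the boundary region
            rules.append((dict(zip(conditions, key)), outcomes[0], len(outcomes)))
    return rules

def filter_rules(rules, min_support):
    """Keep only rules supported by at least min_support training records."""
    return [rule for rule in rules if rule[2] >= min_support]

# Hypothetical toy records; "risk" plays the role of the decision attribute.
patients = [
    {"age": 25, "smoker": "no", "risk": "low"},
    {"age": 30, "smoker": "no", "risk": "low"},
    {"age": None, "smoker": "yes", "risk": "high"},  # removed by cleanse()
    {"age": 62, "smoker": "yes", "risk": "high"},
    {"age": 70, "smoker": "yes", "risk": "high"},
    {"age": 65, "smoker": "no", "risk": "high"},
    {"age": 68, "smoker": "no", "risk": "low"},
]

table = discretize(cleanse(patients), ["age"], k=2)
rules = filter_rules(rough_set_rules(table, ["age", "smoker"], "risk"), min_support=2)
for conds, outcome, support in rules:
    lhs = " AND ".join(f"{a}={v}" for a, v in conds.items())
    print(f"IF {lhs} THEN risk={outcome} (support={support})")
# IF age=bin0 AND smoker=no THEN risk=low (support=2)
# IF age=bin1 AND smoker=yes THEN risk=high (support=2)
```

In this toy table the two younger non-smokers and the two older smokers form consistent classes and each yields a certain rule, while the two older non-smokers disagree on `risk`, fall into the boundary region and produce no certain rule. Raising `min_support` is one simple way to prune rules backed by too few records before they are passed to a later learning stage.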
