Ensembles of case‐based reasoning classifiers in high‐dimensional biological domains

To extend the capabilities of case‐based reasoning (CBR), we implemented an ensemble for case‐based reasoning (E4CBR) approach, in which an ensemble of CBR classifiers is combined with clustering and feature selection. We first select a subset of features across all cases, and then cluster the cases into disjoint groups; each group forms the case‐base of one member classifier. Finally, a further subset of features is selected locally within each case‐base. To predict the label of an unseen case, each classifier in the ensemble provides a prediction, and the aggregation component of E4CBR combines these predictions by weighting each classifier using a CBR approach: a classifier with more cases similar to the test case receives a higher weight. We evaluated E4CBR on four publicly available biological data sets and compared its classification error with that of a single CBR classifier. In our experiments, we used TA3, a computational framework for CBR systems. Our results show that E4CBR reduces the classification error of our CBR classifier. In these experiments, our aggregation method also outperformed existing CBR aggregation methods.

© 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 164‐171 DOI: 10.1002/widm.22
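The pipeline described in the abstract (global feature selection, clustering into disjoint case‐bases, local feature selection, and similarity‐weighted aggregation) can be illustrated with a minimal Python sketch. This is not the TA3 implementation and makes several assumptions that the abstract does not specify: a between‐class variance score stands in for the feature selection method, plain k‐means for the clustering step, k‐nearest‐neighbour voting for CBR retrieval, and a member's weight is taken to be the number of its cases among the `k_pool` pooled nearest neighbours of the test case. All function and parameter names are hypothetical.

```python
import numpy as np

def select_features(X, y, k):
    """Keep the k features with the highest between-class variance ratio (assumed criterion)."""
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
    score = between / (X.var(axis=0) * len(X) + 1e-12)
    return np.argsort(-score, kind="stable")[:k]

def kmeans(X, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means; a stand-in for the clustering step."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)]
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):  # leave the centre of an empty cluster unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def fit_e4cbr(X, y, n_global=20, n_members=3, n_local=10):
    """Global selection, then disjoint case-bases, then local selection per case-base."""
    g = select_features(X, y, n_global)
    Xg = X[:, g]
    labels = kmeans(Xg, n_members)
    members = []
    for j in np.unique(labels):
        idx = labels == j
        loc = select_features(Xg[idx], y[idx], min(n_local, Xg.shape[1]))
        members.append((Xg[idx], y[idx], loc))
    return g, members

def predict_e4cbr(model, x, k=3, k_pool=15):
    """Weighted vote: a member's weight is its share of the pooled nearest cases (assumption)."""
    g, members = model
    xg = x[g]
    dists = np.concatenate([np.linalg.norm(c - xg, axis=1) for c, _, _ in members])
    owner = np.concatenate([np.full(len(c), i) for i, (c, _, _) in enumerate(members)])
    weights = np.bincount(owner[np.argsort(dists)[:k_pool]], minlength=len(members))
    votes = {}
    for w, (cases, labels, loc) in zip(weights, members):
        # Each member retrieves its k nearest cases using only its local features.
        d = np.linalg.norm(cases[:, loc] - xg[loc], axis=1)
        vals, counts = np.unique(labels[np.argsort(d)[:k]], return_counts=True)
        pred = vals[counts.argmax()]
        votes[pred] = votes.get(pred, 0) + w
    return max(votes, key=votes.get)
```

A model is built with `fit_e4cbr(X, y)`, where `X` is a cases‐by‐features array and `y` holds the class labels, and a single unseen case `x` is classified with `predict_e4cbr(model, x)`. Members whose case‐bases contain none of the pooled nearest cases receive zero weight, which reflects the idea that classifiers with more cases similar to the test case should dominate the vote.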
