Global optimization of case-based reasoning for breast cytology diagnosis

Case-based reasoning (CBR) is one of the most popular prediction techniques in medical domains because it is easy to apply, has no possibility of overfitting, and provides a good explanation for its output. However, it has a critical limitation: its prediction performance is generally lower than that of other AI techniques such as artificial neural networks (ANN). To obtain accurate results from CBR, effective retrieval and matching of useful prior cases for the problem is essential, but how to design a good matching and retrieval mechanism for CBR systems is still a controversial issue. In this study, we propose a novel approach to enhance the prediction performance of CBR: the simultaneous optimization of feature weights, instance selection, and the number of neighbors to combine, using genetic algorithms (GA). Our model improves prediction performance in three ways: (1) measuring similarity between cases more accurately by considering the relative importance of each feature, (2) eliminating useless or erroneous reference cases, and (3) combining several similar cases that represent significant patterns. To validate the usefulness of our model, this study applied it to a real-world case: evaluating cytological features derived directly from digital scans of breast fine needle aspirate (FNA) slides. Experimental results showed that the prediction accuracy of conventional CBR may be improved significantly by using our model. We also found that our proposed model outperformed all the other GA-optimized CBR models.
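The idea of optimizing feature weights, instance selection, and the number of neighbors in one search can be illustrated with a minimal sketch. The abstract does not specify the chromosome encoding, the genetic operators, the similarity measure, or the fitness function, so everything below is an assumption made for illustration: a chromosome is a hypothetical triple `(weights, mask, k)`, similarity is a weighted squared Euclidean distance, fitness is accuracy on a held-out validation set, and the GA uses truncation selection with uniform crossover. All function names are invented for this example.

```python
import random
from collections import Counter

def predict(weights, mask, k, train_X, train_y, query):
    """Weighted k-NN vote using only the reference cases switched on in `mask`."""
    scored = sorted(
        (sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, query)), label)
        for x, label, keep in zip(train_X, train_y, mask) if keep
    )
    if not scored:          # every reference case was deselected
        return None
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

def fitness(chrom, train_X, train_y, val_X, val_y):
    """Assumed fitness: classification accuracy on a validation set."""
    weights, mask, k = chrom
    hits = sum(predict(weights, mask, k, train_X, train_y, q) == y
               for q, y in zip(val_X, val_y))
    return hits / len(val_y)

def random_chromosome(n_features, n_cases, max_k, rng):
    """One candidate: feature weights, instance-selection mask, neighbor count."""
    return ([rng.random() for _ in range(n_features)],
            [rng.random() < 0.5 for _ in range(n_cases)],
            rng.randint(1, max_k))

def crossover(a, b, rng):
    """Uniform crossover applied gene by gene to all three parts."""
    weights = [rng.choice(pair) for pair in zip(a[0], b[0])]
    mask = [rng.choice(pair) for pair in zip(a[1], b[1])]
    return (weights, mask, rng.choice((a[2], b[2])))

def mutate(chrom, max_k, rate, rng):
    """Resample a weight, flip a mask bit, or redraw k with probability `rate`."""
    weights = [rng.random() if rng.random() < rate else w for w in chrom[0]]
    mask = [(not m) if rng.random() < rate else m for m in chrom[1]]
    k = rng.randint(1, max_k) if rng.random() < rate else chrom[2]
    return (weights, mask, k)

def evolve(train_X, train_y, val_X, val_y, max_k=5, pop_size=20,
           generations=15, rate=0.05, seed=0):
    """Search all three parameter groups at once; the top half survives each round."""
    rng = random.Random(seed)

    def score(chrom):
        return fitness(chrom, train_X, train_y, val_X, val_y)

    pop = [random_chromosome(len(train_X[0]), len(train_X), max_k, rng)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        elite = pop[: pop_size // 2]
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite), rng),
                           max_k, rate, rng)
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=score)
```

Calling `evolve(train_X, train_y, val_X, val_y)` on lists of numeric feature vectors returns the best `(weights, mask, k)` found. Because the three parts are evaluated together, the search can trade them off against each other, for example by keeping fewer reference cases when the weights already separate the classes well.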
