Using genetic algorithms to optimize nearest neighbors for data mining

Case-based reasoning (CBR) is widely used in data mining for managerial applications because it shows promise for improving the effectiveness of complex and unstructured decision making. Designing appropriate case indexing and retrieval mechanisms, including feature selection and feature weighting, remains difficult, however. Some prior studies have pointed out that finding the optimal k parameter for k-nearest neighbor (k-NN) retrieval is also one of the most important factors in designing an effective CBR system. Nonetheless, there have been few attempts to optimize the number of neighbors, especially using artificial intelligence (AI) techniques. This study proposes a genetic algorithm (GA) approach to optimize the number of neighbors to combine. We apply this model to two real-world cases: stock market prediction and online purchase prediction. Experimental results show that the GA-optimized k-NN approach may outperform traditional k-NN. The results also show that the proposed method performs as well as, and sometimes better than, other AI techniques.
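The idea of letting a GA search for the number of neighbors can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the chromosome encoding, genetic operators, or fitness function, so everything below (an integer chromosome holding k, truncation selection, averaging crossover, small-step mutation, validation accuracy as fitness, and the synthetic two-class data) is an assumption chosen for illustration.

```python
import random
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training cases closest to the query."""
    nearest = sorted(
        train,
        key=lambda case: sum((a - b) ** 2 for a, b in zip(case[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, valid, k):
    """Fraction of validation cases classified correctly with k neighbors."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)

def ga_optimize_k(train, valid, k_max=15, pop_size=6, generations=5,
                  mutation_rate=0.2, seed=0):
    """Search for k with a tiny GA (all operators here are assumptions)."""
    rng = random.Random(seed)
    cache = {}

    def fitness(k):
        # Fitness is validation accuracy; cache it because k values repeat.
        if k not in cache:
            cache[k] = accuracy(train, valid, k)
        return cache[k]

    # Each chromosome is simply one integer: the candidate k.
    population = [rng.randint(1, k_max) for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the fitter half survives unchanged (elitism).
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a + b) // 2                      # averaging crossover
            if rng.random() < mutation_rate:
                child += rng.choice([-2, -1, 1, 2])   # small-step mutation
            children.append(min(k_max, max(1, child)))
        population = parents + children
    best = max(population, key=fitness)
    return best, fitness(best)

def make_data(n, rng):
    """Synthetic two-class data: one noisy 2-D cluster per class."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 1.0 if label else -1.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data

rng = random.Random(42)
train, valid = make_data(80, rng), make_data(40, rng)
best_k, best_acc = ga_optimize_k(train, valid)
print(f"best k = {best_k}, validation accuracy = {best_acc:.3f}")
```

With a single integer parameter an exhaustive scan over k would be just as feasible; a GA becomes more attractive when k is searched jointly with other retrieval parameters such as feature weights, which this sketch leaves out.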
