When Similar Problems Don't Have Similar Solutions

The performance of a Case-Based Reasoning system relies on the integrity of its case base, but in real-life applications the data used to construct the case base invariably contains erroneous, noisy cases. Automated removal of these noisy cases can improve system accuracy. In addition, error rates for nearest neighbour classifiers can often be reduced by removing cases to give smoother decision boundaries between classes. In this paper we argue that the optimal level of boundary smoothing is domain dependent; our approach to error reduction therefore reacts to the characteristics of the domain to set an appropriate level of smoothing. We present a novel yet transparent algorithm, Threshold Error Reduction, which identifies and removes noisy and boundary cases with the aid of a local complexity measure. Evaluation results confirm it to be superior to benchmark algorithms.
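The idea of removing cases according to a local complexity measure can be illustrated with a minimal sketch. This is not the Threshold Error Reduction algorithm itself, whose details are not given here. It assumes, purely for illustration, that local complexity is the fraction of a case's k nearest neighbours that carry a different class label, and that a case is removed when this fraction exceeds a fixed threshold. The names `local_complexity`, `edit_case_base`, `threshold`, and `k` are hypothetical.

```python
import math


def local_complexity(cases, labels, i, k=3):
    """Assumed measure: fraction of the k nearest neighbours of case i
    whose label differs from the label of case i."""
    # Rank every other case by Euclidean distance to case i.
    ranked = sorted(
        (math.dist(cases[i], cases[j]), j)
        for j in range(len(cases))
        if j != i
    )
    neighbours = [j for _, j in ranked[:k]]
    disagreeing = sum(labels[j] != labels[i] for j in neighbours)
    return disagreeing / len(neighbours)


def edit_case_base(cases, labels, threshold=0.5, k=3):
    """Keep only cases whose local complexity is at or below the threshold.

    A lower threshold removes more cases and so smooths boundaries more
    aggressively; the value here is an illustrative default only.
    """
    # Scores are computed on the full case base before anything is removed.
    scores = [local_complexity(cases, labels, i, k) for i in range(len(cases))]
    keep = [i for i, score in enumerate(scores) if score <= threshold]
    return [cases[i] for i in keep], [labels[i] for i in keep]


# Toy case base: two well-separated classes plus one mislabelled case
# sitting in the middle of class "a".
cases = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),  # class "a"
    (0.5, 0.5),                                      # noisy case labelled "b"
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0),  # class "b"
]
labels = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]

kept_cases, kept_labels = edit_case_base(cases, labels, threshold=0.5, k=3)
```

In this toy example the mislabelled case is surrounded entirely by cases of the other class, so its score is 1.0 and it is removed, while the remaining cases score at most 1/3 and are kept. A domain-dependent approach of the kind argued for above would choose the threshold from the data rather than fix it in advance.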
