A Taxonomy of Similarity Mechanisms for Case-Based Reasoning

Assessing the similarity between cases is a key aspect of the retrieval phase in case-based reasoning (CBR). In most CBR work, similarity is assessed with metrics computed over feature-value descriptions of cases. Indeed, this feature-value representation might be considered a defining part of the CBR worldview: it underpins the idea of a problem space in which cases are located relative to one another. Recently, a variety of similarity mechanisms have emerged that are not founded on this feature-space idea. Some of these have emerged in CBR research and others have arisen in other areas of data analysis. Research on kernel-based learning is a particularly rich source of novel similarity representations because of its emphasis on encoding domain knowledge in the kernel function. In this paper, we present a taxonomy that organizes these new similarity mechanisms, together with more established ones, in a coherent framework.
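The contrast drawn above, between similarity computed in a feature space and similarity that needs no feature representation at all, can be illustrated with a minimal Python sketch. The first function treats cases as numeric feature vectors and derives similarity from a weighted Euclidean distance; its name, the weights, and the 1 / (1 + d) transform are hypothetical choices for illustration, not a metric taken from the taxonomy. The second uses the normalized compression distance (NCD), one well-known feature-free measure, with Python's `zlib` standing in for an ideal compressor; it is offered as an example of that category of mechanism, not as its definition.

```python
import math
import zlib
from typing import Sequence


def feature_similarity(a: Sequence[float], b: Sequence[float],
                       weights: Sequence[float]) -> float:
    """Feature-space similarity: each case is a point given by its feature values.

    Hypothetical illustration: a weighted Euclidean distance mapped into
    (0, 1] via 1 / (1 + d), so identical cases score 1.0.
    """
    d = math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))
    return 1.0 / (1.0 + d)


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (proxy for an ideal compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: requires no feature extraction.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)); values near 0
    mean y adds little information once x is known, values near 1 mean the
    two cases share little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Feature-based: two cases described by (hypothetical) numeric features.
    # The distance is sqrt(3^2 + 4^2) = 5, so the similarity is 1 / (1 + 5).
    print(feature_similarity([0.0, 0.0], [3.0, 4.0], [1.0, 1.0]))

    # Feature-free: raw text cases compared directly through compression.
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"the quick brown fox jumps over the lazy cat " * 20
    c = b"lorem ipsum dolor sit amet consectetur " * 20
    print(ncd(a, b), ncd(a, c))
```

Because NCD operates on raw bytes, it can compare, for example, e-mail messages without first deciding which features to extract. The trade-off is that it yields no explicit problem space in which cases have coordinates, which is exactly the departure from the feature-space view that the taxonomy sets out to organize.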
