Introspective Knowledge Revision in Textual Case-Based Reasoning

The performance of a Textual Case-Based Reasoning (TCBR) system depends critically on its underlying model of text similarity, which in turn depends on the similarity between terms and phrases in the domain. In the absence of human intervention, term similarities are often modelled using co-occurrence statistics, which are fragile unless the corpus is truly representative of the domain. We present the case for introspective revision in TCBR, whereby the system incrementally revises its term similarity knowledge by exploiting conflicts between its representation and an alternate source of knowledge, such as category knowledge in classification tasks, or linguistic and background knowledge. The advantage of such revision is that it requires no human intervention. Our experiments with classification knowledge show that revision can lead to substantial gains in classification accuracy, with results competitive with best-in-class text classifiers. We also present experimental results over synthetic data to suggest that the idea can be extended to improve case-base alignment in TCBR domains with textual problem and solution descriptions.
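
To make the idea of revising term similarities from conflicts concrete, the following is a minimal sketch, not the procedure used in the paper. It assumes cosine similarity over binary term-document incidence as the co-occurrence model, a leave-one-out nearest-neighbour check against class labels as the conflict signal, and a multiplicative damping rule. The function names, the `rate` and `rounds` parameters, and the toy corpus are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def cooccurrence_similarity(docs):
    """Cosine similarity between binary term-document incidence vectors."""
    postings = {}
    for i, doc in enumerate(docs):
        for term in set(doc):
            postings.setdefault(term, set()).add(i)
    sim = {(t, t): 1.0 for t in postings}
    for a, b in combinations(sorted(postings), 2):
        shared = len(postings[a] & postings[b])
        value = shared / sqrt(len(postings[a]) * len(postings[b]))
        sim[(a, b)] = sim[(b, a)] = value
    return sim


def doc_similarity(d1, d2, sim):
    """Symmetrised average of each term's best match in the other document."""
    def one_way(x, y):
        return sum(max(sim.get((t, u), 0.0) for u in y) for t in x) / len(x)
    return 0.5 * (one_way(d1, d2) + one_way(d2, d1))


def nearest_neighbour(i, docs, sim):
    """Index of the document most similar to docs[i], excluding itself."""
    others = [j for j in range(len(docs)) if j != i]
    return max(others, key=lambda j: doc_similarity(docs[i], docs[j], sim))


def revise(docs, labels, sim, rate=0.5, rounds=5):
    """Damp cross-term similarities wherever the nearest neighbour of a
    document carries a different label (an assumed, illustrative rule)."""
    sim = dict(sim)  # leave the caller's table untouched
    for _ in range(rounds):
        conflicts = 0
        for i in range(len(docs)):
            j = nearest_neighbour(i, docs, sim)
            if labels[i] == labels[j]:
                continue
            conflicts += 1
            # Unordered pairs of distinct terms, so each is damped once.
            pairs = {tuple(sorted((t, u)))
                     for t in docs[i] for u in docs[j] if t != u}
            for t, u in pairs:
                sim[(t, u)] *= (1 - rate)
                sim[(u, t)] = sim[(t, u)]
        if conflicts == 0:
            break
    return sim


# Hypothetical toy corpus: "jaguar" is ambiguous between the two classes.
docs = [
    ["jaguar", "speed"],
    ["jaguar", "speed", "jungle"],
    ["engine", "speed"],
    ["jungle", "prey"],
]
labels = ["car", "animal", "car", "animal"]

base = cooccurrence_similarity(docs)
revised = revise(docs, labels, base)
```

In this toy corpus the nearest neighbour of the first document belongs to the other class, so the similarities between the distinct terms of the two documents are damped, while the similarity of a term to itself is left at 1. The sketch only illustrates the mechanism of conflict-driven revision and makes no claim to reproduce the accuracy gains reported above.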
