Estimating Case Base Complexity Using Fractal Dimension

This paper presents a novel measure of the complexity of a case base. The measure is based on fractal dimension, a generalization of the ordinary notion of dimension that admits non-integer values. For a classification problem, fractal dimension is used to estimate the ruggedness of the space spanned by the instances that lie along the decision boundary. Experiments over collections of varying complexity show that the measure is strongly negatively correlated with classification accuracy across several classifiers. We also present empirical findings from experiments over non-textual datasets.
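To make the notion of fractal dimension concrete, the following is a minimal sketch of the box-counting estimator, one standard way to estimate the fractal dimension of a point set. It is an illustration only: the abstract does not state which estimator the paper uses, and the sketch does not select instances along a decision boundary. The function name `box_counting_dimension`, the `scales` parameter, and the two toy point sets are assumptions introduced here.

```python
import math


def box_counting_dimension(points, scales=(2, 4, 8, 16, 32)):
    """Estimate the box-counting dimension of points in the unit hypercube.

    Hypothetical helper, not taken from the paper. For each scale k the
    space is cut into boxes of side 1/k and the occupied boxes N(k) are
    counted. The estimate is the least-squares slope of log N(k)
    against log k.
    """
    log_k, log_n = [], []
    for k in scales:
        # Map each point to the index of the box that contains it.
        occupied = {tuple(min(int(c * k), k - 1) for c in p) for p in points}
        log_k.append(math.log(k))
        log_n.append(math.log(len(occupied)))

    # Ordinary least-squares slope of log N(k) versus log k.
    mean_x = sum(log_k) / len(log_k)
    mean_y = sum(log_n) / len(log_n)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_k, log_n))
    den = sum((x - mean_x) ** 2 for x in log_k)
    return num / den


# Toy data (assumed for illustration): a diagonal line and a filled square.
line = [(i / 4096, i / 4096) for i in range(4096)]
grid = [((i + 0.5) / 64, (j + 0.5) / 64) for i in range(64) for j in range(64)]

line_dim = box_counting_dimension(line)
grid_dim = box_counting_dimension(grid)
```

On these toy sets the estimate is close to 1 for the line and close to 2 for the filled square. A rugged set would typically fall between the two, which is the sense in which a fractal dimension can serve as a ruggedness measure.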
