A feature-relevance heuristic for indexing and compressing large case bases

This paper reports results with IGTree, a formalism for indexing and compressing large case bases in Instance-Based Learning (IBL) and other lazy learning techniques. The concept of information gain (entropy minimisation) is used as a heuristic feature-relevance function for compressing the case base into a tree. IGTree considerably reduces storage requirements and the time required to compute classifications for problems where current IBL approaches fail for complexity reasons. Moreover, for the tasks studied, generalisation accuracy is often similar to that obtained with information-gain-weighted variants of lazy learning and with alternative approaches such as C4.5. Although IGTree was designed for a specific class of problems (linguistic disambiguation problems with symbolic, nominal features, huge case bases, and a complex interaction between sub-regularities and exceptions), we show in this paper that the approach has a wider applicability when generalising it to TRIBL, a hybrid combination of IGTree and IBL.
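
To make the compression idea concrete, the sketch below is a minimal illustration of the abstract's core mechanism, not the authors' implementation: nominal features are ordered once by information gain, the case base is compressed into a tree whose nodes store default (majority) classes, and classification reduces to following matching arcs. Function names such as build_igtree_like and classify are illustrative assumptions.

```python
# Minimal sketch of an IGTree-like compression scheme (illustrative, not the
# authors' code): features are expanded in a fixed order of decreasing
# information gain, and each node stores a default class for fall-back.
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(cases, labels, feature):
    """Entropy reduction obtained by splitting the cases on one nominal feature."""
    split = defaultdict(list)
    for case, label in zip(cases, labels):
        split[case[feature]].append(label)
    remainder = sum(len(part) / len(labels) * entropy(part) for part in split.values())
    return entropy(labels) - remainder


def build_igtree_like(cases, labels, features=None):
    """Compress cases into a tree, expanding the most informative feature
    first (a single global ordering); each node keeps a default class."""
    if features is None:
        features = sorted(range(len(cases[0])),
                          key=lambda f: information_gain(cases, labels, f),
                          reverse=True)
    default = Counter(labels).most_common(1)[0][0]
    node = {"default": default, "arcs": {}}
    if not features or all(label == default for label in labels):
        return node  # homogeneous or exhausted: the default class suffices
    feat, rest = features[0], features[1:]
    groups = defaultdict(lambda: ([], []))
    for case, label in zip(cases, labels):
        groups[case[feat]][0].append(case)
        groups[case[feat]][1].append(label)
    for value, (sub_cases, sub_labels) in groups.items():
        node["arcs"][value] = build_igtree_like(sub_cases, sub_labels, rest)
    node["feature"] = feat
    return node


def classify(tree, case):
    """Follow matching arcs; return the last default class when no arc
    matches, replacing exhaustive search through the full case base."""
    while "feature" in tree and case[tree["feature"]] in tree["arcs"]:
        tree = tree["arcs"][case[tree["feature"]]]
    return tree["default"]
```

Because the feature ordering is fixed globally rather than recomputed per node, lookup time depends on the number of features rather than on the number of stored cases, which is the source of the storage and speed gains claimed above.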