Choosing among competing generalizations

In a learning situation with noisy data, a generalization procedure may find a sizeable collection of partial generalizations, each covering some of the positive examples and, in general, a few negative ones. The task is to choose a moderately sized sub-collection of partial generalizations that are sufficiently different from one another. The idea is that, when the "best" partial generalizations (according to a given criterion) are all quite similar, the user is better served by a diverse selection, even if its members are individually less satisfactory. This vague goal is made precise, and a procedure is developed and examined that employs two notions: a measure of the quality of a single generalization (called evidence) and an asymmetric measure of the similarity of two generalizations (called affinity). The two cooperate in suppressing generalizations that are worse than, but not too different from, another generalization. The selection procedure works satisfactorily in different environments.
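The suppression idea described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual procedure: the names `evidence`, `affinity`, and `threshold` are assumptions, and the concrete scoring functions in the usage example are invented for illustration. The only property taken from the text is the suppression rule itself: a generalization is dropped when another one has higher evidence and the affinity from the first to the second exceeds a threshold.

```python
def select(generalizations, evidence, affinity, threshold):
    """Keep the generalizations that are not suppressed.

    Hypothetical sketch: g is suppressed if some other h has strictly
    higher evidence and affinity(g, h) exceeds the threshold, i.e. g is
    worse than, but not too different from, h. Note affinity is
    asymmetric: affinity(g, h) need not equal affinity(h, g).
    """
    kept = []
    for g in generalizations:
        suppressed = any(
            evidence(h) > evidence(g) and affinity(g, h) > threshold
            for h in generalizations
            if h is not g
        )
        if not suppressed:
            kept.append(g)
    return kept


# Toy usage: generalizations modeled as sets of covered examples.
# evidence = coverage size; affinity(g, h) = fraction of g's coverage
# shared with h (asymmetric by construction).
g1 = frozenset({1, 2, 3, 4})
g2 = frozenset({1, 2, 3})      # worse than g1 and very similar to it
g3 = frozenset({7, 8, 9})      # worse than g1 but quite different

chosen = select(
    [g1, g2, g3],
    evidence=len,
    affinity=lambda g, h: len(g & h) / len(g),
    threshold=0.5,
)
# g2 is suppressed by g1; g1 and g3 survive.
print(chosen)  # [frozenset({1, 2, 3, 4}), frozenset({7, 8, 9})]
```

Here the diverse but individually weaker g3 is retained, while the strong-but-redundant g2 is suppressed, which is exactly the trade-off the selection procedure is meant to realize.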