Discovering interesting statements from a database

Knowledge discovery aims at extracting new knowledge from potentially large databases, for instance in the form of interesting statements about the data. Two interrelated classes of problem are treated here: making the subjective notion of ‘interesting’ concrete, and dealing with large numbers of statements that are related to one another (one rendering another redundant, or at least less interesting). Four increasingly subjective facets of ‘interestingness’ are identified: the subject field under consideration, the conspicuousness of a finding, its novelty, and its deviation from prior knowledge. A procedure is proposed, and tried out on two quite different data sets, that allows interestingness to be specified by various means and that ranks the results by taking into account both interestingness (relevance, evidence) and mutual relatedness (similarity, affinity). These two criteria are the manifestations of the second and third facets of interestingness in the given data environment.
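The abstract does not spell out how interestingness and mutual relatedness are combined into one ranking. As an illustration only, the sketch below swaps in a well-known technique, greedy re-ranking in the style of Maximal Marginal Relevance. At each step it picks the statement whose interestingness score, penalised by its similarity to statements already ranked, is highest, so near-duplicates sink down the list. This is not the authors' procedure. The names `rank_statements`, `interest`, `similarity` and `trade_off` are hypothetical, and the scoring functions are assumed to return values in [0, 1].

```python
from typing import Callable, Sequence


def rank_statements(
    statements: Sequence[str],
    interest: Callable[[str], float],          # hypothetical interestingness score
    similarity: Callable[[str, str], float],   # hypothetical pairwise relatedness
    trade_off: float = 0.7,                    # weight of interest vs. redundancy
) -> list[str]:
    """Greedy MMR-style ranking (an assumed stand-in, not the paper's method).

    Each step selects the remaining statement that maximises
    trade_off * interest(s) - (1 - trade_off) * max similarity to ranked ones.
    """
    remaining = list(statements)
    ranked: list[str] = []
    while remaining:
        def score(s: str) -> float:
            # Redundancy is the strongest similarity to anything already ranked.
            redundancy = max((similarity(s, r) for r in ranked), default=0.0)
            return trade_off * interest(s) - (1 - trade_off) * redundancy

        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


if __name__ == "__main__":
    scores = {"a": 0.9, "a2": 0.85, "b": 0.6}
    # "a" and "a2" are treated as near-duplicates of each other.
    sim = lambda x, y: 1.0 if {x, y} == {"a", "a2"} else 0.0
    print(rank_statements(["a", "a2", "b"], lambda s: scores[s], sim))
    # prints ['a', 'b', 'a2']
```

With `trade_off=1.0` the ranking reduces to sorting by interestingness alone. Lowering it pushes statements that are made redundant by a higher-ranked one further down, which is the effect the abstract describes.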