Mining Generalised Emerging Patterns

Emerging Patterns (EPs) are a data mining model that is useful as a means of discovering distinctions inherently present amongst a collection of datasets. However, current EP mining algorithms do not handle attributes whose values are asscociated with taxonomies (is-a hierarchies). Current EP mining techniques are restricted to using only the leaf-level attribute-values in a taxonomy. In this paper, we formally introduce the problem of mining generalised emerging patterns. Given a large data set, where some attributes are hierarchical, we find emerging patterns that consist of items at any level of the taxonomies. Generalised EPs are more concise and interpretable when used to describe some distinctive characteristics of a class of data. They are also considered to be more expressive because they include items at higher levels of the hierarchies, which have larger supports than items at the leaf level. We formulate the problem of mining generalised EPs, and present an algorithm for this task. We demonstrate that the discovered generalised patterns, which contain items at higher levels in the hierarchies, have greater support than traditional leaf-level EPs according to our experimental results based on ten benchmark datasets.