Similarity Measure by Aggregating Shared Emerging Patterns

Shared emerging patterns (SEPs) are a special form of emerging patterns (EPs). In data mining, EPs represent knowledge about the strong characteristics of a single dataset and are important for building classifiers. SEPs, by contrast, represent knowledge about the strong characteristics shared by two or more datasets, and they have great potential for applications in analogy and transfer learning. When training data are scarce, it is cheaper to find existing similar data than to label new data, so a similarity measure between datasets is of great significance. In this paper, we propose a novel application of SEPs: measuring the similarity of two datasets, where the quality and the quantity of the SEPs are the two parameters that determine their contribution to the similarity. For a field that lacks samples, the similarity measure can then be used to obtain known similar samples.