A Pattern Tree Based Method for Mining Conditional Contrast Patterns of Multi-source Data

Contrast patterns are itemsets that occur frequently in one dataset but not in another. These patterns have been applied successfully in many data mining domains, such as prediction, classification and clustering. However, no previous study has considered extracting contrast patterns from different types of datasets. In this paper, we introduce a new type of contrast pattern, Conditional Contrast Patterns (CCPs): a subset of the traditional Contrast Patterns (CPs) found in one kind of dataset, selected by a condition on a property of those patterns in another kind of dataset. We propose a tree-search algorithm for mining CCPs that compresses the datasets into a tree representation. We evaluate the proposed method against two other methods (a brute-force method and an Apriori-based method) on a synthetic dataset and on a real-life retail dataset. The results show that CCPs are more informative and actionable for decision makers than ordinary CPs, and that our tree-based algorithm is the most efficient of the three methods.
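To make the definitions concrete, the following is a minimal Python sketch of the brute-force baseline idea only, not the tree-based algorithm proposed in the paper. It enumerates itemsets whose support is at least `min_sup` in one dataset and at most `max_sup` in another, then keeps those that satisfy a condition derived from a second data source. The threshold names, the `max_len` cap, and the `condition` predicate are all assumptions made for illustration; the abstract does not specify how the condition is defined.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def contrast_patterns(target, background, min_sup, max_sup, max_len=3):
    """Brute-force search for itemsets frequent in `target` and rare in `background`.

    `min_sup`, `max_sup` and `max_len` are illustrative parameters (assumptions).
    """
    items = sorted(set().union(*target))
    found = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            if (support(candidate, target) >= min_sup
                    and support(candidate, background) <= max_sup):
                found.append(candidate)
    return found


def conditional_contrast_patterns(cps, condition):
    """Keep only the contrast patterns that satisfy `condition`.

    `condition` is a hypothetical predicate standing in for a property
    measured on a second kind of dataset.
    """
    return [p for p in cps if condition(p)]


# Toy transactions (made up for illustration).
target = [frozenset(t) for t in (
    {"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk", "eggs"}, {"bread"})]
background = [frozenset(t) for t in (
    {"eggs"}, {"milk"}, {"eggs", "bread"}, {"eggs"})]

cps = contrast_patterns(target, background, min_sup=0.5, max_sup=0.25)

# Hypothetical condition: every item in the pattern appears in a set of
# items flagged by the second data source.
flagged = {"milk", "eggs"}
ccps = conditional_contrast_patterns(cps, lambda p: p <= flagged)
```

The enumeration is exponential in the number of items, which is the kind of cost a compressed tree representation is intended to avoid.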
