A general framework to encode heterogeneous information sources for contextual pattern mining

Traditional pattern mining methods usually work on single data sources. However, in practice, there are often multiple and heterogeneous information sources. They collectively provide contextual information not available in any single source alone describing the same set of objects, and are useful for discovering hidden contextual patterns. One important challenge is to provide a general methodology to mine contextual patterns easily and efficiently. In this paper, we propose a general framework to encode contextual information from multiple sources into a coherent representation---Contextual Information Graph (CIG). The complexity of the encoding scheme is linear in both time and space. More importantly, CIG can be handled by any single-source pattern mining algorithms that accept taxonomies without any modification. We demonstrate by three applications of the contextual association rule, sequence and graph mining, that contextual patterns providing rich and insightful knowledge can be easily discovered by the proposed framework. It enables Contextual Pattern Mining (CPM) by reusing single-source methods, and is easy to deploy and use in real-world systems.

[1]  Alessio Malizia,et al.  Automated Personal Assistants , 2011, Computer.

[2]  Christos Faloutsos,et al.  Graphs over time: densification laws, shrinking diameters and possible explanations , 2005, KDD '05.

[3]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[4]  David Page,et al.  Biological applications of multi-relational data mining , 2003, SKDD.

[5]  Gultekin Özsoyoglu,et al.  Taxonomy-superimposed graph mining , 2008, EDBT '08.

[6]  Ramakrishnan Srikant,et al.  Mining generalized association rules , 1995, Future Gener. Comput. Syst..

[7]  Akihiro Inokuchi Mining generalized substructures from a set of labeled graphs , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[8]  Ramakrishnan Srikant,et al.  Mining Sequential Patterns: Generalizations and Performance Improvements , 1996, EDBT.

[9]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[10]  Jiawei Han,et al.  Frequent Closed Sequence Mining without Candidate Maintenance , 2007, IEEE Transactions on Knowledge and Data Engineering.

[11]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[12]  Stephen Muggleton,et al.  Inductive Logic Programming , 2011, Lecture Notes in Computer Science.

[13]  Hannu Toivonen,et al.  Discovery of frequent DATALOG patterns , 1999, Data Mining and Knowledge Discovery.

[14]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[15]  Saso Dzeroski,et al.  Multi-relational data mining: an introduction , 2003, SKDD.

[16]  Tijl De Bie,et al.  Interesting Multi-relational Patterns , 2011, 2011 IEEE 11th International Conference on Data Mining.

[17]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[18]  Jian Pei,et al.  Data Mining: Concepts and Techniques, 3rd edition , 2006 .

[19]  Jiawei Han,et al.  Discovery of Spatial Association Rules in Geographic Information Databases , 1995, SSD.

[20]  Philip S. Yu,et al.  Dual Labeling: Answering Graph Reachability Queries in Constant Time , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[21]  Jiawei Han,et al.  Discovery of Multiple-Level Association Rules from Large Databases , 1995, VLDB.

[22]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..