Effectively and Efficiently Mining Frequent Patterns from Dense Graph Streams on Disk

Abstract

In this paper, we focus on dense graph streams, which can be generated in various applications ranging from sensor networks to social networks, and from bioinformatics to chemical informatics. We investigate the problem of effectively and efficiently mining frequent patterns from such streaming data in limited-memory environments, where disk support is required. This setting arises frequently (e.g., in mobile applications and systems) and is gaining momentum even in advanced computational settings, of which social networks are the main representative. Motivated by this problem, we propose (i) a specialized data structure called DSMatrix, which captures the relevant data from dense graph streams directly onto disk, and (ii) stream mining algorithms that use this structure to mine frequent patterns effectively and efficiently. Experimental results confirm the benefits of our approach.
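To make the idea of a matrix-based summary of a graph stream more concrete, the following is a minimal sketch under stated assumptions, not the DSMatrix implementation itself. It assumes that each graph in the stream is represented as a set of edge labels, that the structure keeps one bit row per edge and one column per graph in a sliding window of `window_size` batches, and that a pattern is simply a set of co-occurring edges (connectivity is not checked). The class name `BitMatrixWindow` and its methods are hypothetical. Unlike the disk-resident structure described above, this sketch keeps every row in memory as a Python integer bitmask, and it mines patterns with a swapped-in Eclat-style vertical approach, where the support of a pattern is the number of set bits in the bitwise AND of its edges' rows.

```python
class BitMatrixWindow:
    """Hypothetical in-memory sketch of a DSMatrix-like bit matrix.

    Assumptions (not taken from the paper): rows are edges, columns are
    graphs in a sliding window of `window_size` batches, and each row is
    a Python int whose bit j is set when the edge occurs in the j-th
    graph of the window (oldest graphs occupy the lowest bits).
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self.batch_sizes = []  # number of graphs per batch, oldest first
        self.rows = {}         # edge label -> int bitmask over window columns

    def add_batch(self, graphs):
        """Append a batch of graphs (each a set of edge labels)."""
        if len(self.batch_sizes) == self.window_size:
            # Window is full: drop the oldest batch, whose columns are the
            # lowest bits, by shifting every row to the right.
            oldest = self.batch_sizes.pop(0)
            for edge in list(self.rows):
                self.rows[edge] >>= oldest
                if self.rows[edge] == 0:
                    del self.rows[edge]  # edge no longer occurs in the window
        offset = sum(self.batch_sizes)  # first free column index
        for j, graph in enumerate(graphs):
            for edge in graph:
                self.rows[edge] = self.rows.get(edge, 0) | (1 << (offset + j))
        self.batch_sizes.append(len(graphs))

    def frequent_patterns(self, minsup):
        """Eclat-style vertical mining over the current window.

        Returns a dict mapping each frequent edge set (as a sorted tuple)
        to its support, i.e., the number of window graphs containing it.
        """
        items = sorted(e for e, bits in self.rows.items()
                       if bin(bits).count("1") >= minsup)
        result = {}

        def extend(prefix, prefix_bits, candidates):
            for i, edge in enumerate(candidates):
                bits = prefix_bits & self.rows[edge]  # graphs containing prefix + edge
                support = bin(bits).count("1")
                if support >= minsup:
                    pattern = prefix + (edge,)
                    result[pattern] = support
                    # Only later candidates are tried, so each set is visited once.
                    extend(pattern, bits, candidates[i + 1:])

        all_columns = (1 << sum(self.batch_sizes)) - 1
        extend((), all_columns, items)
        return result


if __name__ == "__main__":
    m = BitMatrixWindow(window_size=2)
    m.add_batch([{"a", "b", "c"}, {"a", "b"}, {"a", "c"}])
    m.add_batch([{"a", "b", "d"}, {"b", "c"}])
    print(m.frequent_patterns(minsup=3))
    # {('a',): 4, ('a', 'b'): 3, ('b',): 4, ('c',): 3}
```

In this sketch, sliding the window is a right shift of each row, so expiring a whole batch costs one pass over the rows rather than a rebuild; a disk-based variant would instead have to manage where each batch's columns live on secondary storage.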
