DATA CHARACTERIZATION TOWARDS MODELING FREQUENT PATTERN MINING ALGORITHMS

Big data quickly comes under the spotlight in recent years. As big data is supposed to handle extremely huge amount of data, it is quite natural that the demand for the computational environment to accelerates, and scales out big data applications increases. The important thing is, however, the behavior of big data applications is not clearly defined yet. Among big data applications, this paper specifically focuses on stream mining applications. The behavior of stream mining applications varies according to the characteristics of the input data. The parameters for data characterization are, however, not clearly defined yet, and there is no study investigating explicit relationships between the input data, and stream mining applications, either. Therefore, this paper picks up frequent pattern mining as one of the representative stream mining applications, and interprets the relationships between the characteristics of the input data, and behaviors of signature algorithms for frequent pattern mining.

[1]  Jennifer Widom,et al.  Models and issues in data stream systems , 2002, PODS.

[2]  Sudipto Guha,et al.  Streaming-data algorithms for high-quality clustering , 2002, Proceedings 18th International Conference on Data Engineering.

[3]  Jean-Marc Vincent,et al.  Random graph generation for scheduling simulations , 2010, SimuTools.

[4]  Karl-Michael Schneider,et al.  A Comparison of Event Models for Naive Bayes Anti-Spam E-Mail Filtering , 2003, EACL.

[5]  Graham Cormode,et al.  What's hot and what's not: tracking most frequent items dynamically , 2003, TODS.

[6]  Johannes Gehrke,et al.  Mining data streams under block evolution , 2002, SKDD.

[7]  Chris Clifton,et al.  Dependable real-time data mining , 2005, Eighth IEEE International Symposium on Object-Oriented Real-Time Distributed Computing (ISORC'05).

[8]  Dinesh Manocha,et al.  Fast and approximate stream mining of quantiles and frequencies using graphics processors , 2005, SIGMOD '05.

[9]  Rong Yan,et al.  Configuring topologies of distributed semantic concept classifiers for continuous multimedia stream processing , 2008, ACM Multimedia.

[10]  Philip S. Yu,et al.  A Framework for Clustering Evolving Data Streams , 2003, VLDB.

[11]  Douglas Thain,et al.  The quest for scalable support of data-intensive workloads in distributed systems , 2009, HPDC '09.

[12]  Rina Panigrahy,et al.  Better streaming algorithms for clustering problems , 2003, STOC '03.

[13]  Geoff Hulten,et al.  Mining high-speed data streams , 2000, KDD '00.

[14]  Michael Stonebraker,et al.  Load Shedding in a Data Stream Manager , 2003, VLDB.

[15]  Deepak S. Turaga,et al.  Resource Management for Networked Classifiers in Distributed Stream Mining Systems , 2006, Sixth International Conference on Data Mining (ICDM'06).

[16]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[17]  Qiang Ding,et al.  Decision tree classification of spatial data streams using Peano Count Trees , 2002, SAC '02.

[18]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[19]  Philip S. Yu,et al.  A Framework for Projected Clustering of High Dimensional Data Streams , 2004, VLDB.

[20]  Mark Last,et al.  Online classification of nonstationary data streams , 2002, Intell. Data Anal..

[21]  Keke Chen,et al.  HE-Tree: a framework for detecting changes in clustering structure for categorical data streams , 2009, The VLDB Journal.

[22]  Geoff Hulten,et al.  Mining time-changing data streams , 2001, KDD '01.

[23]  Sudipto Guha,et al.  Clustering Data Streams: Theory and Practice , 2003, IEEE Trans. Knowl. Data Eng..

[24]  Carlos Ordonez,et al.  Clustering binary data streams with K-means , 2003, DMKD '03.

[25]  S. Muthukrishnan,et al.  Surfing Wavelets on Streams: One-Pass Summaries for Approximate Aggregate Queries , 2001, VLDB.

[26]  Piotr Indyk,et al.  Identifying Representative Trends in Massive Time Series Data Sets Using Sketches , 2000, VLDB.

[27]  Eamonn J. Keogh,et al.  Clustering of time-series subsequences is meaningless: implications for previous and future research , 2004, Knowledge and Information Systems.

[28]  Sayaka Akioka Task Graphs of Stream Mining Algorithms , 2013, BD3@VLDB.

[29]  Shonali Krishnaswamy,et al.  Mining data streams: a review , 2005, SGMD.

[30]  Philip S. Yu,et al.  Mining concept-drifting data streams using ensemble classifiers , 2003, KDD '03.

[31]  Philip S. Yu,et al.  Online Mining of Changes from Data Streams: Research Problems and Preliminary Results , 2003 .

[32]  Rajeev Motwani,et al.  Maintaining variance and k-medians over data stream windows , 2003, PODS.

[33]  Hironori Kasahara,et al.  A standard task graph set for fair evaluation of multiprocessor scheduling algorithms , 2002 .

[34]  Dennis Shasha,et al.  StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time , 2002, VLDB.

[35]  Philip S. Yu,et al.  On demand classification of data streams , 2004, KDD.

[36]  Das Amrita,et al.  Mining Association Rules between Sets of Items in Large Databases , 2013 .

[37]  Eamonn J. Keogh,et al.  A symbolic representation of time series, with implications for streaming algorithms , 2003, DMKD '03.

[38]  Wayne H. Wolf,et al.  TGFF: task graphs for free , 1998, Proceedings of the Sixth International Workshop on Hardware/Software Codesign. (CODES/CASHE'98).

[39]  Rajeev Motwani,et al.  Approximate Frequency Counts over Data Streams , 2012, VLDB.

[40]  Yoichi Muraoka,et al.  Data Access Pattern Analysis on Stream Mining Algorithms for Cloud Computation , 2010, PDPTA.

[41]  Christos Faloutsos,et al.  Adaptive, Hands-Off Stream Mining , 2003, VLDB.