Multiple MapReduce and derivative projected database: New approach for supporting PrefixSpan scalability

Supporting the scalability of PrefixSpan raises two problems when the algorithm is implemented in the MapReduce framework. The first concerns parsing and analyzing big data; the second concerns managing projected databases. In this paper, we propose two methods, Multiple MapReduce and Derivative Projected Database, to address the first and second problems respectively. Our experiments show that the proposed methods significantly reduce execution time, supporting the scalability of PrefixSpan.
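For readers unfamiliar with PrefixSpan's projected databases, the following is a minimal single-machine sketch of the standard pattern-growth idea. It assumes each sequence element is a single item (no itemsets) and that support is counted as the number of sequences containing a pattern. It is not the authors' Multiple MapReduce or Derivative Projected Database method, and the names `project` and `prefixspan` are hypothetical. Every frequent prefix triggers the construction of a new projected database, which suggests why managing these databases becomes expensive as the data grows.

```python
from collections import Counter


def project(db, item):
    """Build the projected database of `db` for a one-item extension:
    for each sequence containing `item`, keep the suffix after its first occurrence."""
    projected = []
    for seq in db:
        if item in seq:
            idx = seq.index(item)
            projected.append(seq[idx + 1:])
    return projected


def prefixspan(db, min_support, prefix=None):
    """Recursively mine frequent sequential patterns (single-item elements only).

    Returns a list of (pattern, support) pairs."""
    if prefix is None:
        prefix = []
    patterns = []
    # Count each item at most once per sequence, so support = number of sequences.
    counts = Counter(item for seq in db for item in set(seq))
    for item, support in sorted(counts.items()):
        if support >= min_support:
            new_prefix = prefix + [item]
            patterns.append((new_prefix, support))
            # Grow the pattern by mining the projected (suffix) database.
            patterns.extend(prefixspan(project(db, item), min_support, new_prefix))
    return patterns


db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
for pattern, support in prefixspan(db, min_support=2):
    print(pattern, support)
```

With `min_support=2`, this toy database yields the patterns `a`, `a b`, `a c`, `b`, `b c`, and `c`. In a MapReduce setting, the support counting and the projection step are the natural candidates for distribution across mappers and reducers.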
