A Framework for Data Mining and Knowledge Discovery in Cloud Computing

The volume of data generated by today's information technology has grown from terabytes to petabytes. Extracting knowledge from data at this scale is challenging, which creates strong demand for cloud computing and its scalable storage and processing services. Motivated by this, the chapter introduces a novel framework, data mining in cloud computing (DMCC), that lets users apply classification, clustering, and association rule mining methods to very large datasets efficiently by combining data mining, cloud computing, and parallel computing technologies. The chapter discusses the main architectural components, interfaces, features, and advantages of the proposed DMCC framework. It also compares the running times of data mining algorithms executed serially and in parallel in a cloud environment through the DMCC framework. Experimental results show that parallel execution through DMCC greatly decreases the execution times of data mining algorithms compared with serial execution.
