Making knowledge discovery services scalable on clouds for big data mining

The amount of digital data is increasing beyond any previous estimate, and data stores and sources are increasingly pervasive and distributed. Professionals and scientists need advanced data analysis tools and services coupled with scalable architectures to support the extraction of useful information from big data repositories. Cloud computing systems offer effective support for addressing both the computational and data storage needs of big data mining and parallel knowledge discovery applications. In fact, complex data mining tasks involve data- and compute-intensive algorithms that require large and efficient storage facilities together with high-performance processors to produce results in acceptable times. In this paper we introduce the topic and the main research issues. We discuss how to make knowledge discovery services scalable and present the Data Mining Cloud Framework, designed for developing and executing distributed data analytics applications as workflows of services. In this environment, data sets, analysis tools, data mining algorithms and knowledge models are implemented as single services that can be combined through a visual programming interface into distributed workflows to be executed on Clouds. The main features of the programming interface are described, and a performance evaluation of knowledge discovery applications is reported.
