Cloud Technologies: A New Level for Big Data Mining

Nowadays, the amount of data being collected and stored is constantly increasing. Data come from many sources, such as devices, sensors, networks, transactional applications, the web, and social media. Conventional technologies and methods cannot store and analyze such amounts of data. In this paper, a comparative analysis of existing data mining systems is performed; it shows that most existing data mining solutions are not suited to solving Big Data problems. To bring conventional data mining to a new level and to cope with the challenges of massive, complex data of different kinds, requirements for data mining systems suitable for Big Data are derived.
