Big Data Knowledge Discovery Platforms: A 360 Degree Perspective

Big Data is a buzzword affecting nearly every domain, and it provides a new set of opportunities for the development of the knowledge discovery process. It comes, however, with challenges such as abundance, extensiveness and diversity, timeliness and dynamism, messiness and vagueness, and uncertainty, since not all of the data generated relates to a specific question and may instead be associated with another process or activity. These challenges cannot be handled by traditional infrastructure, platforms, and frameworks. New analytical techniques and high-performance computing architectures have emerged to handle this explosion. These platforms and architectures give a cutting edge to the Big Data knowledge discovery process by using Artificial Intelligence, Machine Learning, and Expert Systems. This study presents a comprehensive review of Big Data analytical platforms and frameworks, together with a comparative analysis. A Knowledge Discovery architecture for Big Data Analytics is also proposed, considering the fundamental aspect of gaining insights from Big Data sets. The analysis also aims to present the open challenges associated with these techniques and future research directions.
