Machine Learning and Big Data Processing: A Technological Perspective and Review

This paper discusses the role of Machine Learning (ML) algorithms and methods in Big Data Processing and Analytics (BDA). ML and BDA are both evolving fields of computing, and developments in each complement the other. The ever-changing data landscape of the modern digital world has given rise to new data processing frameworks for obtaining meaningful and unprecedented insights. This paper presents a detailed review of the latest developments in ML algorithms for Big Data Processing. A later section also discusses the key challenges associated with applying ML-based approaches. ML-based Big Data Processing has gained popularity, and new methods and approaches continue to emerge for processing data efficiently and discovering interesting patterns for decision making. As a result, ML-based approaches are increasingly being used for Big Data Processing. With the surge of data from new sources, the heterogeneous nature of that data, and the prevalence of uncertain and unstructured data (the so-called Big Data, with all its characteristics, the 5 Vs), there is an ever-increasing need for approaches that aid in modelling and processing these data and that automate data processing. These new processing requirements have given a big boost to the development of new ML-based methods for managing and processing such data. The paper will be useful to scholars researching this interesting and challenging domain of ML and Big Data Processing.
