Machine Learning and Statistical Approaches for Big Data: Issues, Challenges and Research Directions

Today, massive volumes of complex, heterogeneously structured data are becoming available from a wide variety of sources, and organizations are attempting to exploit these plentiful resources to foster innovation and to improve both decision-making and operational efficiency. Machine learning is a branch of artificial intelligence that discovers knowledge from data in order to support intelligent decisions. Big Data has a vast impact on scientific discovery and value creation. This paper presents an extensive literature review of the latest advances, developments and methodologies in research on machine learning for big data processing. We discuss the various data types, learning methods, key issues in big data processing, and applications of machine learning approaches to big data. Finally, we outline some open problems in this domain, together with our future research aims and directions.
