Big Data Analytics = Machine Learning + Cloud Computing

“Big Data” can mean different things to different people. The scale and challenges of Big Data are often described using three attributes, namely volume, velocity, and variety (the 3Vs), which reflect only some aspects of the data. In this chapter, we review the history of the term “big data” and of the associated analytics. We augment the 3Vs with additional attributes of big data to make the description more comprehensive and relevant. We show that Big Data is not just the 3Vs but actually 3² Vs, that is, 9Vs covering the fundamental motivation behind Big Data: incorporating business intelligence based on different hypotheses or statistical models, so that Big Data analytics (BDA) can enable decision makers to make useful predictions for making crucial decisions or researching results. The history of Big Data has demonstrated that the most cost-effective way of performing BDA is to employ machine learning (ML) on cloud computing (CC)-based infrastructure, or simply ML + CC → BDA. This chapter is devoted to helping decision makers by defining BDA as a solution and an opportunity to address their business needs.
