Big Data Analysis in Cloud and Machine Learning

In today’s digital universe, the amount of digital data is growing at an exponential rate. Data is considered the lifeblood of any business organization, because it is data that is turned into actionable business insights. The data available to organizations is so large in volume that it is popularly referred to as big data, currently one of the hottest buzzwords in the business and technology worlds. Economies around the world are using big data and big data analytics as a new frontier for business, in order to plan smarter business moves, improve productivity and performance, and plan strategy more effectively. Storage technologies and analytical tools play a critical role in making big data analytics effective. However, big data places rigorous demands on networks, storage, and servers, which has motivated organizations and enterprises to move to the cloud in order to harvest the maximum benefit from the available big data. Furthermore, conventional analytics tools are incapable of capturing the full value of big data. Hence, machine learning seems to be an ideal solution for exploiting the opportunities hidden in big data. In this chapter, we discuss big data and big data analytics, with a special focus on cloud computing and machine learning.
