Big Data Management Challenges, Approaches, Tools and their limitations

Big Data is the buzzword everyone talks about. Regardless of the application domain, there is now a consensus on the three V's that characterize Big Data: Volume, Variety, and Velocity. Focusing on data management issues and drawing on past experience in the area of database systems, this chapter examines the main challenges raised by each of the three V's. It then reviews the main characteristics of existing solutions for addressing each of the V's (e.g., NoSQL, parallel RDBMS, stream data management systems, and complex event processing systems). Finally, it provides a classification of the functions offered by NewSQL systems and discusses their benefits and limitations for processing Big Data.
