The NoSQL Toolbox: The NoSQL Landscape in a Nutshell

In this chapter, we outline the design space of distributed database systems along four dimensions: sharding, replication, storage management, and query processing. The goal is to provide a comprehensive set of data management requirements that must be considered when designing a flexible backend for globally distributed web applications. To this end, we survey the implementation techniques these systems use and discuss how they relate to the functional and non-functional properties (goals) of data management systems.
