Growth of the relational model: Interdependence and complementarity with big data

A database management system is an enduring application of computer science that provides a platform for creating, moving, and using large volumes of data. The field has witnessed a series of developments and technological advances, from conventional structured databases to the recent buzzword, big data. This paper aims to provide a comprehensive account of the relational database model, which is still widely used because of its well-known ACID properties: Atomicity, Consistency, Isolation, and Durability. Specifically, the objective of this paper is to highlight the adoption of relational model approaches by big data techniques. To explain the reasons for this incorporation, the paper qualitatively studies the advances made to the relational data model over time. First, variations in the data storage layout are illustrated according to the needs of the application. Second, fast data retrieval techniques such as indexing, query processing, and concurrency control methods are discussed. The paper provides vital insights for appraising the efficiency of structured databases in unstructured environments, particularly when both consistency and scalability become issues in hybrid transactional and analytical processing (HTAP) database management systems.
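The storage-layout variations mentioned above can be illustrated with a toy example. The Python below is a minimal sketch, not the paper's method: the hypothetical `RowStore` and `ColumnStore` classes and the three-attribute order schema are assumptions chosen for brevity. It contrasts a row-oriented layout, where each tuple is kept together so a point lookup returns the whole record at once, with a column-oriented layout, where each attribute is kept separately so an aggregate reads only the column it needs. This trade-off is the usual motivation for favoring rows in transactional workloads and columns in analytical ones.

```python
from typing import Dict, List, Tuple

# Hypothetical toy schema (an assumption for illustration): (order_id, customer, amount).
Row = Tuple[int, str, float]
COLUMNS = ("order_id", "customer", "amount")


class RowStore:
    """Row-oriented (N-ary) layout: each tuple is kept together as one record."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def insert(self, row: Row) -> None:
        # A transactional insert appends one contiguous record.
        self.rows.append(row)

    def get(self, order_id: int) -> Row:
        # A point lookup returns the whole tuple from a single record
        # (a linear scan stands in for an index here).
        for row in self.rows:
            if row[0] == order_id:
                return row
        raise KeyError(order_id)

    def total_amount(self) -> float:
        # An aggregate over one attribute still walks every full record;
        # in a physical row layout this drags unneeded fields along with it.
        return sum(row[2] for row in self.rows)


class ColumnStore:
    """Column-oriented (decomposition) layout: each attribute lives in its own array."""

    def __init__(self) -> None:
        self.columns: Dict[str, list] = {name: [] for name in COLUMNS}

    def insert(self, row: Row) -> None:
        # One logical insert becomes one append per column.
        for name, value in zip(COLUMNS, row):
            self.columns[name].append(value)

    def get(self, order_id: int) -> Row:
        # A point lookup must reconstruct the tuple from every column.
        try:
            i = self.columns["order_id"].index(order_id)
        except ValueError:
            raise KeyError(order_id) from None
        return tuple(self.columns[name][i] for name in COLUMNS)

    def total_amount(self) -> float:
        # The aggregate reads only the single column it needs.
        return sum(self.columns["amount"])


orders = [(1, "alice", 10.0), (2, "bob", 25.5), (3, "carol", 4.5)]
row_store, column_store = RowStore(), ColumnStore()
for order in orders:
    row_store.insert(order)
    column_store.insert(order)

print(row_store.get(2), column_store.get(2))  # (2, 'bob', 25.5) (2, 'bob', 25.5)
print(row_store.total_amount(), column_store.total_amount())  # 40.0 40.0
```

Both stores return the same answers; what differs is which data each operation has to touch. The toy model only mimics the two layouts and does not measure I/O or cache behavior, which is where the real performance gap between them appears.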
