Automated Performance Management for the Big Data Stack

Today, more than 10,000 enterprises worldwide use the big data stack, which is composed of multiple distributed systems. At Unravel, we have worked with a representative sample of these enterprises that covers most industry verticals. This sample also spans the full range of choices for deploying the big data stack: on-premises datacenters, private cloud deployments, public cloud deployments, and hybrid combinations of these. In this paper, we aim to bring attention to the performance management requirements that arise in big data stacks. We provide an overview of these requirements both at the level of individual applications and at the level of entire clusters and workloads. We present an architecture that can provide automated solutions for these requirements and then examine a few of these solutions in depth.
