Delivering software with agility and quality in a cloud environment

F. Oliveira, T. Eilam, P. Nagpurkar, C. Isci, M. Kalantar, W. Segmuller, E. Snible

Cloud computing and the DevOps movement are two pillars that facilitate software delivery with extreme agility. “Born on the cloud” companies, such as Netflix®, have demonstrated rapid growth in their business and continuous improvement of the services they provide, reportedly by applying DevOps principles. In this paper, we claim that fulfilling the vision of fast software delivery without compromising the quality of the provided services requires a new approach to detecting problems, including problems that may have occurred during the continuous deployment cycle. A native DevOps-centric approach to problem resolution puts the focus on a wider range of possible error sources (including code commits), makes use of DevOps metadata to clearly identify the source of the problem, and leads to quick problem resolution. We propose such a continuous quality assurance approach, and we demonstrate it through preliminary experiments in our public Container Cloud environment and in a private OpenStack® cloud environment.
