Management of historical data in continuous integration systems

Continuous integration is an agile software practice in which code is checked in frequently and then built and tested automatically. As this practice has matured at Ericsson, the frequency of automated builds has increased, and with it the volume of data generated about the development process. However, the software systems that gather and present this data were not designed to scale with the current data growth. This has led to several problems surrounding the continuous integration process and the evolution of the software systems that support it. This master’s thesis reports on the design and evaluation of two prototypes for managing historical data in continuous integration systems: one stores historical data in a NoSQL database and the other in a relational database. The prototypes were designed specifically to address scalability and performance problems in the continuous integration systems currently in use at Ericsson. The evaluation shows that the prototypes solve the scalability problems and increase the performance of the current systems by separating live and historical data. The value of the prototypes is further motivated by investigating how the historical data in a continuous integration system can be used to benefit the company that holds it.
