Conducting quantitative software engineering studies with Alitheia Core

Quantitative empirical software engineering research benefits mightily from processing large open-source software repository data sets. The diversity of repository management tools and the long history of some projects render working with those data sets a tedious and error-prone exercise. The Alitheia Core analysis platform preprocesses repository data into an intermediate format that allows researchers to provide custom analysis tools. Alitheia Core automatically distributes the processing load across multiple processors while enabling programmatic access to the raw data, the metadata, and the analysis results. The tool has been successfully applied to hundreds of medium to large-sized open-source projects, enabling large-scale empirical studies.
