Evaluating the Effects of Architectural Documentation: A Case Study of a Large Scale Open Source Project

Sustaining large open source development efforts requires recruiting new participants; however, a lack of architectural documentation might inhibit new participants, since large amounts of project knowledge are unavailable to newcomers. We present the results of a multitrait, multimethod analysis of the effects of introducing architectural documentation into a substantial open source project, the Hadoop Distributed File System (HDFS). HDFS had only minimal architectural documentation, and we wanted to discover whether the putative benefits of architectural documentation could be observed over time. To do this, we created and publicized an architecture document and then monitored its usage and effects on the project. The results were somewhat ambiguous: by some measures the architecture documentation appeared to affect the project, but by others it did not. Perhaps of equal importance is our discovery that the project maintained, in its Web-accessible JIRA archive of software issues and fixes, enough architectural discussion to support architectural thinking and reasoning. This “emergent” architecture documentation served an important purpose in recording core project members' architectural concerns and resolutions. However, this emergent architecture documentation did not serve all project members equally well; it appears that those on the periphery of the project (newcomers and adopters) still require explicit architecture documentation, as we will show.
