Configuration Management and Open Source Projects

Configuration management tools are at the heart of every software project. It should therefore come as no surprise that they play a central role in Open Source projects as well. The most prominent of them is CVS, which is itself an Open Source system. In this position paper we examine why CVS plays such a major role in the management of Open Source projects. Furthermore, we identify several areas in which we believe CVS should be improved, in both the short and the long term.

Concurrent Versions System

As one of the essential tools needed during the development of a software product, configuration management tools were among the first to face the reality of having to operate in a distributed setting. In response, many distributed CM systems have been developed in the past few years (e.g., ClearCase MultiSite [1], Gradient [3], ScmEngine [5], DISC [12], WWCM [13], DSCS [16], Perforce [17]). Despite this broad availability, a single system has clearly emerged as the de facto configuration management system for Open Source projects: CVS [4], which has been adopted by almost every major Open Source project. Further evidence of its popularity is that CVS is the only CM system with a book dedicated to its use in Open Source projects [9]. Several reasons can be identified for this intriguing phenomenon.

• The CM policy embedded in CVS closely matches the Open Source process. An Open Source project is typically organized around a central repository from which individual developers retrieve copies of the project. Developers make their changes within one of these copies. Once the changes are complete, they update the repository with new versions of the artifacts that have changed. Other developers synchronize their copies of the project by periodically downloading updated versions of artifacts and resolving any conflicts that may exist.
This is exactly the CM policy that CVS excels at supporting: CVS is based on a single level of transactions, each of which follows an optimistic (copy-modify-merge) scheme of conflict resolution. This one-to-one correspondence between the actual Open Source process and the process supported by CVS makes CVS very appealing for Open Source projects.

• CVS supports decentralized software development. Since Open Source projects involve developers located all over the world, a CM tool used in an Open Source project must operate in a decentralized and distributed setting. Preferably, the tool even supports intermittent disconnected operation, so that developers need not be continuously connected to a main repository of artifacts. Although not initially devised as a distributed CM system, CVS has been enhanced over the years to provide several methods of remote access to its central repository. Together with its optimistic method of resolving conflicts, CVS thus precisely matches the distributed capabilities needed in an Open Source project.

• CVS is free, yet well maintained. Since Open Source projects typically have little or no funding and commercial CM systems tend to be expensive, using a commercial CM system in an Open Source project is usually impossible. CVS, by contrast, is free. Despite being free, CVS is well maintained and, compared with other freely available CM systems, rather complete in functionality. In fact, CVS is an Open Source project itself and has been in widespread use for many years. As a result, many of its initial problems have been solved, and CVS is currently one of the best freely available CM systems.

Given this combination of factors, it should come as no surprise that CVS is so widely used in Open Source projects.
It provides the necessary functionality at a more than reasonable price: it is free, yet easy to install, set up, learn, and use.

Potential Short-Term Enhancements to CVS

With its unparalleled success, complacency seems to have settled into CVS and the functionality it provides to its users. In fact, the functionality and CM policy that form the core of CVS have not been enhanced for quite some time. Yet a close examination of the Open Source process suggests that several enhancements could greatly increase the future applicability of CVS. Specifically, we believe the following four enhancements should be made in the near future.

• An infrastructure that supports multiple repositories. More and more Open Source projects build upon software from other Open Source projects. Currently, such software must be periodically incorporated via the vendor-code management functions (vendor branches) of CVS. Although certainly usable, this solution becomes unwieldy when many subcomponents are present, each with a different release schedule. It would be preferable to link CVS repositories together so that the source code of subcomponents is imported directly and continuously. In essence, this brings an automated and enhanced version of a tool like SRM [20] into the software development process.

• Versioning of directories. One obvious improvement to CVS concerns its handling of directories. Currently, directories are not versioned at all, even though their contents can change over time. This leads not only to a rather crude way of handling such changes, but also to an overuse of tags to label the various configurations in which a project may exist over time. Given that more and more Open Source projects create large numbers of configurations and regularly reorganize their structure, this is a serious problem that deserves immediate attention.
Fortunately, the well-known solution of versioning directories solves this problem: it provides a clean way of dealing with the changing contents of a directory, as well as a convenient and natural way of dealing with configurations. This is demonstrated by, for example, PRCS [14] and COOP/Orm [15], both CM systems that intrinsically support the versioning of directories.

• Private versioning capabilities. Although developers may store intermediate versions of artifacts in a CVS repository, they are generally encouraged to commit only complete and working changes. As a result, developers are left without any versioning support in their private workspaces. As Open Source projects grow larger and changes become more complex, such support is much needed. As demonstrated, for example, by Continuus [6] and Perforce [17], the availability of such functionality enhances the development experience and typically leads to the creation of many intermediate versions before the final changes are stored in the main repository. These intermediate versions remain private to a developer; they neither interfere with the changes of others nor clutter the version history of the main repository.

• Repository replication. It is common to use CVS in combination with a replication program like rsync [19] to improve access times for developers located on different continents from the main project repository. Although certainly beneficial, this solution has the drawback that rsync cannot resolve conflicts arising during synchronization. Instead, synchronization fails, and manual intervention is needed to integrate and merge the changes of developers who use different instances of a replicated repository.
Since rsync and CVS share much functionality, it should be possible to merge the two into an integrated solution that resolves conflicts during synchronization in the same way CVS resolves conflicts during regular development. This would lead to a solution like ClearCase MultiSite [1].

Each of these enhancements is based on solutions that already exist in commercial CM systems. In effect, they can be seen as bringing CVS up to date with advanced functionality that has not only emerged in today's CM systems, but has also proven very beneficial. Note also that none of the above suggestions involves changing the core policy or functionality of CVS: the basic premise of a transaction-oriented CM system that resolves conflicts optimistically remains. The suggested functionality merely increases the applicability and utility of CVS; it does not change its fundamental principles.

Potential Long-Term Enhancements to CVS

Even with the short-term enhancements suggested in the previous section, it remains an open question how long CVS, in its current incarnation, will survive as the myriad of available CM systems continue to evolve and incorporate more advanced functionality. It may therefore be time to consider a complete redesign of CVS that radically advances its functionality. In fact, we believe that a new, reincarnated CVS could leapfrog most existing CM systems in functionality and popularity if it supported the complete cycle of the Open Source process rather than version control alone.
In particular, we suggest that CVS be redesigned to include not only the changes suggested in the previous section, but also support for such activities as release management (automatically packaging software, creating and maintaining a change log, and publishing a package on a Web site), bug tracking (filing bug reports, keeping an archive of resolved bugs, and associating bug reports with the versions of the source code that fix each bug), and deployment (installing the software at the client side, periodically polling for updates, and upgrading the installed base to new versions of the software). Although this vision is ambitious, several existing pieces of infrastructure have proven beneficial in their respective domains and may help in realizing an integrated and rejuvenated CVS.

• SRM. S

[1] Walter F. Tichy et al. Distributed Configuration Management via Java and the World Wide Web. SCM, 1997.

[2] Murali Ramakrishnan. Software release management. Bell Labs Technical Journal, 2004.

[3] Karl Fogel. Open Source Development with CVS. The Coriolis Group.

[4] Paul N. Hilfinger et al. PRCS: The Project Revision Control System. SCM, 1998.

[5] Moshe Bar et al. Open Source Development with CVS. 1999.

[6] Brian Berliner et al. CVS II: Parallelizing Software Development. 1998.

[7] Richard S. Hall et al. An architecture for post-development configuration management in a wide-area network. Proceedings of the 17th International Conference on Distributed Computing Systems, 1997.

[8] J. Davenport. Editor. 1960.

[9] Dennis Heimbigner et al. A generic, peer-to-peer repository for distributed configuration management. Proceedings of the IEEE 18th International Conference on Software Engineering, 1996.

[10] Dave Belanger et al. Infrastructure for Wide-Area Software Development. SCM, 1996.

[11] Bartosz Milewski. Distributed Source Control System. SCM, 1997.

[12] Richard S. Hall et al. A cooperative approach to support software deployment using the Software Dock. Proceedings of the 1999 International Conference on Software Engineering, 1999.

[13] David B. Leblang et al. ClearCase MultiSite: Supporting Geographically-Distributed Software Development. SCM, 1995.

[14] Richard S. Hall et al. Software release management. ESEC '97/FSE-5, 1997.

[15] Ed Bailey et al. Maximum RPM. 1997.

[16] Paul Mackerras et al. The rsync algorithm. 1996.

[17] Boris Magnusson et al. Fine Grained Version Control of Configurations in COOP/Orm. SCM, 1996.