Replicated document management in a group communication system

This paper is about the design and implementation of a replicated database that forms the basis for the Notes group communication system. The system supports groups of people working on shared sets of documents and is intended for use in a personal computer network environment in which the database servers are "rarely connected". Most algorithms for guaranteeing consistency across replicas require more reliable network connections between servers for adequate performance. Analysis of many group communication applications, however, revealed relatively weak consistency requirements across copies of the database. These requirements can be met by a simple replication algorithm that works well in rarely connected environments. This kind of replication has been used previously for a limited set of applications such as name directory replication; we have applied this technique to a much larger class of applications. Our characterization of this class of applications suggests that this technique generalizes to support distributed database implementations of other group work systems, including computer conferencing and bulletin board systems.

Presented at the Second Conference on Computer-Supported Cooperative Work, Portland, Oregon, September 26-28, 1988.

Notes is a group communication system that is used by people to share textual, numeric, and graphical information. The system operates on personal computers in local-area and wide-area networks, and provides end-users the ability to design and create document databases for specific applications. This paper focuses on the Notes document manager, which supports replicated databases with a number of interesting characteristics, particularly when compared with replication technology typical in transaction-oriented databases. In transaction-processing applications for record-oriented database management systems, replication algorithms must meet strict consistency criteria, usually defined in terms of serializability of transactions [Gray].
Most implementations of replication algorithms that provide strict consistency depend on high likelihood of continuous connection between database server machines. Notes has a strong requirement to support workgroups that cannot afford continuously available inter-network connections. We refer to such networks as rarely connected networks. This level of connectivity is typical for PC users. Local area networks that connect small workgroups who share printers often are not interconnected to support cross-group collaboration. Dial-up line connections that cross organizational boundaries are expensive and have low bandwidth.

In rarely connected environments, replication that guarantees serializability would at best be possible only at enormous cost in performance. However, Notes applications do not require this level of consistency across replicas. As a result we are able to rely on a simple replication algorithm. It provides the replication that is crucial to the viability of the product in the PC environment, at a cost that is acceptable in that environment. The replication capability has had several additional benefits for the product, since it also provides a means of doing static load balancing on a single network, automatic backup of databases, and support of home or portable computers that operate in a standalone mode.

Similar approaches to replicated databases have been previously used for distributed directory services [Oppen] [Smith]. Our work extends this approach to a broader class of applications, including document sharing, electronic mail, and conferencing. The implementation is new as well, because it is optimized to work well over low bandwidth, dial-up lines.

Section 2 briefly outlines the characteristics of typical applications. In section 3 we review the design goals for the document manager and replication process. Section 4 presents the algorithms and implementation. Section 5 provides some data on current usage of replication.
The concluding section reviews our approach to bringing shared document databases to the PC user. This approach meets a wide range of user needs, not least of which is the need to collaborate despite the reality of poor network connectivity for PC users in ad hoc workgroups.

2. Application Characteristics

Notes is based on a shared document database system that can be tailored to the needs of specific workgroups. A user might participate in a number of workgroup activities, each supported by a different shared database. A database is a collection of related forms or semi-structured [Malone] documents, organized through views that sort or categorize information. Users build specialized applications by tailoring the database to store, organize and present specific kinds of information. For example, a group managing a software development project would want a variety of different documents in the database: bug reports, bug fix notices, comments and suggestions, progress reports, etc. The documents would be organized to make it easy for a reader to find new items, and so that comments related to particular topics were grouped together. The format of documents and appearance of information can vary from application to application: graphs, images, pictures and numerical information can be intermixed with textual information; layout and use of color can give individual applications very distinctive looks.

Databases have been designed to support group applications such as: