Unison: A File Synchronizer and Its Specification

File synchronizers are tools that reconcile disconnected modifications to replicated directory structures. Like other replication and reconciliation facilities provided by modern operating systems and middleware layers, trustworthy synchronizers are notoriously difficult to build: they must deal correctly with both the semantic complexities of file systems and the unpredictable failure modes arising from distributed operation. On the other hand, synchronizers are simpler than most of their relatives in that they operate as stand-alone, user-level utilities, whose intended behavior is relatively easy to isolate from the other functions of the system. This combination of subtlety and isolation makes synchronizers attractive candidates for precise mathematical specification.We describe the specification and implementation of Unison - a file synchronizer engineered for portability, speed, and robustness, with thousands of daily users. Unison's code base and its specification have evolved in parallel, over several years, and each has strongly influenced the other. We present a precise high-level specification of Unison's behavior, an idealized implementation, and the outline of a proof (which we have formalized using Coq) that the implementation satisfies the specification. We begin with a straightforward definition of the system's core behavior - propagation of changes and detection of conflicting changes - then refine it to take into account the possibility of failures during reconciliation, then refine it again to cover synchronization of "metadata" such as permissions and modification times.In each part, we address two critical issues: first, the relation between the informal expectations of users and our mathematical specification, and, second, the relation between our idealized implementation and the actual code base (i.e., the abstractions needed to obtain a tractable mathematical object from a real-world systems program, and the extent to which studying this idealized implementation sheds useful light on the real one).