Progress Report on Version Aware LibreOffice

In an earlier paper at DocEng 2013, we reported on our efforts to make LibreOffice Writer documents "version aware". Version-aware documents use a namespace-protected preamble to include a complete version history within the saved document file, plus unique identifier attributes on the document content elements to support efficient differencing and merging of versions. A particular challenge has been to ensure that the unique identifiers on the elements are preserved through a complete load-edit-save cycle. This is difficult because content element data passes through three representations in its lifetime. At load time, XML is read to create ImportContext objects, which are then used to generate the internal data structures used during editing. At save time, the internal data structures are converted to ExportContext objects, from which XML is generated for the saved file. The internal data structures are drawn from a small forest of inheritance hierarchies; each hierarchy has a slightly different construction-destruction protocol and thus requires a different solution for preserving the unique identifiers. Working with a particular snapshot of the C++ implementation of LibreOffice Writer, we have reached a point where unique identifiers are preserved on nearly all content elements used in Writer documents. Versioning of document style elements is not yet supported. Support for version awareness has added about 3000 lines of code to a code base of slightly more than one million lines (roughly 0.3%). The changes affect 128 files out of a total of 3354 (about 4%), organized in three large modules. We believe that these numbers show that adding full support for version awareness would have only a modest effect on the implementation of an office software suite.
However, the fairly large number of files affected shows that version awareness resembles a non-functional requirement, since its support is not isolated in a small set of files.
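
The load-edit-save cycle described above can be illustrated with a minimal sketch. It is not LibreOffice code and does not use any LibreOffice API. The class names ImportContext and ExportContext are borrowed from the abstract only as labels, and the namespace URI, the `uid` attribute name, the ModelNode class, and the helper functions are all assumptions made for illustration. The sketch shows only the core idea: the identifier has to be copied explicitly at each of the three hand-offs, or it is lost on save.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical namespace and attribute name; the source does not specify them.
VA_NS = "urn:example:version-aware"
UID_ATTR = "{%s}uid" % VA_NS


@dataclass
class ImportContext:      # load-time representation (label taken from the abstract)
    tag: str
    text: str
    uid: Optional[str]


@dataclass
class ModelNode:          # stand-in for an internal editing data structure
    kind: str
    text: str
    uid: Optional[str]


@dataclass
class ExportContext:      # save-time representation (label taken from the abstract)
    tag: str
    text: str
    uid: Optional[str]


def load(xml_text: str) -> List[ModelNode]:
    """XML -> ImportContext -> ModelNode, carrying the uid across both steps."""
    root = ET.fromstring(xml_text)
    contexts = [ImportContext(el.tag, el.text or "", el.get(UID_ATTR)) for el in root]
    return [ModelNode(c.tag, c.text, c.uid) for c in contexts]


def save(nodes: List[ModelNode]) -> str:
    """ModelNode -> ExportContext -> XML, writing the uid back as an attribute."""
    ET.register_namespace("va", VA_NS)
    root = ET.Element("body")
    for ctx in (ExportContext(n.kind, n.text, n.uid) for n in nodes):
        el = ET.SubElement(root, ctx.tag)
        el.text = ctx.text
        if ctx.uid is not None:
            el.set(UID_ATTR, ctx.uid)
    return ET.tostring(root, encoding="unicode")


def uids(xml_text: str) -> List[Optional[str]]:
    """Collect the uid attribute of each top-level content element."""
    return [el.get(UID_ATTR) for el in ET.fromstring(xml_text)]


if __name__ == "__main__":
    src = ('<body xmlns:va="urn:example:version-aware">'
           '<p va:uid="e1">Hello</p><p va:uid="e2">World</p></body>')
    nodes = load(src)
    nodes[0].text = "Hello, edited"   # an edit changes content, not identity
    print(uids(save(nodes)))
```

In this toy model a single node class makes the hand-off trivial. The difficulty reported above arises because the real internal structures come from several inheritance hierarchies, so the equivalent of the `uid` field has to be threaded through each hierarchy's own construction and destruction path.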