From ReplayTool to Digital Replay System

DRS, the Digital Replay System, is a software tool being developed by the DReSS node of the UK ESRC-funded National Centre for e-Social Science. It has been developed from the earlier ReplayTool application to support the coordinated replay, annotation and analysis of combinations of video, audio, transcripts, images and system log files. DRS uses a new internal data model which gives it much greater flexibility than ReplayTool. It also provides new facilities for project and data management and organization, complex synchronization between related media, structured annotation including transcription and coding (classification), and new support for processing and visualizing log files and databases. It is publicly available under an open source license and is hosted by SourceForge. The current (first public) release emphasizes usability with a core feature set. Two further releases are planned which will make more experimental facilities available to general users.

Introduction

ReplayTool (French et al., 2006) was the initial prototype, developed by the NCeSS DReSS Research Node¹, of a suite of tools to enable social scientists to handle ‘digital records’ (Crabtree et al., 2006b). Digital records consist of two essential components: 1) traditional resources that social scientists working in qualitative traditions might gather (video and/or audio recordings, transcripts, photographs, etc.) and 2) ‘system logs’, i.e. electronic recordings of events including interaction in computational environments. DRS and ReplayTool enable time-based data – i.e., system recordings and audio/visual recordings – to be combined and replayed side by side, and annotations to be added to create new representations. As described by Crabtree et al. (2006b), ReplayTool was used and extended to support ethnographic analysis of the “Uncle Roy” mobile game/experience as a driving application.
In addition, the regular meetings of the DReSS node, which includes members from the Schools of Psychology and English, have been used to share and explore analytical practice and requirements across a range of settings and perspectives. This has led to the identification and prioritization of further requirements. In the second major phase of development activity within DReSS we are responding to these through a major reengineering and extension of ReplayTool to create the “Digital Replay System” (DRS). This paper describes how the following requirements have been addressed:

• Generalized support for project and data management and data overview: the “DataGoggles” component described by French et al. (2006) was hand-tailored for a particular pilot project and for demonstration purposes.
• Complex synchronization between multiple related media and log files: e.g., if different analysts have different views of the best correspondence between media, or if video recordings run at different speeds or are discontinuous.
• Complex and structured annotations: e.g., of time intervals and of non-temporal extents, and annotations which consist of structured codes rather than just text.
• Support for log-file processing, storage and visualization within the tool-supported environment: e.g., to support repeatability, sharing and re-use of such elements.

In this paper we consider each requirement in turn, describing how it is being addressed in DRS. We then give some examples of current use, and explain how to obtain DRS. The next section briefly describes the main technical changes in data modeling and persistence that underlie the other enhancements that follow.

¹ http://www.ncess.ac.uk/research/nodes/DigitalRecord/
Internal Data Modeling and Persistence

To address these diverse requirements, the internal data model and storage mechanism for DRS has been changed from the simple file-based XML data model of ReplayTool, which described a single set of media files for viewing, to a more comprehensive and extensible data and metadata model based on the W3C’s Resource Description Framework (RDF) and Web Ontology Language (OWL), both Semantic Web technologies supported by the open source JENA RDF library for Java². The DRS ontology has been created using the Stanford Protégé ontology editor³, and includes portions for:

• DRS’s own configuration, e.g., workgroup or standalone operation, system users, JENA models and window layouts.
• The files and databases that DRS is managing.
• Other information that the system explicitly uses and depends on, e.g., projects, analyses, timing and synchronization, codes and annotations.
• Other metadata associated with any of these, e.g., participants in studies, devices used, etc.

This has given us a flexible platform for description and persistence within DRS, which in turn has allowed us to respond to the above requirements (see Greenhalgh et al. (2007) for more technical details on the use of RDF and OWL within DRS).

² See http://www.w3.org/RDF and http://www.w3.org/TR/owl-features
³ See http://protege.stanford.edu

Projects and “analyses”

The DataGoggles component of ReplayTool (French et al., 2006) provided an example of a project overview and the facility to export events and combinations of media to a simple ReplayTool session. However, this capability was significantly hand-tailored for use with the Uncle Roy driver application. DRS now provides a general “Project” mechanism whereby multiple files and annotation sets can be managed and viewed as a set of distinct “Analyses”. For each project a graphical Project Explorer (figure 1) provides a simple entry point to, and representation of, the project’s elements and organization (e.g., analyses, media, people).
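To give a flavour of the RDF-based approach, the sketch below represents a project, an analysis and a media file as plain subject–predicate–object triples and queries them generically. This is a minimal illustration only: the namespace and all class and property names here are hypothetical, not the actual DRS ontology, and no JENA machinery is shown.

```python
# RDF-style description as plain (subject, predicate, object) triples.
# All names below are hypothetical illustrations, not the DRS vocabulary.
DRS = "http://example.org/drs#"  # hypothetical namespace

graph = [
    (DRS + "project1",  DRS + "type",        DRS + "Project"),
    (DRS + "project1",  DRS + "hasAnalysis", DRS + "analysis1"),
    (DRS + "analysis1", DRS + "hasMedia",    DRS + "Movie.mov"),
    (DRS + "Movie.mov", DRS + "startTime",   "2006-02-06T12:41:02Z"),
]

def objects(graph, subject, predicate):
    """Find every object o such that (subject, predicate, o) is asserted."""
    return [o for s, p, o in graph if s == subject and p == predicate]

# Which media are associated with analysis1?
print(objects(graph, DRS + "analysis1", DRS + "hasMedia"))
```

In DRS itself such statements are stored and queried through the JENA library, with the class and property vocabulary defined by the OWL ontology authored in Protégé; the point of the sketch is only that arbitrary new kinds of metadata can be added as further triples without changing a fixed file format.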
The popup (context) menus for the items within the browser give access to the available operations for each item (e.g. to associate some media with a particular analysis).

Figure 1. The project browser.

In the terminology of DRS an “analysis” is a set of related (generally co-temporal) resources, potentially including digitized videos and audio recordings, images, transcripts, annotations and log files. Each analysis is viewed and manipulated via a similar analysis browser, which shows only the resources specifically associated with that analysis. In addition, play-back within each analysis is controlled via VCR-like controls (comparable to ReplayTool). A time-line view is available for each analysis, giving a visual representation of the temporal extent and offsets of the media files, as well as visual representations of audio waveforms, coding and annotation (see figure 2). A basic concordance-style search view is also available to search across all annotations (coding and transcripts) within a project (see figure 3). From here the analyst can open, and jump to, the analysis and time associated with the text.

Figure 2. The track viewer showing the time-line for an analysis with one video and one annotation (coding) track.

Figure 3. Searching in the concordance view (the left column links to the particular media or analysis).

Synchronization

In terms of synchronization between related media and log files, ReplayTool had a simple model with a single viewing timeline and a single offset between that and each media/log file. In DRS every time-based media file and every analysis has its own abstract “timeline”. For example, the track viewer (above, figure 2) shows the temporal relationships between the video ‘Movie.mov’, the coding track ‘MovieCodes’ and the current analysis. These temporal relationships can be expressed in several different ways:

• The movie and analysis may have explicit start date/times specified, from which their relationship is inferred – see figure 4.
Figure 4. Synchronization using explicit dates and times. (The figure shows an analysis start time of 2006-02-06 12:34:59pm GMT and a media start time of 2006-02-06 12:41:02pm GMT, so that analysis time 00:23:34 corresponds to absolute time 2006-02-06 12:58:33pm GMT and media time 00:17:31.)

• The movie and analysis may have a directly specified relationship between their own timelines, independent of the absolute date/time – see figure 5. This temporal relationship can be different for different analyses (e.g. to represent different perspectives on the same event(s)).
• In addition to the above options, the coding track may be explicitly linked to the same timeline as the movie (or any other time-based media), e.g. if it is specifically a coding of what is happening in that other media.

Figure 5. Synchronization using explicit timeline relationships. (The figure shows the time relation media time 00:06:46 = analysis time 00:12:49, so that analysis time 00:23:34 again corresponds to media time 00:17:31.)

Media can be synchronized using the track viewer (figure 2) by dragging individual tracks along the analysis time-line. More comprehensive synchronization options can be found in the Synchronization Manager window, including specifying explicit start times and temporal relationships.

Annotation

In terms of annotation, ReplayTool supported only free-text annotations of single moments of viewing time. In the original ReplayTool these were simply time-stamped text lines in a textual log file, and the DataGoggles component subsequently held them in a single common annotation table in the DataGoggles database. Starting from the annotation graph approach of Bird and Liberman (2001), we defined in the DRS ontology (data model) a rich model of annotation, as illustrated in figure 6. In general, each annotation associates some subject with some content. At present in DRS the main subject for an annotation is a region, in particular a region of time on the timeline of a piece of media or an analysis.
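The timeline arithmetic of figures 4 and 5 and the annotation model just described can be illustrated with a short sketch. This is a minimal illustration under our own naming (media_time, TimeRegion, Annotation, concordance are all invented for this sketch, not the DRS implementation); the timing values are those shown in figures 4–6.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Figure 4: explicit start date/times; the media/analysis offset is inferred.
analysis_start = datetime(2006, 2, 6, 12, 34, 59)
media_start = datetime(2006, 2, 6, 12, 41, 2)
offset = media_start - analysis_start   # media starts 00:06:03 into the analysis

def media_time(analysis_time: timedelta) -> timedelta:
    """Map a point on the analysis timeline onto the media timeline."""
    return analysis_time - offset

# Analysis time 00:23:34 corresponds to media time 00:17:31, as in figure 4.
print(media_time(timedelta(minutes=23, seconds=34)))  # 0:17:31

# Figure 5: a directly specified relationship (media 00:06:46 = analysis
# 00:12:49) yields the same offset without reference to absolute dates.
offset_fig5 = timedelta(minutes=12, seconds=49) - timedelta(minutes=6, seconds=46)
assert offset_fig5 == offset

# Annotation: each annotation associates a subject -- here a region of time
# on some timeline -- with some content (cf. figure 6).
@dataclass
class TimeRegion:
    timeline: str   # the timeline of a media file or analysis
    start: int      # seconds
    end: int        # seconds

@dataclass
class Annotation:
    subject: TimeRegion
    content: str

annotation_set = [  # related annotations grouped together as a set
    Annotation(TimeRegion("Movie.mov", 5 * 60 + 24, 14 * 60 + 53),
               "interesting stuff"),
]

def concordance(sets, query):
    """Concordance-style search across annotation sets, returning the
    timeline and start time so the analyst can jump there."""
    return [(a.subject.timeline, a.subject.start)
            for s in sets for a in s if query in a.content]

print(concordance([annotation_set], "interesting"))  # [('Movie.mov', 324)]
```

Keeping each media file and analysis on its own timeline, as above, is what allows different analyses to carry different offsets over the same media, as noted for figure 5.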
Related annotations (e.g. of the same media) are organized together as annotation sets, which appear as a form of media within DRS (e.g. in the project and analysis browsers, as in figure 1).

Figure 6. Annotation of video segment 0:05:24-0:14:53 with “interesting stuff” (omitting details of Anchor_index and RelativeTime).

DRS supports the creation and time-synchronised viewing of text transc