Context Annotation for a Live Life Recording

We propose to use wearable sensors and computer systems to generate personal contextual annotations in audio-visual recordings of a person’s life. In this paper we argue that such annotations are essential and effective for retrieving relevant information from large audio-visual databases. The paper summarizes our work on automatically annotating meeting recordings, on extracting context from body-worn acceleration sensors alone, and on combining context from three different sensors (acceleration, audio, location) to estimate the interruptibility of the user. These first experimental results indicate that it is possible to automatically find useful annotations for a lifetime’s recording; we also discuss what can be achieved with certain sensors and sensor configurations.

INTRODUCTION

Interestingly, about 500 terabytes of storage are sufficient to record all audio-visual information a person perceives during an entire lifespan (assuming a lifespan of 100 years, 24 h of recording per day, and 10 MB per minute of recording, this amounts to approximately 500 TB; see the sketch at the end of this section). This amount of storage will be available even to an average person in the not-so-distant future. A wearable recording and computing device might therefore be used to ’remember’ any talk, any discussion, or any environment the person has seen. For annotating an entire lifetime it is important that the recording device and its attached sensors can be worn by the user in any situation. Although it is possible to augment certain environments, this will not be sufficient. Furthermore, wearable computers allow a truly personal audio-visual record of a person’s surroundings wherever he or she goes. A hat- or glasses-mounted camera and microphones attached to the chest or shoulders of the person enable a recording from a first-person perspective.

Today, however, the usefulness of such data is limited by the lack of adequate methods for accessing and indexing large audio-visual databases. While humans tend to remember events by associating them with personal experience and contextual information, today’s archiving systems are based solely on date, time, location, and simple content classification. As a consequence, even in a recording of a simple event sequence such as a short meeting, it is very difficult for the user to retrieve relevant events efficiently. For example, the user might remember a particular part of the discussion as a heated exchange conducted during a short, unscheduled coffee break. However, he is unlikely to remember the exact time of this discussion, which is typically what today’s systems require for retrieval from audio-visual recordings.

In this paper we propose to use wearable sensors to enhance the recorded data with contextual, personal information and thus facilitate user-friendly retrieval. Sensors such as accelerometers and biometric sensors can enhance the recording with information on the user’s context, activity, and physical state. This sensor information can be used to annotate and structure the data stream for later associative access. This paper summarizes three papers [1, 2, 3] in which we have worked towards extracting such context and, specifically, using it for retrieving information. The second section of this paper summarizes [1], in which context annotations from audio and acceleration are used to annotate meeting recordings. The third section introduces [2], in which context from audio, acceleration and location is used to mediate notifications to the user. The fifth section examines in detail how much information can be extracted from acceleration sensors alone. A discussion of all three in the context of life recording concludes the paper.
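To make the storage figure quoted above concrete, the following minimal back-of-the-envelope sketch (added here for illustration) reproduces the roughly 500 TB estimate from the stated assumptions; the 10 MB per minute data rate is of course only a coarse approximation of a combined audio-visual stream.

```python
# Back-of-the-envelope storage estimate for a lifelong audio-visual recording,
# using the assumptions from the introduction: a 100-year lifespan, 24 h of
# recording per day, and a data rate of 10 MB per minute.
YEARS = 100
MINUTES_PER_YEAR = 365.25 * 24 * 60   # about 526,000 minutes per year
MB_PER_MINUTE = 10

total_mb = YEARS * MINUTES_PER_YEAR * MB_PER_MINUTE
total_tb = total_mb / 1_000_000       # 1 TB = 10^6 MB in decimal units

print(f"Lifetime recording: {total_tb:,.0f} TB")   # roughly 526 TB, i.e. ~500 TB
```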
RELATED WORK

Recently, the idea of recording an entire lifetime of information has received great attention. The UK Computing Research Committee formulated, as part of its Grand Challenges Initiative, a number of issues arising from recording a lifetime [4]. Microsoft’s MyLifeBits project [5] tries to collect and store any digital information about a person, but leaves the annotation to the user. Finally, DARPA’s LifeLog initiative [6] invites researchers to investigate the issues of data collection, automatic annotation, and retrieval.

Figure 1: Retrieval application for a meeting, with the presentation and discussion parts highlighted; graphs from top to bottom: audio signal, speaker recognition for the first speaker, speaker recognition for the second speaker, “Me vs. the World”, activity recognition.

The idea of computer-based support for human memory and retrieval is not new. Lamming and Flynn, for example, point out the importance of context as a retrieval key [7], but only use cues such as location, phone calls, and interaction between different PDAs. The Conference Assistant [8] supports the organization of a conference visit, the annotation of talks and discussions, and the retrieval of information after the visit. Again, the cooperation and communication between different wearables and the environment is an essential part of the system. Rhodes proposed the text-based remembrance agent [9] to help people retrieve notes they have previously made on their computer.

In speech recognition, the automatic transcription of meetings is an extremely challenging task due to overlapping and spontaneous speech, large vocabularies, and difficult background noise [10, 11]. Often, multiple microphones are used, such as close-talking microphones, table microphones, and microphone arrays. The SpeechCorder project [12], for example, aims to retrieve information from roughly transcribed speech recorded during a meeting. Summarization is another topic currently under investigation in speech recognition [13] as well as in video processing. We strongly believe, however, that summarization is not enough to allow effective and, in particular, associative access to the recorded data. Richter and Le [14] propose a device that uses predefined commands to record conversations and take low-resolution photos. At the University of Tokyo [15], researchers investigate the possibility of recording subjective experience by capturing audio and video as well as heartbeat or skin conductance, so that one’s experience can be recalled from various aspects. StartleCam [16] is a wearable device that tries to mimic the wearer’s selective memory. The WearCam idea of Mann [17] is also related to the idea of constantly recording one’s visual environment.

WEARABLE SENSING TO ANNOTATE MEETING RECORDINGS

In order to provide first experimental evidence that context annotations are useful, we recorded meetings and annotated them using audio and acceleration sensors [1]. In particular, we extracted information such as walking, standing, and sitting from the acceleration sensors, and speaker changes from the audio, thereby facilitating associative retrieval of the information in the meetings. Looking at the meeting scenario, we have identified four classes of relevant annotations.
Those are: different meeting phases, the flow of discussion, user activity and reactions, and interactions between the participants. The meeting phase annotations cover the times of presentations and breaks, and when somebody arrives or leaves during the meeting. The flow-of-discussion annotations attach speaker identities and speaker changes to the audio stream and indicate the intensity of the discussion. They can also help to differentiate single-person presentations, interactive questions and answers, and heated debate. User activity and reaction annotations indicate the user’s level of interest, focus of attention, and agreement or disagreement with particular issues and comments. By tracking the interaction of the user with other participants, personal discussions can be differentiated from general discussions.
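As a minimal illustration of how the user-activity annotations mentioned above (sitting, standing, walking) could be derived from body-worn acceleration data and attached to the recording’s time line, consider the following sketch. It is not the classifier used in [1]: the sampling rate, window length, feature choice (overall variance and mean of an assumed vertical axis), and thresholds are assumptions made purely for illustration, and a real system would use trained classifiers on richer features.

```python
from dataclasses import dataclass
from enum import Enum

import numpy as np


class Activity(Enum):
    SITTING = "sitting"
    STANDING = "standing"
    WALKING = "walking"


@dataclass
class Annotation:
    start_s: float   # offset into the audio-visual recording, in seconds
    end_s: float
    label: Activity


def classify_window(window: np.ndarray) -> Activity:
    """Crude heuristic on one (n_samples, 3) block of accelerometer data.

    High overall variance is taken as periodic movement, i.e. walking;
    otherwise the mean of the (assumed) vertical axis separates upright
    standing from sitting.  The 0.5 and 0.8 thresholds are arbitrary."""
    if window.var() > 0.5:
        return Activity.WALKING
    return Activity.STANDING if window[:, 0].mean() > 0.8 else Activity.SITTING


def annotate(acc: np.ndarray, sample_rate: float, window_s: float = 2.0) -> list:
    """Slide a fixed-length window over the acceleration stream and emit one
    annotation per window, aligned with the recording's time axis."""
    step = int(window_s * sample_rate)
    annotations = []
    for start in range(0, len(acc) - step + 1, step):
        label = classify_window(acc[start:start + step])
        annotations.append(
            Annotation(start / sample_rate, (start + step) / sample_rate, label))
    return annotations


# Example: 10 minutes of synthetic 3-axis data sampled at 50 Hz.
if __name__ == "__main__":
    acc = np.random.randn(10 * 60 * 50, 3) * 0.1 + np.array([1.0, 0.0, 0.0])
    print(annotate(acc, sample_rate=50)[:3])
```

Annotations of this form can then be merged with speaker-change marks derived from the audio stream, so that a query such as “the discussion while everybody was standing” can be resolved against the recording.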

[1] Steve Mann. Smart clothing: The wearable computer and WearCam. Personal Technologies, 1997.

[2] Jennifer Healey et al. StartleCam: a cybernetic wearable camera. Digest of Papers, Second International Symposium on Wearable Computers, 1998.

[3] Paul Lukowicz et al. Wearable Sensing to Annotate Meeting Recordings. Proceedings of the Sixth International Symposium on Wearable Computers (ISWC), 2002.

[4] Bradley J. Rhodes et al. The wearable remembrance agent: A system for augmented memory. Digest of Papers, First International Symposium on Wearable Computers, 1997.

[5] Albrecht Schmidt et al. Multi-sensor Activity Context Detection for Wearable Computing. EUSAI, 2003.

[6] Gregory D. Abowd et al. The Conference Assistant: combining context-awareness with wearable computing. Digest of Papers, Third International Symposium on Wearable Computers, 1999.

[7] M. Lamming et al. “Forget-me-not”: Intimate Computing in Support of Human Memory. 1994.

[8] Don Kimber et al. Acoustic Segmentation for Audio Browsers. 1997.