Automatic Adaptation of Mobile Activity Recognition Systems to New Sensors

ABSTRACT

We describe a method that allows a mobile activity recognition system trained on a set of sensors to learn to use an additional sensor in a fully unsupervised way and, if required, to be significantly improved with minimal user input. We evaluate our method on data from previously published multimodal activity recognition data sets, covering some 180 different sensor combinations and 5 users executing over 1900 individual instances of 17 different activity classes. Over all sensor combinations, our system achieves 7.7% of the improvement possible when training the classifier with the additional sensors in a fully supervised way. For sensor combinations that are particularly useful, the improvement increases to over 20.9%. When a single, randomly chosen labeled data point that includes the new sensor is provided for each class, these values rise to 22.7% and 37.7%, respectively.

INTRODUCTION

Today, state-of-the-art approaches to activity and context recognition mostly assume fixed, narrowly defined system configurations dedicated to often equally narrowly defined tasks. Thus, for each application, the user needs to place specific sensors at certain well-defined locations in the environment and on his body. For a widespread use of such systems this approach is not realistic. As the user moves around, he is at times in highly instrumented environments, while at other times he stays in places with little or no intelligent infrastructure. Concerning on-body sensing, the user may carry a more or less random collection of sensor-enabled devices (mobile phone, watch, headset, etc.) on different, dynamically varying body locations (different pockets, wrist, bag). Thus, systems are needed that can take advantage of devices that just “happen” to be in the environment rather than assuming specific, well-defined configurations. Towards this goal we investigate how a mobile activity recognition system that has been trained on a set of sensors can be made to use a new source of information with no or only minimal user input.

General Considerations

Clearly, a method that would, without any training, use the information provided by an additional sensor and always guarantee a performance improvement is not feasible. Instead, in this paper we investigate an approach that sometimes provides an improvement, overall does “more good than harm”, and can efficiently use minimal user feedback to improve even further. Thus, we require that:

1. For sensors that provide useful information with respect to the classification problem at hand, the method should achieve a significant improvement in at least some cases.

2. The improvement should not come at the cost of performance degradation in other cases. This means that:

• Averaged over all sensors, the effect of the method should remain positive (at least a slight overall performance increase).

• No (or very few) single cases experience a significant (more than a few percent) performance decrease.
3. The method should easily leverage user feedback, quickly improving system performance with even a few inputs.

In this paper we describe a method that allows a tree classifier trained on a set of sensors to be extended to use an additional sensor according to the above requirements. The method works in a fully unsupervised way but, if required, can be further improved with minimal user input. We evaluate our method on previously published multimodal activity recognition data sets: a bicycle repair data set [12] and the household activity data set from which we select the actions of opening different doors and drawers [13].

Related Work

The need for large amounts of annotated training data has long been recognized as a major issue for the practical deployment of activity recognition systems. As a consequence, different approaches have been studied to alleviate the problem. One line of work looks at the unsupervised discovery of structure in sensor data (e.g. [11, 9]). Another comprises attempts to develop activity models from online information such as common-sense databases, WordNet, or general “how to” resources (e.g. [17, 15, 5, 14]). Due to the nature of the data, such work has a strong focus on interaction with objects. Beyond fully unsupervised approaches, there has also been considerable interest in semi-supervised systems [4] that limit the amount of data needed for training (e.g. [7, 14]).

In the field of active learning, membership query learning (MQL) [1], stream-based active learning (SAL) [2], and pool-based active learning (PAL) [10] are the most important learning paradigms. The approach considered in this work can basically be regarded as an SAL technique. In general, active learning can be seen as a special case of supervised learning, as the classifier is trained with labeled patterns only. In contrast, semi-supervised learning uses both labeled and unlabeled patterns. However, semi-supervised learning and active learning share the same goal, namely to build a classifier with good generalization properties using a minimal number of labeled patterns. We combine both concepts in our work. Other examples of the combination of semi-supervised and active learning can be found in [6, 8], for instance. While related to our work on the methodological level, none of the above addresses the problem of integrating new sensors into an existing system. Closest to our work is [3], which uses other sensors and behavioral assumptions to train a new sensor. Unlike our work, it does not consider adding a new sensor to an existing system.

ALGORITHM

Overview

An overview of our method is depicted in Figure 1. We start with a classifier that is trained on sensor 1 and performs reasonably well, but still has some regions in its simple feature space where classification is not always correct. When a new sensor appears (sensor 2), besides further classifying with the old system, we start collecting a new data set that includes both sensors but, obviously, contains no ground-truth labels. We then rely on structures found in the new data set to automatically extend the classifier along the new dimension introduced by the second sensor, possibly improving classification performance. This is described in more detail below. The extended classifier is then applied to utilize information from both sensors. The method can easily be extended to support multiple initial and new sensors.

[Figure 1 (flowchart): a trained classifier runs on sensor 1; when a new sensor appears, data from sensors 1 & 2 is collected; once enough data has been collected, the classifier is improved.]
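To make this flow concrete, the sketch below restates it as a control loop in Python. It is only an illustration under our own assumptions: the sample format, the min_samples threshold, the scikit-learn-style predict interface, and the helper extend_classifier (sketched in the next section) are choices made for this sketch, not identifiers from the paper.

    # Illustrative sketch of the adaptation loop (not the authors' code).
    # Each sample is a pair (sensor-1 features, sensor-2 features or None),
    # with features represented as plain Python lists.
    def run_adaptation(old_clf, stream, extend_classifier, min_samples=500):
        """Classify with the old model while buffering unlabeled data that
        contains both sensors; switch to the extended classifier once
        enough data has been collected."""
        clf, buffer = old_clf, []
        for x1, x2 in stream:
            if x2 is not None and clf is not old_clf:
                prediction = clf.predict([x1 + x2])[0]   # extended classifier, both sensors
            else:
                prediction = old_clf.predict([x1])[0]    # old classifier, sensor 1 only
            if x2 is not None and clf is old_clf:
                buffer.append(x1 + x2)                   # unlabeled, both sensors
                if len(buffer) >= min_samples:
                    clf = extend_classifier(old_clf, buffer, n_old=len(x1))
            yield prediction

Note that the old classifier keeps running while the two-sensor data is buffered, so recognition never degrades during the collection phase.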
Extending the classifier

As soon as a large enough data set with both sensors has been recorded, we can start improving the classifier by extending it along the new dimension introduced by the additional sensor. This process is shown on the right side of Figure 1. We start by applying an agglomerative, hierarchical clustering algorithm (Ward [16]) to the new, unlabeled data set. We then iteratively traverse the cluster tree, starting from the top, until a maximal number of clusters is reached. For each cluster level we (1) examine the cluster structure together with the training data in two increasingly complex ways, described below, in order to find class labels for the clusters, and (2) compute a plausibility value for the labeling (how well it resembles the original class distribution) and the potential gain when applied to the classifier. Based on the plausibility and gain values, we select the optimal cluster level and utilize the labeled clusters to extend and refine the classifier along the new dimension where possible.

Cluster labeling

Given a cluster membership for each instance in the new data set, we project the data onto the original feature space (sensor 1) and construct a decision tree based on the cluster membership of the projected data. Figure 2(a) depicts an example of clustered data in the extended feature space.
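The clustering, traversal, and labeling steps can be sketched as follows, assuming SciPy's Ward linkage and scikit-learn decision trees. Several simplifications are ours: clusters are labeled by the old classifier's majority vote instead of the projection-based schemes described above, a plain distribution-match score stands in for the plausibility value (the gain term is omitted), and the selected level retrains a tree on both sensors rather than refining the original tree along the new dimension only.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from sklearn.tree import DecisionTreeClassifier

    def extend_classifier(old_clf, buffer, n_old, max_clusters=12):
        """Sketch of the extension step: Ward-cluster the unlabeled
        two-sensor data, walk the cluster tree from coarse to fine, infer
        class labels for the clusters, and keep the best-scoring level."""
        X = np.asarray(buffer)                 # joint (sensor 1 + sensor 2) features
        X_old = X[:, :n_old]                   # projection onto the sensor-1 space
        old_pred = old_clf.predict(X_old)      # old classifier's view of the new data
        prior = {c: np.mean(old_pred == c) for c in np.unique(old_pred)}
        Z = linkage(X, method="ward")          # agglomerative clustering, Ward criterion [16]
        best_score, best_clf = -np.inf, old_clf
        for k in range(2, max_clusters + 1):   # traverse the cluster tree from the top
            clusters = fcluster(Z, t=k, criterion="maxclust")
            # Label each cluster with the old classifier's majority vote inside it.
            label_of = {c: _majority(old_pred[clusters == c]) for c in np.unique(clusters)}
            y = np.array([label_of[c] for c in clusters])
            # Simplified plausibility: 1 minus the total variation distance between
            # the induced label distribution and the reference class distribution.
            score = 1.0 - 0.5 * sum(abs(np.mean(y == c) - p) for c, p in prior.items())
            if score > best_score:
                best_score = score
                best_clf = DecisionTreeClassifier().fit(X, y)  # tree over both sensors
        return best_clf

    def _majority(values):
        vals, counts = np.unique(values, return_counts=True)
        return vals[np.argmax(counts)]

Using the old classifier's predicted class distribution as the reference prior is itself an assumption of this sketch; in the method described above, the original training data supplies the class distribution against which the labeling's plausibility is judged.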

REFERENCES

[1] W. A. Gale et al. A sequential algorithm for training text classifiers. In Proc. SIGIR '94, 1994.
[2] C. D. Nugent et al. Ontology-based activity recognition in intelligent pervasive environments. Int. J. Web Inf. Syst., 2009.
[3] I. A. Essa et al. Discovering characteristic actions from on-body sensor data. In Proc. 10th IEEE Int. Symp. on Wearable Computers (ISWC), 2006.
[4] A. Calatroni et al. A methodology to use unknown new sensors for activity recognition by leveraging sporadic interactions with primitive sensors and behavioral assumptions. 2010.
[5] R. Jin et al. Semisupervised SVM batch mode active learning with applications to image retrieval. ACM Trans. Inf. Syst., 2009.
[6] B. Schiele et al. Exploring semi-supervised and active learning for activity recognition. In Proc. 12th IEEE Int. Symp. on Wearable Computers (ISWC), 2008.
[7] D. Angluin. Queries and concept learning. Machine Learning, 1988.
[8] D. A. Cohn et al. Training connectionist networks with queries and selective sampling. In Advances in Neural Information Processing Systems (NIPS), 1989.
[9] A. Likas et al. Semi-supervised and active learning with the probabilistic RBF classifier. Neurocomputing, 2008.
[10] B. Schiele et al. Unsupervised discovery of structure in activity data using multiple eigenspaces. In Proc. Int. Workshop on Location- and Context-Awareness (LoCA), 2006.
[11] J. H. Ward. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 1963.
[12] D. Guan et al. Activity recognition based on semi-supervised learning. In Proc. 13th IEEE Int. Conf. on Embedded and Real-Time Computing Systems and Applications (RTCSA), 2007.
[13] M. Philipose et al. Building reliable activity models using hierarchical shrinkage and mined ontology. In Proc. Pervasive, 2006.
[14] P. Lukowicz et al. OPPORTUNITY: Towards opportunistic activity and context recognition systems. In Proc. IEEE Int. Symp. on a World of Wireless, Mobile and Multimedia Networks (WoWMoM) Workshops, 2009.
[15] M. Philipose et al. Unsupervised activity recognition using automatically mined common sense. In Proc. AAAI, 2005.
[16] P. Lukowicz et al. Using ultrasonic hand tracking to augment motion analysis based recognition of manipulative gestures. In Proc. 9th IEEE Int. Symp. on Wearable Computers (ISWC), 2005.