ZOE: A Cloud-less Dialog-enabled Continuous Sensing Wearable Exploiting Heterogeneous Computation

The wearable revolution, as a mass-market phenomenon, has finally arrived. As a result, the question of how wearables should evolve over the next 5 to 10 years is assuming increasing societal and commercial importance. A range of open design and system questions are emerging, for instance: How can wearables shift from being largely health- and fitness-focused to tracking a wider range of life events? What will become the dominant methods through which users interact with wearables and consume the data collected? Are wearables destined to remain dependent on the cloud and/or a smartphone for their operation? Towards building the critical mass of understanding and experience necessary to tackle such questions, we have designed and implemented ZOE, a matchbox-sized (49 g) collar- or lapel-worn sensor that pushes the boundary of wearables in an important set of new directions. First, ZOE aims to perform multiple deep sensor inferences spanning key aspects of everyday life (viz., personal, social, and place information) on continuously sensed data, while exposing this data not only through conventional analytics but also through a speech dialog system able to answer impromptu, casual questions from users (e.g., "Am I more stressed this week than normal?"). Crucially, and unlike other rich-sensing or dialog-supporting wearables, ZOE achieves this without cloud or smartphone support; this has important benefits for privacy, since all user information can remain on the device. Second, ZOE combines the latest innovations in system-on-a-chip technology with a custom daughter-board to realize a three-tier low-power processor hierarchy. We pair this hardware design with software techniques that manage system latency while allowing ZOE to remain energy efficient (with a typical battery life of 30 hours), despite its high sensing workload, small form factor, and the need to remain responsive to user dialog requests.
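The three-tier processor hierarchy described above lends itself to a wake-up cascade: each tier runs only when the cheaper tier below it flags something worth inspecting, so the power-hungry application processor sleeps through most of the sensed data. The sketch below illustrates this idea in Python; the tier names, thresholds, and stubbed classifiers are illustrative assumptions for exposition, not ZOE's actual implementation.

```python
"""Minimal sketch of a three-tier low-power wake-up cascade.

Assumptions (not from the paper): the tier roles, the gating thresholds,
and the stubbed tier-1 classifier below are hypothetical placeholders.
"""
import random

FRAME_ENERGY_THRESHOLD = 0.2   # tier-0 admission gate (hypothetical value)
SPEECH_CONFIDENCE_GATE = 0.7   # tier-1 escalation gate (hypothetical value)


def tier0_energy_filter(frame):
    """Always-on low-power MCU: cheap RMS energy check on each audio frame."""
    rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
    return rms > FRAME_ENERGY_THRESHOLD


def tier1_speech_detector(frame):
    """Mid-tier DSP: lightweight speech/noise classifier.

    Stubbed with a random score here; a real system would run a small
    acoustic model on this tier.
    """
    return random.random()


def tier2_full_inference(frame):
    """Application processor: heavyweight inference and dialog handling."""
    return "speech event processed"


def process_frame(frame):
    """Escalate through the tiers, waking each only when the one below fires."""
    if not tier0_energy_filter(frame):
        return None                      # stay in the lowest-power state
    if tier1_speech_detector(frame) < SPEECH_CONFIDENCE_GATE:
        return None                      # likely noise: tier 2 stays asleep
    return tier2_full_inference(frame)   # rare, expensive path


if __name__ == "__main__":
    frames = [[random.uniform(-1, 1) for _ in range(160)] for _ in range(5)]
    for f in frames:
        print(process_frame(f))
```

The key design constraint such a cascade must satisfy is that each gate is cheap relative to the tier it protects; a tier-0 check that cost as much as tier-2 inference would save no energy at all.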
