Applying Machine Learning for Sensor Data Analysis in Interactive Systems

With the proliferation of miniaturized sensing facilities and the rapid growth and popularity of machine learning (ML) research, new frontiers in automated sensor data analysis have been explored, leading to paradigm shifts in many application domains. Many practitioners now rely increasingly on ML methods as an integral part of their sensor data analysis workflows, without necessarily being ML experts or having an interest in becoming one. The availability of toolkits that practitioners can readily use has led to widespread adoption and, in essence, pragmatic use of ML methods. ML having become mainstream helps practitioners push their core agenda, yet it comes with the danger of misusing methods and thus the risk of misleading, if not flawed, results. Based on years of observations in the ubiquitous and interactive computing domain, which relies extensively on sensors and automated sensor data analysis, and on having taught and worked with numerous students in the field, in this article I advocate a considerate use of ML methods by practitioners, i.e., non-ML experts, and elaborate on the pitfalls of an overly pragmatic use of ML techniques. The article not only identifies and illustrates the most common issues, it also offers practical guidelines for avoiding them, which should help practitioners benefit from employing ML in their core research domains and applications.
