Labeling sensing data for mobility modeling

In urban environments, sensory data can be used at the individual level to build personalized models that predict efficient daily routes and schedules, and at the city level to manage and plan transport more efficiently and to schedule maintenance and events. Raw sensory data is typically collected as time-stamped sequences of records, accompanied by activity annotations supplied by a human. Predictive models in machine learning, however, view data as labeled instances and depend on reliable labels for learning, whereas human annotations in real-world sensor applications are inherently sparse and noisy. This paper presents a methodology for preprocessing sensory data for predictive modeling, with particular attention to creating reliable labeled instances. We analyze real-world scenarios and the specific problems they entail, and experiment with different approaches, showing that a relatively simple framework can ensure quality labeled data for supervised learning. We conclude the study with recommendations for practitioners and a discussion of future challenges.
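To make the gap between time-stamped records and labeled instances concrete, the following is a minimal Python sketch of one possible conversion. It is not the framework evaluated in the paper. The fixed 60-second window, the mean-value feature, the majority-vote labeling, the `min_coverage` threshold, and the function names `annotation_at` and `to_labeled_instances` are all assumptions chosen for illustration.

```python
from collections import Counter


def annotation_at(t, annotations):
    """Return the human label whose interval covers time t, or None."""
    for start, end, label in annotations:
        if start <= t < end:
            return label
    return None


def to_labeled_instances(records, annotations, window=60.0, min_coverage=0.5):
    """Turn time-stamped records into labeled instances (illustrative only).

    records:     list of (timestamp_in_seconds, value)
    annotations: list of (start, end, label) intervals given by a human
    Returns a list of (window_start, mean_value, label).
    """
    # Group records into fixed-length, non-overlapping windows.
    windows = {}
    for t, value in records:
        windows.setdefault(int(t // window), []).append((t, value))

    instances = []
    for idx in sorted(windows):
        rows = windows[idx]
        labels = [annotation_at(t, annotations) for t, _ in rows]
        known = [lab for lab in labels if lab is not None]
        # Assumption: skip windows where too few records are annotated.
        if len(known) / len(rows) < min_coverage:
            continue
        # Assumption: the most frequent annotation labels the window.
        label = Counter(known).most_common(1)[0][0]
        mean_value = sum(v for _, v in rows) / len(rows)
        instances.append((idx * window, mean_value, label))
    return instances


# Hypothetical data: six sensor readings and two sparse annotations.
records = [(0, 1.0), (20, 3.0), (40, 2.0), (70, 10.0), (100, 12.0), (130, 5.0)]
annotations = [(0, 50, "walk"), (60, 90, "bus")]
instances = to_labeled_instances(records, annotations)
```

In this sketch the last window contains no annotated record and is dropped, which shows how sparse annotations directly reduce the amount of labeled data available for supervised learning.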
