Learning from Imbalanced Multiclass Sequential Data Streams Using Dynamically Weighted Conditional Random Fields

The present study introduces a method for improving the classification performance of imbalanced multiclass data streams from wireless body-worn sensors. Data imbalance is an inherent problem in activity recognition, caused by the irregular time distribution of activities, which are sequential and dependent on previous movements. We use conditional random fields (CRFs), a graphical model for structured classification, to take advantage of dependencies between activities in a sequence. However, CRFs do not account for the negative effects of class imbalance during training. We propose a class-wise dynamically weighted CRF (dWCRF) in which the weights are determined automatically during training by maximizing the expected overall F-score. Our results, based on three case studies from a healthcare application using a batteryless body-worn sensor, demonstrate that our method generally improves overall and minority-class F-score compared with other CRF-based classifiers, and achieves similar or better overall and class-wise performance compared with SVM-based classifiers under conditions of limited training data. We also confirm the performance of our approach on an additional battery-powered body-worn sensor dataset, achieving similar results in cases of high class imbalance.
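
The two quantities the abstract combines, a class-weighted training objective and an overall F-score used to choose the weights, can be illustrated with a minimal sketch. It is not the dWCRF training procedure itself. It assumes that "overall F-score" means the unweighted mean of per-class F1, and that the class weight scales the log-probability of each labelled position. The function names `macro_f_score` and `weighted_log_likelihood` are hypothetical.

```python
def macro_f_score(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 (assumed reading of 'overall F-score')."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def weighted_log_likelihood(true_label_log_probs, y_true, class_weights):
    """Sum of log-probabilities of the true labels, each scaled by its class weight.

    true_label_log_probs[i] is the model's log-probability of y_true[i];
    how a CRF produces these values is outside this sketch.
    """
    return sum(class_weights[y] * lp
               for y, lp in zip(y_true, true_label_log_probs))


# Toy imbalanced sequence: three "sit" positions and one "walk" position.
y_true = ["sit", "sit", "sit", "walk"]
y_pred = ["sit", "sit", "sit", "sit"]   # the minority class is never predicted
labels = ["sit", "walk"]

f_overall = macro_f_score(y_true, y_pred, labels)

# Up-weighting the minority class makes its error dominate the objective.
log_probs = [-0.1, -0.1, -0.1, -2.0]
unweighted = weighted_log_likelihood(log_probs, y_true, {"sit": 1.0, "walk": 1.0})
weighted = weighted_log_likelihood(log_probs, y_true, {"sit": 1.0, "walk": 3.0})
```

In this toy case the classifier labels three of four positions correctly, yet the per-class averaging penalizes the missed minority class heavily, which is the effect that motivates choosing class weights by F-score rather than by accuracy.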
