Paper information - When less is more powerful: Shapley value attributed ablation with augmented learning for practical time series sensor data classification

Time series sensor data classification tasks often suffer from scarce training data because expert annotation is expensive. For example, electrocardiogram (ECG) classification for cardiovascular disease (CVD) detection requires costly labeling by cardiologists. Current state-of-the-art algorithms such as deep learning models perform very well, but generally only when a large set of training examples is available. In this paper, we propose Shapley Attributed Ablation with Augmented Learning (ShapAAL), which demonstrates that a deep learning algorithm trained on a suitably selected subset of the seen examples, that is, with the unimportant examples ablated from the limited training dataset, can achieve consistently better classification performance under augmented training. In ShapAAL, additive perturbed training with a Residual Network (ResNet) architecture augments the input space through perturbation-induced inputs to compensate for the scarcity of training examples, while Shapley attribution selects the subset of the augmented training space that offers better learnability, with the goal of better general predictive performance. This selection relies on the “efficiency” and “null player” axioms of the transferable utility games on which the Shapley value is formulated. The subset of training examples that contribute positively in a supervised learning setup is derived from the notion of coalition games, using the Shapley value of each input's contribution to the model prediction. ShapAAL is a novel push-pull deep architecture: subset selection through Shapley value attribution pushes the model toward a lower dimension, while augmented training improves the model's learning capability on unseen data.
We perform an ablation study to provide empirical evidence for our claim, and we show that the proposed ShapAAL method consistently outperforms the current baselines and state-of-the-art algorithms on time series sensor data classification tasks from the publicly available UCR time series archive, which includes practically important problems such as the detection of CVDs from ECG data.
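The two ingredients described in the abstract, additive perturbation of the inputs and Shapley-based selection of training examples, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Gaussian noise as the additive perturbation, a nearest-centroid classifier as a cheap stand-in for the ResNet, validation accuracy as the coalition utility, and a Monte Carlo permutation estimator for the Shapley values. The function names `augment_additive`, `utility`, and `shapley_values` are hypothetical.

```python
import numpy as np

def augment_additive(X, y, n_copies=2, sigma=0.05, seed=0):
    """Append noisy copies x + eps, eps ~ N(0, sigma^2), keeping the labels.
    Gaussian noise is an assumption; the abstract only says 'additive perturbed'."""
    rng = np.random.default_rng(seed)
    noisy = [X + rng.normal(0.0, sigma, size=X.shape) for _ in range(n_copies)]
    X_aug = np.concatenate([X] + noisy, axis=0)
    y_aug = np.concatenate([y] * (n_copies + 1), axis=0)
    return X_aug, y_aug

def utility(X_tr, y_tr, X_val, y_val):
    """Coalition utility: validation accuracy of a nearest-centroid classifier
    (a stand-in for the ResNet). The empty coalition has utility 0."""
    if len(y_tr) == 0:
        return 0.0
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dist = ((X_val[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dist.argmin(axis=1)]
    return float((pred == y_val).mean())

def shapley_values(X_tr, y_tr, X_val, y_val, n_perms=20, seed=0):
    """Monte Carlo estimate: average marginal gain of each training example
    over random orderings of the training set."""
    rng = np.random.default_rng(seed)
    n = len(y_tr)
    phi = np.zeros(n)
    for _ in range(n_perms):
        perm = rng.permutation(n)
        prev = 0.0  # utility of the empty coalition
        for k, i in enumerate(perm):
            idx = perm[: k + 1]
            cur = utility(X_tr[idx], y_tr[idx], X_val, y_val)
            phi[i] += cur - prev
            prev = cur
    return phi / n_perms

# Toy data: two classes of short sine waves with different frequencies.
rng = np.random.default_rng(42)
t = np.linspace(0.0, 1.0, 32)
y_train = np.array([0, 1] * 6)
X_train = np.stack([np.sin(2 * np.pi * (1 + c) * t) for c in y_train])
X_train = X_train + rng.normal(0.0, 0.1, size=X_train.shape)
y_val = np.array([0, 1] * 10)
X_val = np.stack([np.sin(2 * np.pi * (1 + c) * t) for c in y_val])

X_aug, y_aug = augment_additive(X_train, y_train)
phi = shapley_values(X_aug, y_aug, X_val, y_val)
keep = phi > 0  # ablate examples without a positive estimated contribution
print(f"kept {keep.sum()} of {len(phi)} augmented examples")
```

Because the marginal gains telescope within each ordering, the estimates sum to the utility of the full training set minus that of the empty set, which mirrors the efficiency axiom mentioned in the abstract. In a realistic setting the utility would require retraining the network for each coalition, so a cheaper approximation would likely be needed.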

A. Jara | Leandro Marín | A. Ukil
