Bidirectional self-adaptive resampling in Internet of Things big data learning

This paper addresses the low accuracy of learning algorithms caused by severe class imbalance in Internet of Things (IoT) big data, and proposes a bidirectional self-adaptive resampling algorithm for imbalanced big data. Based on the data-set size and imbalance ratio specified by the user, the algorithm combines oversampling of the minority class with distribution-sensitive undersampling of the majority class. The proposed distribution-sensitive resampling algorithm divides the majority and minority samples into categories according to their distribution. It then processes samples with different distribution characteristics in different ways, so that the resampled set preserves the characteristics of the original data set as far as possible. The algorithm emphasizes boundary samples. Samples lying on the boundary between the majority and minority classes are treated as more important to the learning algorithm than other samples. Accordingly, boundary minority samples are copied, and boundary majority samples are retained. Finally, a real-world application shows that, compared with existing resampling algorithms for imbalanced data, this algorithm improves the accuracy of the learning algorithm, especially the accuracy and recall of the minority class.
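The abstract does not state the exact partitioning rule or how the quotas are computed, so the following Python sketch fills those gaps with hypothetical choices. It is an illustration of the bidirectional idea, not the authors' algorithm. The assumptions are as follows. Each sample is tagged from its `k = 5` nearest neighbours as *safe* (no neighbours of the other class), *noise* (all neighbours of the other class), or *boundary* (a mixed neighbourhood). The user's target size and majority:minority ratio then fix the two class quotas. The function names `categorize` and `bidirectional_resample`, the tagging rule, and the decision to drop majority noise are all assumptions made for this example.

```python
import numpy as np


def categorize(X, y, k=5):
    """Tag each sample as 'safe', 'boundary' or 'noise' from its k nearest neighbours.

    Hypothetical rule (an assumption, not from the paper): no neighbours of the
    other class -> safe; all k of the other class -> noise; mixed -> boundary.
    """
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise Euclidean distances
    np.fill_diagonal(d, np.inf)                 # a sample is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]           # indices of the k nearest neighbours
    other = (y[nn] != y[:, None]).sum(axis=1)   # neighbours belonging to the other class
    cats = np.full(len(y), "safe", dtype="<U8")
    cats[other > 0] = "boundary"
    cats[other == k] = "noise"
    return cats


def bidirectional_resample(X, y, target_size, target_ratio, minority=1, k=5, seed=0):
    """Return (X', y') with about `target_size` rows and majority:minority ~= `target_ratio`.

    Minority side: every original minority sample is kept (never discarded), and
    boundary minority samples are copied until the minority quota is met.
    Majority side: boundary majority samples are retained first, the quota is
    topped up with randomly chosen safe samples, and majority noise is dropped.
    Assumes at least one minority sample is present.
    """
    rng = np.random.default_rng(seed)
    cats = categorize(X, y, k)
    n_min = int(round(target_size / (1 + target_ratio)))  # minority quota
    n_maj = target_size - n_min                            # majority quota

    # Oversample the minority class by duplicating boundary samples.
    mi = np.flatnonzero(y == minority)
    mi_border = mi[cats[mi] == "boundary"]
    pool = mi_border if len(mi_border) else mi   # fall back to all minority samples
    extra = max(n_min - len(mi), 0)
    mi_keep = np.concatenate([mi, rng.choice(pool, extra, replace=True)])

    # Undersample the majority class while protecting its boundary samples.
    ma = np.flatnonzero(y != minority)
    ma_border = ma[cats[ma] == "boundary"]
    ma_safe = ma[cats[ma] == "safe"]
    if len(ma_border) >= n_maj:                  # boundary alone fills the quota
        ma_keep = rng.choice(ma_border, n_maj, replace=False)
    else:                                        # keep all boundary, top up with safe samples
        fill = min(n_maj - len(ma_border), len(ma_safe))
        ma_keep = np.concatenate([ma_border, rng.choice(ma_safe, fill, replace=False)])

    idx = np.concatenate([mi_keep, ma_keep])
    rng.shuffle(idx)
    return X[idx], y[idx]
```

For example, on a 300:20 data set with `target_size=300` and `target_ratio=2.0`, the sketch returns 100 minority rows (the 20 originals plus 80 copies of boundary ones) and 200 majority rows. Samples are only copied or kept, never synthesized, so every output row already exists in the input. That is one simple way to keep the resampled set close to the original distribution. A SMOTE-style interpolation step could replace plain copying, but that would be a different design from the copying the abstract describes.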