A MapReduce-Based Ensemble Learning Method with Multiple Classifier Types and Diversity for Condition-Based Maintenance with Concept Drifts

Condition-based maintenance in Industry 4.0 continuously collects large volumes of production data streams from Internet of Things devices attached to machines, in order to forecast when machines should be maintained or components replaced. However, because machine conditions change over time owing to aging, malfunction, or replacement, the concept underlying the forecasting pattern learned from the data stream can drift unpredictably, which makes it hard to find a robust forecasting method with high precision. This work therefore proposes an ensemble learning method with multiple classifier types and diversity for condition-based maintenance in manufacturing industries, to address the bias that arises when only one base classifier type is used. Besides manipulating data diversity, the method incorporates multiple classifier types, dynamic weight adjustment, and data-based adaptation to concept drifts for offline learning models, in order to improve the precision of the forecasting model and to detect and adapt to concept drifts accurately. These features make the method computationally demanding, so substantial computing resources are needed for it to respond efficiently in practical condition-based maintenance applications. An implementation based on the MapReduce framework is therefore proposed to increase computational efficiency. Simulation results show that the method detects and adapts to all concept drifts with a high precision rate.
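To make the combination of multiple classifier types, dynamic weight adjustment, and a map/reduce split concrete, the following is a minimal Python sketch. It is not the paper's algorithm: the weight update is a generic weighted-majority rule (multiply the weight of each wrong member by an assumed factor `beta`), the three base learners (`temp_rule`, `vib_rule`, `always_ok`) and the feature names `temp` and `vib` are hypothetical, and the map and reduce steps are plain functions rather than jobs on a real MapReduce cluster.

```python
from collections import defaultdict


def map_votes(members, x):
    """Map step: each (classifier, weight) pair emits a (label, weight) vote."""
    return [(clf(x), w) for clf, w in members]


def reduce_votes(votes):
    """Reduce step: sum the weights per label and return the heaviest label."""
    totals = defaultdict(float)
    for label, w in votes:
        totals[label] += w
    return max(totals, key=totals.get)


def update_weights(members, x, y_true, beta=0.5):
    """Scale the weight of every member that misclassified x by beta,
    then renormalize so the weights sum to 1 (assumed update rule)."""
    updated = [(clf, w * (beta if clf(x) != y_true else 1.0))
               for clf, w in members]
    total = sum(w for _, w in updated)
    return [(clf, w / total) for clf, w in updated]


# Hypothetical base learners standing in for different classifier types.
def temp_rule(x):
    return 1 if x["temp"] > 80 else 0


def vib_rule(x):
    return 1 if x["vib"] > 0.5 else 0


def always_ok(x):
    return 0


members = [(temp_rule, 1 / 3), (vib_rule, 1 / 3), (always_ok, 1 / 3)]

# Toy labeled stream: (sensor reading, 1 = maintenance needed, 0 = healthy).
stream = [
    ({"temp": 90, "vib": 0.7}, 1),
    ({"temp": 85, "vib": 0.2}, 1),
    ({"temp": 60, "vib": 0.1}, 0),
]

predictions = []
for x, y in stream:
    predictions.append(reduce_votes(map_votes(members, x)))  # predict first
    members = update_weights(members, x, y)                   # then adapt

weights = {clf.__name__: w for clf, w in members}
```

In this toy run the member that keeps agreeing with the labels (`temp_rule`) ends up with the largest weight, which is the intuition behind re-weighting an ensemble when the data-generating concept shifts. A real deployment would train actual classifiers of different types and add an explicit drift detector.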
