A big data MapReduce framework for fault diagnosis in cloud-based manufacturing

This research develops a MapReduce framework for automatic pattern recognition in fault diagnosis that addresses the data imbalance problem in cloud-based manufacturing (CBM). Fault diagnosis in a CBM system contributes significantly to reducing product testing cost and enhancing manufacturing quality. One of the major challenges facing big data analytics in CBM is the handling of data-sets that are highly imbalanced in nature, because machine learning techniques yield poor classification results when applied to such data-sets. The framework proposed in this research uses a hybrid approach to deal with big data-sets for smarter decisions. Furthermore, we compare the performance of a radial basis function-based Support Vector Machine classifier with standard techniques. Our findings suggest that the most important task in CBM is to predict the effect of data errors on quality due to highly imbalanced, unstructured data-sets. The proposed framework is an original contribution to the body of literature: it has been used for fault detection by managing the data imbalance problem appropriately and relating it to the firm’s profit function. The experimental results are validated using a case study of steel plate manufacturing fault diagnosis, with crucial performance metrics such as accuracy, specificity and sensitivity. A comparative study shows that the methods used in the proposed framework outperform the traditional ones.
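To illustrate why accuracy alone can mislead on imbalanced fault data, and why specificity and sensitivity are reported alongside it, the following is a minimal sketch in Python. The function names, the 95/5 class split and the trivial "always non-faulty" classifier are assumptions made for illustration only; they are not taken from the steel plate case study or from the proposed framework.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def diagnosis_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (recall on faults) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical imbalanced sample: 95 non-faulty items (0) and 5 faulty items (1).
y_true = [0] * 95 + [1] * 5

# A trivial classifier that never predicts a fault (assumed for illustration).
y_pred_trivial = [0] * 100

trivial = diagnosis_metrics(y_true, y_pred_trivial)
print(trivial)
```

Under these assumed data, the trivial classifier scores high accuracy and perfect specificity while detecting no faults at all (zero sensitivity), which is the failure mode that motivates handling the imbalance before classification.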
