Analyze, Sense, Preprocess, Predict, Implement, and Deploy (ASPPID): An incremental methodology based on data analytics for cost-efficiently monitoring the industry 4.0

Abstract Industry 4.0 is revolutionizing decision-making processes within the manufacturing industry. Among the technologies enabling this revolution, recent literature has capitalized on the potential of data analytics to improve the production cycle at different stages, from resource provisioning to planning, delivery and storage. However, this promising role of data analytics has so far been explored without a proper quantitative inspection of the cost-improvement trade-off, nor has the process of acquiring sensors and extracting valuable information from their captured data been formalized as a series of methodological steps. This paper introduces the Analyze, Sense, Preprocess, Predict, Implement and Deploy (ASPPID) methodology, an iterative decision workflow that spans from the acquisition of sensing equipment to the quantitative assessment of how much the captured data contributes to enhancing the production step under focus. By placing the data scientist at the core of the workflow, the methodology helps improvement teams make informed decisions about which parts of the process need to be sensed and how to exploit this information towards a verifiable improvement of the production cycle. The implementation of the methodology is exemplified in a real use case within the automotive industry, where the detection of defects in an annealing process can be modeled as a classification problem over a highly imbalanced dataset. Results obtained after applying the proposed ASPPID methodology show that the scrap ratio is reduced by sensing the correct part of the process at minimal investment cost, thus highlighting the crucial role of the data scientist in the management team of manufacturing plants.
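To illustrate why defect detection over a highly imbalanced dataset needs care, the following minimal Python sketch compares accuracy, recall and precision for two sets of predictions. It is not taken from the paper: the label counts, the "always good" baseline and the detector outputs are hypothetical toy values chosen only to show that a classifier that never flags a defect can still score a high accuracy.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the defect class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def summarize(y_true, y_pred):
    """Compute accuracy, recall and precision for the defect class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision}


# Hypothetical toy labels: 95 good parts (0) followed by 5 defective parts (1).
y_true = [0] * 95 + [1] * 5

# Baseline that labels every part as good.
always_good = [0] * 100

# Hypothetical detector: 6 false alarms among the good parts,
# and 4 of the 5 defective parts correctly flagged.
detector = [0] * 89 + [1] * 6 + [1] * 4 + [0] * 1

baseline_scores = summarize(y_true, always_good)
detector_scores = summarize(y_true, detector)

print("baseline:", baseline_scores)
print("detector:", detector_scores)
```

In this toy setting the baseline reaches a higher accuracy than the detector while catching no defects at all, which is why metrics focused on the minority class, such as recall and precision, are generally preferred when the goal is to reduce scrap.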
