Towards a data science toolbox for industrial analytics applications

Abstract: Manufacturing companies today have access to a vast number of data sources providing gigantic amounts of process and status data. Consequently, the need for analytical information systems to guide corporate decision-making is ever-growing. However, decision-makers in production environments still focus largely on the static, explanatory modeling provided by business intelligence suites instead of embracing the opportunities offered by predictive analytics. To bridge the gap between machine learning research and concrete practical needs, we develop a data science toolbox for manufacturing prediction tasks. We provide guidelines and best practices for modeling, feature engineering, and interpretation, leveraging tools from business information systems as well as machine learning. We illustrate the use of this toolbox in a real-world manufacturing defect prediction case study and thereby seek to enhance the understanding of predictive modeling. In particular, we want to emphasize that simply dumping data into “smart” algorithms is not a silver bullet. Instead, constant refinement and consolidation are required to improve the predictive power of a business analytics solution.
