Improving Innovation from Science Using Kernel Tree Methods as a Precursor to Designed Experimentation

A key challenge in applied science when planning a designed experiment is determining the aliasing structure of the interaction effects and selecting appropriate levels for the factors. In this study, kernel tree methods are used as precursors to identify significant interactions and factor levels useful for developing a designed experiment. This approach aligns with the integration of data science with the applied sciences to reduce the time from innovation in research and development to the advancement of new products, an important consideration given rapid advancements in industries such as pharmaceuticals, medicine, and aerospace. Significant interaction effects for four common independent variables were identified from industrial databases using boosted trees and random forests of k = 1000 and k = 10,000 bootstraps. The four common variables were related to speed, pressing time, pressing temperature, and fiber refining. These common variables maximized the tensile strength of medium density fiberboard (MDF) and the ultimate static load of oriented strand board (OSB), both widely used industrial products. Given the results of the kernel tree methods, four possible designs with interaction effects were developed: full factorial, fractional factorial Resolution IV, Box–Behnken, and Central Composite Designs (CCD).
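To make the aliasing question concrete, the following is a minimal sketch of a two-level, four-factor half fraction (a 2^(4-1) Resolution IV design) and the alias groups it produces. The labels A to D are hypothetical stand-ins for the four variables, and the generator D = ABC is the textbook choice for this design; neither is confirmed here as the authors' actual coding, and the helper names are illustrative only.

```python
from itertools import combinations, product

# Hypothetical factor labels (assumption): stand-ins for speed, pressing time,
# pressing temperature, and fiber refining.
FACTORS = ["A", "B", "C", "D"]


def half_fraction():
    """Build a 2^(4-1) design: full factorial in A, B, C with D = A*B*C."""
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        runs.append({"A": a, "B": b, "C": c, "D": a * b * c})
    return runs


def column(runs, effect):
    """Contrast column for an effect such as 'AB' (product of factor levels)."""
    col = []
    for run in runs:
        value = 1
        for factor in effect:
            value *= run[factor]
        col.append(value)
    return tuple(col)


def alias_groups(runs, max_order=2):
    """Group main effects and two-factor interactions with identical columns."""
    groups = {}
    for order in range(1, max_order + 1):
        for combo in combinations(FACTORS, order):
            effect = "".join(combo)
            groups.setdefault(column(runs, effect), []).append(effect)
    return sorted(groups.values())


runs = half_fraction()
groups = alias_groups(runs)
for group in groups:
    print(" = ".join(group))
```

In this sketch each main effect sits in its own group, while the two-factor interactions pair up (AB with CD, AC with BD, AD with BC). That is the defining property of Resolution IV: main effects are clear of two-factor interactions, but the interactions are aliased with one another, which is why prior knowledge of which interactions matter helps in choosing the design.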
