Using Support Vector Machines to Formalize the Valid Input Domain of Predictive Models in Systems Design Problems

Predictive modeling can be a valuable tool for systems designers, allowing them to capture and reuse knowledge from a set of observed data related to their system. An important challenge in predictive modeling is describing the domain over which model predictions are valid. Such a description is necessary to avoid extrapolating beyond the original data, particularly when designers use predictive models in concert with optimizers or other computational routines that search a model’s input space automatically. The general problem of domain description is complicated by the characteristics of observational data sets, which can contain few samples, exhibit nonlinear associations among the variables, occupy nonconvex regions, and occur in disjoint clusters. Support vector machine (SVM) techniques, developed originally in the machine learning community, offer a solution to this problem. This paper describes a kernel-based SVM approach that yields a formal mathematical description of the valid input domain of a predictive model. The approach also provides for cluster analysis, which can improve model accuracy by decomposing a data set into multiple subsets that designers can model independently. The paper includes a mathematical presentation of kernel-based SVM methods, an explanation of the procedure for applying the approach to predictive modeling problems, and illustrative examples of its use in systems design.
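
The abstract does not reproduce the formulation itself. As a hedged sketch only, one standard kernel-based domain description (the support vector domain description of Tax and Duin) can be written as follows. The symbols $x_i$, $a$, $R$, $\xi_i$, $C$, $\beta_i$, $\phi$, $K$, and $q$ are notation assumed here for illustration and may differ from the notation used in the paper.

```latex
% Primal problem: smallest sphere (center a, radius R) in feature space
% that encloses the mapped data, with slack variables \xi_i for outliers.
\begin{align}
  \min_{R,\,a,\,\xi} \quad & R^2 + C \sum_{i} \xi_i \\
  \text{s.t.} \quad & \lVert \phi(x_i) - a \rVert^2 \le R^2 + \xi_i,
  \qquad \xi_i \ge 0 \quad \forall i
\end{align}

% Dual problem, expressed only through the kernel K(x, y) = \phi(x) \cdot \phi(y).
\begin{align}
  \max_{\beta} \quad & \sum_{i} \beta_i K(x_i, x_i)
      - \sum_{i,j} \beta_i \beta_j K(x_i, x_j) \\
  \text{s.t.} \quad & \sum_{i} \beta_i = 1,
  \qquad 0 \le \beta_i \le C \quad \forall i
\end{align}

% Squared feature-space distance from a candidate input x to the sphere center.
\begin{equation}
  R^2(x) = K(x, x) - 2 \sum_{j} \beta_j K(x_j, x)
      + \sum_{i,j} \beta_i \beta_j K(x_i, x_j)
\end{equation}

% Valid input domain: all inputs whose image lies inside the sphere.
% R^2 is obtained by evaluating R^2(x_s) at any support vector x_s
% with 0 < \beta_s < C.
\begin{equation}
  \mathcal{D} = \{\, x : R^2(x) \le R^2 \,\}
\end{equation}

% A common kernel choice (assumed here): Gaussian kernel with width parameter q.
\begin{equation}
  K(x, y) = \exp\!\left( -q \, \lVert x - y \rVert^2 \right)
\end{equation}
```

Under this sketch, an optimizer could treat $R^2(x) \le R^2$ as an explicit constraint on candidate inputs, so that it does not query the predictive model outside the region covered by the data. With a Gaussian kernel the boundary of $\mathcal{D}$ in input space need not be convex or connected, which is what allows a description of this kind to accommodate nonconvex data and disjoint clusters.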
