Using Support Vector Machines to Formalize the Valid Input Domain of Models in Data-Driven Predictive Modeling for Systems Design

Predictive modeling can be a valuable tool for systems designers, allowing them to capture and reuse knowledge from a set of observed data related to their system. An important challenge in predictive modeling is describing the domain over which model predictions are valid. Such a description is necessary to avoid extrapolating beyond the original data, particularly when designers use predictive models in concert with optimizers or other computational routines that search a model’s input space automatically. The general problem of domain description is complicated by the characteristics of observational data sets, which can contain few samples, exhibit nonlinear associations among the variables, occupy non-convex regions of the input space, and occur in largely disjoint clusters. Support Vector Machine (SVM) techniques, developed originally in the machine learning community, offer a solution to this problem. This paper describes a kernel-based SVM approach that yields a formal mathematical description of the valid input domain of a predictive model. The approach also supports cluster analysis, which can improve model accuracy by decomposing a data set into multiple subsets that designers can model independently. The paper includes a mathematical presentation of kernel-based SVM methods, an explanation of the procedure for applying the approach to predictive modeling problems, and illustrative examples of its use in systems design.

Copyright © 2009 by ASME
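To make the idea of a kernel-based domain description concrete, the following is a minimal sketch and not the paper's own implementation. It assumes scikit-learn's `OneClassSVM` with an RBF kernel as a stand-in for the SVM formulation the paper presents. The synthetic two-cluster data, the `gamma` and `nu` values, and the helper `in_valid_domain` are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Hypothetical observed inputs: two disjoint clusters in a 2-D input space.
rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2))
cluster_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(100, 2))
observed_inputs = np.vstack([cluster_a, cluster_b])

# Fit a kernel-based boundary around the observed inputs.
# gamma and nu are assumed values; in practice they would need tuning.
domain = OneClassSVM(kernel="rbf", gamma=1.0, nu=0.05).fit(observed_inputs)


def in_valid_domain(points):
    """Return a boolean array: True where a point lies inside the learned boundary."""
    points = np.atleast_2d(np.asarray(points, dtype=float))
    # decision_function is non-negative inside the boundary, negative outside.
    return domain.decision_function(points) >= 0.0


# Two cluster centers, the empty gap between the clusters, and a far-away point.
queries = [[0.0, 0.0], [5.0, 5.0], [2.5, 2.5], [20.0, 20.0]]
flags = in_valid_domain(queries)
print(flags)
```

An optimizer searching the input space could call a check such as `in_valid_domain` as a constraint, rejecting candidate inputs that fall between or beyond the observed clusters instead of trusting extrapolated predictions there.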