A sequential sampling strategy for adaptive classification of computationally expensive data

Many real-world engineering problems can be represented and solved as data-driven classification problems, where the goal is to build a classifier that maps a given set of input parameters onto a corresponding class label. In some cases, collecting data samples is computationally expensive, so it is crucial to solve the problem using as few samples as possible. To this end, a novel sequential sampling algorithm is proposed that begins with a very small training set and supplements it in each iteration with a small batch of additional (expensive) data points. The outcome is a representative set of data samples that concentrates on those regions of the input space where the class labels change more rapidly, while ensuring that no class regions are missed.
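The general idea of such a loop can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, whose details are not given in the abstract. It assumes a two-dimensional unit-square input space, a hypothetical `expensive_label` function standing in for the costly simulation, and a simple hand-picked score that multiplies an exploration term (distance to the nearest labelled sample) by an exploitation bonus (disagreement among the nearest labelled neighbours). For simplicity, the points of each batch are labelled one at a time.

```python
import numpy as np

def expensive_label(x):
    # Hypothetical stand-in for a costly simulation:
    # class 1 inside a disc centred at (0.5, 0.5), class 0 elsewhere.
    return int(np.sum((x - 0.5) ** 2) < 0.09)

def score_candidates(candidates, X, y, k=3):
    # Distances between every candidate and every labelled sample.
    d = np.linalg.norm(candidates[:, None, :] - X[None, :, :], axis=2)
    # Exploration: candidates far from all labelled samples score higher,
    # which helps to avoid missing entire class regions.
    explore = d.min(axis=1)
    # Exploitation: bonus of 1 when the k nearest labelled samples
    # disagree on the class, i.e. the candidate lies near a boundary.
    nearest = np.argsort(d, axis=1)[:, :k]
    neigh = y[nearest]
    exploit = (neigh.min(axis=1) != neigh.max(axis=1)).astype(float)
    return explore * (1.0 + exploit)

def sequential_sampling(n_init=5, batch=3, iterations=10,
                        n_candidates=500, seed=0):
    rng = np.random.default_rng(seed)
    # Start from a very small random training set.
    X = rng.random((n_init, 2))
    y = np.array([expensive_label(x) for x in X])
    for _ in range(iterations):
        # Cheap random candidate pool; only selected points are labelled.
        candidates = rng.random((n_candidates, 2))
        for _ in range(batch):
            best = int(np.argmax(score_candidates(candidates, X, y)))
            x_new = candidates[best]
            X = np.vstack([X, x_new])
            y = np.append(y, expensive_label(x_new))
            candidates = np.delete(candidates, best, axis=0)
    return X, y

if __name__ == "__main__":
    X, y = sequential_sampling()
    print(X.shape, np.bincount(y))
```

With the default arguments the loop requests 5 initial labels plus 10 iterations of 3 labels each, so the expensive function is called 35 times in total. The multiplicative score is only one of many possible ways to balance the two terms.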
