Domain-Based Benchmark Experiments: Exploratory and Inferential Analysis

Benchmark experiments are the method of choice for comparing learning algorithms empirically. For a collection of data sets, the empirical performance distributions of a set of learning algorithms are estimated, compared, and ordered; usually this is done for each data set separately. The present manuscript extends this single-data-set approach to a joint analysis of the complete collection, the so-called problem domain. This makes it possible to decide which algorithms to deploy in a specific application, or to compare newly developed algorithms with well-known ones on established problem domains. Specialized visualization methods allow large amounts of benchmark data to be explored easily. Furthermore, we take the design of the benchmark experiment into account and use mixed-effects models to provide a formal statistical analysis. Two domain-based benchmark experiments demonstrate our methods: the UCI domain, as a well-known domain used when developing new algorithms; and the Grasshopper domain, where we want to find the best learning algorithm for the prediction component of an enterprise application software system.

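The inferential step described above can be sketched in R. The snippet below is a minimal illustration, not the authors' actual code: it assumes a long-format data frame of benchmark results (here a hypothetical placeholder named bench with columns perf, algorithm, dataset, rep and three hypothetical learners) and fits a mixed-effects model with the algorithm as a fixed effect and the data sets, plus replications nested within data sets, as random effects, followed by simultaneous all-pairs comparisons.

```r
library("lme4")      # lmer() for linear mixed-effects models
library("multcomp")  # glht() for simultaneous pairwise comparisons

## Hypothetical domain-based benchmark results: 5 data sets, 3 learners,
## 20 replications per data set. In practice these values come from the
## resampling scheme of the benchmark experiment itself.
set.seed(1)
bench <- expand.grid(
  rep       = factor(1:20),                  # replication within a data set
  algorithm = factor(c("lda", "rf", "svm")), # candidate learning algorithms
  dataset   = factor(paste0("ds", 1:5))      # data sets of the domain
)
## Placeholder performance values: a small algorithm effect plus a
## data-set effect plus noise (purely illustrative).
bench$perf <- 0.25 +
  0.05 * (bench$algorithm == "svm") +
  rnorm(nlevels(bench$dataset), sd = 0.03)[bench$dataset] +
  rnorm(nrow(bench), sd = 0.02)

## Mixed-effects model: algorithm as fixed effect; random intercepts for
## each data set and for each replication nested within a data set.
fit <- lmer(perf ~ algorithm + (1 | dataset) + (1 | dataset:rep),
            data = bench)

## Simultaneous Tukey-type comparisons of all algorithm pairs across the
## whole domain.
summary(glht(fit, linfct = mcp(algorithm = "Tukey")))
```

The resulting pairwise estimates and adjusted p-values support a domain-wide ordering of the algorithms rather than one ranking per data set.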