Automated Modelling in Empirical Social Sciences Using a Genetic Algorithm

Automated modelling is increasingly relevant in the empirical social sciences because ever more potentially important variables are becoming available. With many candidate variables, it is uncertain which should be included in parsimonious models that explain social phenomena and which should be excluded. This paper argues that, given a large number of potentially informative variables, genetic algorithms for Bayesian model selection allow the efficient automated identification of an optimal subset of variables. The advantages of a genetic algorithm as a method for automated modelling are exemplified by identifying previously unknown but important causal relationships for long-run inflation, the share spent on defence, and political rights on the basis of a cross-country data set.
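
To illustrate the general idea of searching over variable subsets with a genetic algorithm, the following is a minimal sketch and not the procedure used in this paper. It assumes a linear regression model, uses the Bayesian information criterion (BIC) as a simple stand-in for a Bayesian model selection criterion, and encodes each candidate model as a boolean inclusion mask. The function names (bic, ga_select), the operators (truncation selection, uniform crossover, bit-flip mutation) and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def bic(X, y, mask):
    """BIC of an OLS fit of y on an intercept plus the columns selected by mask."""
    n = len(y)
    design = np.column_stack([np.ones(n), X[:, mask]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    # Gaussian likelihood term plus a penalty per estimated coefficient.
    return n * np.log(rss / n) + design.shape[1] * np.log(n)


def ga_select(X, y, pop_size=40, generations=60, p_mut=0.05, seed=0):
    """Search for a low-BIC variable subset (illustrative parameter values)."""
    rng = np.random.default_rng(seed)
    k = X.shape[1]
    # Each individual is a boolean mask: True means the variable is included.
    pop = rng.random((pop_size, k)) < 0.5
    for _ in range(generations):
        scores = np.array([bic(X, y, ind) for ind in pop])
        pop = pop[np.argsort(scores)]
        # Truncation selection: the better half survives unchanged (elitism).
        elite = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(elite):
            a, b = elite[rng.integers(len(elite), size=2)]
            # Uniform crossover: take each gene from one of the two parents.
            child = np.where(rng.random(k) < 0.5, a, b)
            # Bit-flip mutation: toggle each gene with probability p_mut.
            child ^= rng.random(k) < p_mut
            children.append(child)
        pop = np.vstack([elite, np.array(children)])
    scores = np.array([bic(X, y, ind) for ind in pop])
    return pop[np.argmin(scores)], float(scores.min())
```

Calling ga_select(X, y) on an n-by-k matrix of candidate regressors returns the best inclusion mask found and its BIC. Lower BIC roughly corresponds to higher approximate posterior model probability under this simplification, and a heuristic search of this kind is not guaranteed to find the global optimum.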