Model selection - An overview

For many scientists, models are synonymous with paradigms: they are models of some aspect of reality as depicted in a particular science. The problem of choosing a model then arises when that science is at a crossroads. An example is the situation in the 1920s, when physicists had to choose between Newton’s classical theory of gravitation and the theory of gravitation in Einstein’s general theory of relativity. One of our examples, Example 2, illustrates this sort of problem, but most of the others are of a different kind, one that occurs all the time. Typically, when one has to analyse data arising from complex scientific experiments or from observational studies in the social sciences and epidemiology, various aspects are not deterministic. One way of modelling nondeterministic phenomena is through a probability model. For complex phenomena it is quite rare to have only one plausible model; instead, there are several to choose from. In all such situations model selection becomes a fundamental problem.

As large data sets become increasingly common because of advances in information technology, selecting a model has become an essential part of analysing such data. Such data present challenging methodological, computational and theoretical problems, and have led to a fast-growing literature in both statistics and computer science.

This article reviews some of the major statistical developments in this area. No previous background in model selection is assumed. The next section presents a brief background; later sections give six examples, some theory, and an analysis of some of the examples. The last section provides some concluding remarks. The section ‘State-of-the-art’ is based mainly on Shao and Mukhopadhyay.

Background