Probabilistic approach to model selection: comparison with unstructured data set

This paper discusses the problem of model selection from data, that is, the problem of finding the internal structure of a given data set, as in signal or image segmentation, cluster analysis, and curve and surface fitting. In many cases little background information is available, and the data set itself is almost the only basis for analysis and for deciding on a model. A brief review of different approaches to the problem is presented, and a probabilistic approach is introduced that is based on comparing the results obtained for the given data set with those obtained for a set of unstructured data. This approach is presented in greater detail for two problems of regression analysis: selecting the best subset of regressors and finding the best piecewise regression. Some theoretical results are presented, and relations with other approaches to model selection are discussed.
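
To make the idea of comparison with unstructured data concrete, the following is a minimal sketch for the subset-of-regressors problem. It is an illustration under stated assumptions and not necessarily the procedure developed in the paper. The unstructured data are assumed to be obtained by randomly permuting the response, the quality of a subset is assumed to be measured by R^2, and the function names `best_subset_r2` and `unstructured_tail_prob` are hypothetical.

```python
import itertools

import numpy as np


def best_subset_r2(X, y, k):
    """Return the largest R^2 over all subsets of k columns of X.

    Each candidate model is an ordinary least-squares fit with an intercept.
    The search is exhaustive, so it is only practical for a small number of
    regressors.
    """
    n, p = X.shape
    tss = np.sum((y - y.mean()) ** 2)  # total sum of squares
    best = 0.0
    for cols in itertools.combinations(range(p), k):
        A = np.column_stack([np.ones(n), X[:, list(cols)]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = np.sum((y - A @ coef) ** 2)  # residual sum of squares
        best = max(best, 1.0 - rss / tss)
    return best


def unstructured_tail_prob(X, y, k, n_sim=200, seed=0):
    """Estimate how often unstructured data fit as well as the given data.

    Assumption: "unstructured" data are simulated by permuting y, which
    removes any relation between the response and the regressors. The
    returned value is the (add-one smoothed) fraction of simulated data
    sets whose best k-subset R^2 is at least the observed one.
    """
    rng = np.random.default_rng(seed)
    observed = best_subset_r2(X, y, k)
    sims = np.array(
        [best_subset_r2(X, rng.permutation(y), k) for _ in range(n_sim)]
    )
    return (1 + np.sum(sims >= observed)) / (1 + n_sim)
```

A small returned value suggests that the best subset of size k captures structure that unstructured data rarely reproduce. Because the same exhaustive search is run on every simulated data set, the comparison accounts for the optimism introduced by selecting the best of many candidate subsets.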