Approximating the Distribution of the Sample R² in Best Subset Regressions

This note presents research on the distribution of the usual sample R² statistic in multiple regression studies in which the regression equation includes the subset of k variables, chosen from a set of m candidate variables, that maximizes the sample R² or satisfies some similar criterion. A Monte Carlo approach was used to estimate certain percentile points of the distribution of R² under the null hypothesis of independence between the dependent variable and the m independent variables. A function has been developed that appears to provide a good approximation to these percentile points.
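The Monte Carlo procedure summarized above can be illustrated with a minimal sketch. It is not the study's actual simulation design: the sample size n, the use of independent standard normal draws for all variables, the exhaustive search over all k-subsets, the number of replications, and the function names (r_squared, best_subset_r2, simulate_percentiles) are all assumptions made here for illustration.

```python
import itertools
import numpy as np


def r_squared(y, X):
    """Sample R^2 from an ordinary least squares fit of y on X plus an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    total_ss = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / total_ss


def best_subset_r2(y, X, k):
    """Largest R^2 over all subsets of k columns of X (exhaustive search)."""
    m = X.shape[1]
    return max(
        r_squared(y, X[:, list(cols)])
        for cols in itertools.combinations(range(m), k)
    )


def simulate_percentiles(n, m, k, n_reps=2000, probs=(0.90, 0.95, 0.99), seed=0):
    """Estimate percentile points of the best-subset R^2 under the null.

    Assumption: y and the m candidate predictors are drawn as independent
    standard normal variables, so y is independent of every predictor.
    """
    rng = np.random.default_rng(seed)
    stats = np.empty(n_reps)
    for i in range(n_reps):
        y = rng.standard_normal(n)
        X = rng.standard_normal((n, m))
        stats[i] = best_subset_r2(y, X, k)
    return dict(zip(probs, np.quantile(stats, probs)))


if __name__ == "__main__":
    # Hypothetical configuration: 30 observations, best 2 of 6 candidates.
    for p, q in simulate_percentiles(n=30, m=6, k=2, n_reps=500).items():
        print(f"{p:.2f} percentile point of best-subset R^2: {q:.3f}")
```

Because the subset is chosen to maximize R², the simulated percentile points exceed those of the R² for a fixed set of k predictors. Exhaustive search is practical only for small m; a stepwise or branch-and-bound selection would be a different selection criterion and would need to be simulated separately.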