Feature selection with stochastic complexity

The application of J. Rissanen's (1986) theory of stochastic complexity to the problem of feature selection in statistical pattern recognition (SPR) is discussed. Stochastic complexity provides a general framework for statistical problems such as coding, prediction, estimation, and classification. A brief review of the SPR paradigm and traditional methods of feature selection is presented, followed by a discussion of the basics of stochastic complexity. Two forms of stochastic complexity, minimum description length and an integral form, are applied to the problem of feature selection. Experimental results using simulated data generated from Gaussian distributions are given and compared with results from cross-validation, a traditional technique. The stochastic complexity measures give superior results, as measured both by their ability to select subsets of relevant features and by the probability of error computed on the selected feature subset.
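As a rough illustration of how a description-length criterion can drive feature selection, the sketch below scores every feature subset with a two-part code length: the negative log-likelihood (in nats) of the features given the class labels under a maximum-likelihood Gaussian fit, plus a penalty of (k/2) log n for k free parameters and n samples. This is a minimal sketch under illustrative assumptions, not the paper's formulation: features are modelled as independent (diagonal covariance), features in the candidate subset get class-specific means and variances while the remaining features share one Gaussian across classes, and neither the integral form of stochastic complexity nor the cross-validation baseline is shown. The names `gaussian_nll`, `description_length`, and `select_features` are hypothetical.

```python
import itertools
import math


def gaussian_nll(values):
    """Negative log-likelihood (nats) of values under their ML Gaussian fit."""
    m = len(values)
    mean = sum(values) / m
    var = sum((v - mean) ** 2 for v in values) / m
    var = max(var, 1e-12)  # guard against a degenerate zero-variance fit
    return 0.5 * m * (math.log(2 * math.pi * var) + 1.0)


def description_length(X, y, subset):
    """Two-part MDL code length (nats) of features X given labels y.

    Assumption: features are independent. Features in `subset` get a
    separate Gaussian per class; all other features share one Gaussian.
    """
    n = len(X)
    d = len(X[0])
    classes = sorted(set(y))
    data_cost = 0.0
    n_params = 0
    for j in range(d):
        column = [row[j] for row in X]
        if j in subset:
            # Relevant feature: class-conditional mean and variance.
            for c in classes:
                data_cost += gaussian_nll([x for x, lab in zip(column, y) if lab == c])
            n_params += 2 * len(classes)
        else:
            # Irrelevant feature: one mean and variance for all classes.
            data_cost += gaussian_nll(column)
            n_params += 2
    # Model cost: (k/2) log n, the usual asymptotic parameter penalty.
    return data_cost + 0.5 * n_params * math.log(n)


def select_features(X, y):
    """Exhaustively search all subsets for the minimum description length."""
    d = len(X[0])
    best = None
    for k in range(d + 1):
        for subset in itertools.combinations(range(d), k):
            dl = description_length(X, y, set(subset))
            if best is None or dl < best[0]:
                best = (dl, subset)
    return best[1]


if __name__ == "__main__":
    # Hypothetical toy data: feature 0 is shifted by 5 in class 1;
    # features 1 and 2 have identical values in both classes.
    base = [-2.0, -1.0, 0.0, 1.0, 2.0] * 4
    X = [[b, b, -b] for b in base] + [[b + 5.0, b, -b] for b in base]
    y = [0] * len(base) + [1] * len(base)
    print(select_features(X, y))  # → (0,)
```

Because features 1 and 2 have identical distributions in both classes, splitting them by class does not shorten the data code at all and only adds parameter cost, so only the mean-shifted feature 0 is retained. Exhaustive search needs 2^d evaluations; with many features a greedy or branch-and-bound search over subsets would be needed instead.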