A Comparison of Criteria for Maximum Entropy/Minimum Divergence Feature Selection

In this paper we study the gain, a naturally arising statistic from the theory of MEMD modeling [2], as a figure of merit for selecting features for an MEMD language model. We compare the gain with two popular alternatives, empirical activation and mutual information, and argue that the gain is the preferred statistic, on the grounds that it directly measures a feature's contribution to improving upon the base model.