A minimum description length framework for unsupervised learning

A fundamental problem in learning and reasoning about a set of data is finding the right representation. The primary goal of an unsupervised learning procedure is to optimize the quality of a system's internal representation. In this thesis, we present a general framework for describing unsupervised learning procedures based on the Minimum Description Length (MDL) principle. The MDL principle states that the best model is the one that minimizes the combined description length of the model and of the data with respect to the model. Applying this principle to the unsupervised learning problem makes explicit a key trade-off between the accuracy of a representation (i.e., how concisely the input can be described given the representation) and its succinctness (i.e., how compactly the representation itself can be described). Viewing existing unsupervised learning procedures in terms of the framework exposes their implicit assumptions about the type of structure underlying the data. While these existing algorithms typically minimize the data description using a fixed-length representation, we use the framework to derive a class of objective functions for training self-supervised neural networks, in which the goal is to minimize the description length of the representation simultaneously with that of the data. Formulating a description of the representation forces assumptions about the structure of the data to be made explicit, which in turn leads to a particular network configuration as well as an objective function for optimizing the network parameters. We describe three new learning algorithms derived in this manner from the MDL framework. Each algorithm embodies a different scheme for describing the internal representation, and is therefore suited to data sets whose underlying structure matches that scheme. Simulations demonstrate the applicability of these algorithms to some simple computational vision tasks.
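The two-part cost the abstract refers to can be written out explicitly, which makes the accuracy/succinctness trade-off concrete. The notation below (M for the model, D for the data, y for the internal representation of an input x) is illustrative shorthand, not notation taken from the thesis itself:

```latex
% Two-part MDL criterion: the best model minimizes the model's own
% description length plus that of the data encoded using the model.
\mathcal{M}^{*} \;=\; \arg\min_{\mathcal{M}}
  \Bigl[\, L(\mathcal{M}) \;+\; L(\mathcal{D} \mid \mathcal{M}) \,\Bigr]

% For a learned representation y of an input x, the same criterion
% splits into a succinctness term and an accuracy term:
\mathrm{cost}(x) \;=\;
  \underbrace{L(y)}_{\text{representation}}
  \;+\;
  \underbrace{L(x \mid y)}_{\text{reconstruction}}
```

Minimizing only the second term with a fixed-length code for y recovers the behavior of the existing algorithms mentioned above; minimizing both terms jointly is what distinguishes the objectives derived in this thesis.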
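As a concrete illustration of the kind of objective function such a framework yields, here is a minimal sketch of a two-part cost for a one-hidden-layer autoencoder in NumPy. It is an assumed instantiation for illustration only: the Bernoulli coding model for the hidden units, the fixed-variance Gaussian model for the reconstruction residuals, and all names (mdl_cost, sigma, prior) are hypothetical choices, not the thesis's actual algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mdl_cost(x, W_enc, b_enc, W_dec, b_dec, sigma=0.1, prior=0.5):
    """Two-part MDL cost for a single input vector x, in bits.

    Representation term: expected bits to communicate the stochastic
    binary hidden vector under a fixed Bernoulli prior (an assumed
    coding model, with a simplified bits-back style accounting).
    Reconstruction term: bits to encode the residual x - x_hat under a
    zero-mean Gaussian with fixed std sigma (also an assumption).
    """
    p = sigmoid(W_enc @ x + b_enc)   # hidden unit "on" probabilities
    x_hat = W_dec @ p + b_dec        # reconstruction from representation

    # Expected code length for the representation minus the entropy
    # refunded by the stochastic code: a KL divergence to the prior.
    eps = 1e-12
    rep_bits = np.sum(
        p * np.log2((p + eps) / prior)
        + (1 - p) * np.log2((1 - p + eps) / (1 - prior))
    )

    # Gaussian code length for the residual; constants independent of
    # the parameters are dropped, leaving a scaled squared error.
    recon_bits = np.sum((x - x_hat) ** 2) / (2 * sigma**2 * np.log(2))

    return rep_bits + recon_bits

# Example: evaluate the cost with random parameters on a toy 8-pixel input.
d_in, d_hid = 8, 3
x = rng.random(d_in)
W_enc = rng.normal(0.0, 0.1, size=(d_hid, d_in))
b_enc = np.zeros(d_hid)
W_dec = rng.normal(0.0, 0.1, size=(d_in, d_hid))
b_dec = np.zeros(d_in)
print(f"total description length: {mdl_cost(x, W_enc, b_enc, W_dec, b_dec):.2f} bits")
```

Gradient descent on such a cost trains the encoder and decoder jointly; swapping in a different coding scheme for the representation term, the design choice the thesis varies across its three algorithms, changes both the implied network configuration and the objective function.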