Visual recognition, inference and coding using learned sparse overcomplete representations

We present a hierarchical architecture and learning algorithm for visual recognition and inference tasks such as imagination, reconstruction of occluded images, and expectation-driven segmentation. Certain characteristics of biological vision are used for guidance, such as extensive feedback and lateral recurrence, a highly overcomplete early stage (V1), and sparse distributed activity. Recent advances in computational methods for learning overcomplete dictionaries are used to explore how overcompleteness can be useful for visual tasks. We posit a stochastic, hierarchical generative world model (GWM) and develop a simplified world model (SWM) based on a variational approximation to the Boltzmann-like distribution. The SWM is designed to enforce sparsity and leads to a tractable dynamic network. Experimentally, we show that increasing the degree of overcompleteness improves both recognition and segmentation.

Critical to the success of this vision system is the sparse coding of images using a learned overcomplete dictionary. An algorithm for dictionary learning, termed FOCUSS-CNDL, is developed in Chapter 2. In tests with natural images, learned overcomplete dictionaries are shown to have higher coding efficiency than complete dictionaries: images encoded with an overcomplete dictionary have both higher compression (fewer bits/pixel) and higher accuracy (lower mean-square error).

The vision algorithm of Chapter 1 requires non-negative sparse codes, which are developed in Chapter 3. A non-negative version of the FOCUSS algorithm is shown to be superior to a matching-pursuit variant. The FOCUSS-CNDL algorithm is also found to have better image-coding performance than another overcomplete independent component analysis (ICA) algorithm.

The final chapter presents methods for detecting rare events in a time series of noisy, nonparametrically distributed data. These algorithms are tested on a difficult real-world problem: predicting failures in hard drives.
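To make the sparse-coding step concrete, the following is a minimal sketch of the classic FOCUSS iteration (iteratively reweighted minimum-norm least squares) for finding a sparse code x with y = Ax under an overcomplete dictionary A. The dictionary size, signal, and sparsity parameter p here are illustrative, and this sketch omits the dictionary-learning and non-negativity extensions (CNDL) developed in the dissertation.

```python
import numpy as np

def focuss(A, y, p=1.0, iters=30):
    """Sketch of the FOCUSS iteration: seek a sparse x with y = A x
    by iteratively reweighted minimum-norm least squares."""
    x = np.linalg.pinv(A) @ y                       # minimum-norm initialization
    for _ in range(iters):
        W = np.diag(np.abs(x) ** (1.0 - p / 2.0))   # reweighting from current code
        x = W @ np.linalg.pinv(A @ W) @ y           # weighted minimum-norm step
    return x

# Illustrative overcomplete dictionary: 16 atoms in an 8-dimensional signal space.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 16))
A /= np.linalg.norm(A, axis=0)                      # unit-norm atoms

x_true = np.zeros(16)
x_true[[3, 11]] = [1.5, -2.0]                       # 2-sparse ground-truth code
y = A @ x_true
x_hat = focuss(A, y)                                # recovered sparse code
```

The reweighting concentrates energy onto a few dictionary atoms: coefficients that shrink toward zero receive ever-smaller weights and are pruned from the solution, yielding a code far sparser than the minimum-norm starting point.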
An algorithm based on the multiple-instance learning framework and the naive Bayesian classifier (mi-NB) is developed, specifically designed for the low-false-alarm regime. Other methods compared include support vector machines (SVMs), unsupervised clustering, and non-parametric statistical tests. While not specific to vision tasks, the mi-NB algorithm may find uses in semi-supervised image categorization tasks.
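The multiple-instance framing can be sketched as follows: each drive is a bag of feature vectors sampled over time, and a bag is labeled positive (failing) if at least one of its instances is positive. The snippet below illustrates only that bag-level decision rule with a plain Gaussian naive Bayes classifier; it is not the mi-NB algorithm itself, and for clarity it trains on known instance labels, whereas mi-NB must infer instance labels from bag labels alone.

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian naive Bayes (diagonal covariance per class)."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-6 for c in self.classes])
        self.logprior = np.log(np.array([np.mean(y == c) for c in self.classes]))
        return self

    def predict(self, X):
        # Per-class log-likelihood: sum of log N(x_j; mu_j, var_j) over features.
        ll = -0.5 * (((X[:, None, :] - self.mu) ** 2) / self.var
                     + np.log(2 * np.pi * self.var)).sum(axis=2)
        return self.classes[np.argmax(ll + self.logprior, axis=1)]

def predict_bag(clf, bag):
    """Multiple-instance rule: a bag is positive if ANY instance is positive."""
    return int(np.any(clf.predict(bag) == 1))

# Synthetic, well-separated instance data (illustrative only).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.3, (100, 2)),    # healthy samples
               rng.normal(5.0, 0.3, (100, 2))])   # failure-related samples
y = np.r_[np.zeros(100), np.ones(100)]
clf = GaussianNB().fit(X, y)

# A "failing drive": mostly healthy samples plus one anomalous instance.
pos_bag = np.vstack([rng.normal(0.0, 0.3, (9, 2)),
                     rng.normal(5.0, 0.3, (1, 2))])
neg_bag = rng.normal(0.0, 0.3, (10, 2))           # a healthy drive
```

The any-instance rule is what makes the framework suitable for failure prediction: a single anomalous sample suffices to flag a drive, while the classifier's threshold can be tuned to keep bag-level false alarms low.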