Calculating degenerate structures via convex optimization with applications in computer vision and pattern recognition
暂无分享,去创建一个
In a wide range of computer vision and pattern recognition problems, the captured images and videos often live in high-dimensional observation spaces. Directly computing them may suffer from computational infeasibility and numerical instability. On the other hand, the data in the real world are often generated due to limited number of physical causes, and thus embed degenerate structures in the nature. For instance, they can be modeled by a low-dimensional subspace, a union of subspaces, a manifold or even a manifold stratification. Discovering and harnessing such intrinsic structures not only brings semantic insight into the problems at hand, but also provides critical information to overcome challenges encountered in the practice.
Recent years have witnessed great development in both the theory and application of convex optimization. Efficient and elegant solutions have been found for NP-hard problems such as low-rank matrix recovery and sparse representation. In this thesis, we study the problem of discovering degenerate structures of high-dimensional inputs using these techniques. Especially we focus ourselves on low-dimensional subspaces and their unions, and address their application in overcoming the challenges encountered under three practical scenarios: face image alignment, background subtraction and automatic plant identification.
In facial image alignment, we propose a method that jointly brings multiple images of an unseen face into alignment with a pre-trained generic appearance model despite different poses, expressions and illumination conditions of the face in the images. The idea is to pursue an intrinsic affine subspace of the target face that is low-dimensional while at the same time lies close to the generic subspace. Compared with conventional appearance-based methods that rely on accurate appearance models, ours works well with only a generic one and performs much better on unseen faces even if they significantly differ from those for training the generic model. The result is approximately good as that in an idealistic case where a specific model for the target face is provided.
For background subtraction, we propose a background model that captures the changes caused by the background switching among a few configurations, like traffic lights statuses. The background is modeled as a union of low-dimensional subspaces, each characterizing one configuration of the background, and the proposed algorithm automatically switches among them and identifies violating elements as foreground pixels. Moreover, we propose a robust learning approach that can work with foreground- present training samples at the background modeling stage – it builds a correct background model with outlying foreground pixels automatically pruned out. This is practically important when foreground-free training samples are difficult to obtain in scenarios such as traffic monitoring.
For automatic plant identification, we propose a novel and practical method that recognizes plants based on leaf shapes extracted from photographs. Different from existing studies that are mostly focused on simple leaves, the proposed method is designed to recognize both simple and compound leaves. The key to that is, instead of either measuring geometric features or matching shape features as in conventional methods, we describe leaves by counting on them the numbers of certain shape patterns. The patterns are learned in a way that they form a degenerate polytope (a special union of affine subspaces) in the feature space, and can simulate, to some extent, the “keys” used by botanists – each pattern reflects a common feature of several different species and all the patterns together can form a discriminative rule for recognition. Experiments conducted on a variety of datasets show that our algorithm significantly outperforms the state-of-art methods in terms of recognition accuracy, efficiency and storage, and thus hasa good promise for practicing.
In conclusion, our performed studies show that: 1) the visual data with semantic meanings are often not random – although they can be high-dimensional, they typically embed degenerate structures in the observation space. 2) With appropriate assumptions made and clever computational tools developed, these structures can be efficiently and stably calculated. 3) The employment of these intrinsic structures helps overcoming practical challenges and is critical for computer vision and pattern recognition algorithms to achieve good performance.