Part based object detection, segmentation, and tracking by boosting simple shape feature based weak classifiers

Detection, segmentation, and tracking of objects of a known class is a fundamental problem in computer vision. For this task, we first need to detect the objects of interest and segment them from the background, and then track them across frames while maintaining their correct identities. The two principal sources of difficulty in this task are: (a) changes in the appearance of the objects with viewpoint, illumination, and possible articulation, and (b) partial occlusion of the objects of interest by other objects. The objective of this work is to develop a system that automatically detects, segments, and tracks multiple, possibly partially occluded objects of a known class from a single camera. We take pedestrians, which are important for many real-life applications, as the main class of interest to demonstrate our approach. However, some components of the method are also applied to the class of cars to show the generality of our approach. We represent an object as a hierarchy of parts. The use of a part-based model enables us to detect and track objects when some of their parts are not visible. We develop a new type of shape-oriented feature, called the edgelet, to capture silhouette-based patterns. We integrate the edgelet features with other existing shape features, and learn tree-structured classifiers for object parts. Part detection responses are combined jointly so that the spatial relations between multiple objects, including possible occlusions, are analyzed. For specific applications, an unsupervised, online learning algorithm is used to improve the performance of the detectors by adapting them to the particular environment. An object segmentor, whose output is a pixel-level figure-ground segmentation, is learned from the local shape features. The object detection and segmentation results provide the observations for tracking. Trajectory initialization and termination are both automatic and rely on the detection results.
Two complementary techniques, data association and mean-shift, are used to track an object. An automatic object detection and tracking system has been implemented and evaluated on a number of images and videos. The experimental results show that our method achieves state-of-the-art performance.