Robust visual recognition with locally adaptive regression kernels

Visual recognition concerns identifying objects in an image or actions in a video. Recent progress in networking, storage, and computational power has made visual recognition algorithms practical in applications such as surveillance, medical image analysis, and visual image search. Although current learning-based frameworks achieve state-of-the-art performance on existing benchmark databases, they are often slow to train and require a large number of training examples. In some applications, however, such as automatic passport control at airports and image retrieval from the Web, a single image may be the only example available. Developing a sophisticated descriptor is therefore key to visual recognition from a single example or a few examples. In this work, we propose to use a novel descriptor, locally adaptive regression kernels (LARKs). LARKs have several advantageous properties: (i) they are robust to illumination variations, local deformation, and the presence of data uncertainty; (ii) they capture local geometry exceedingly well by using geodesic distance rather than Euclidean distance; and (iii) they can be computed from multi-dimensional data. They are thus applicable to a wide variety of problems, such as generic object detection, action recognition, and saliency detection. We also develop a real-time detection framework by computing LARKs efficiently. The comprehensive experimental results presented in each chapter show the superiority of LARKs over other descriptors.