Paper information - A New Class of Learnable Detectors for Categorisation

A new class of image-level detectors that can be adapted by machine learning techniques to detect parts of objects from a given category is proposed. A classifier (e.g. a neural network or an AdaBoost-trained classifier) within the detector selects a relevant subset of extremal regions, i.e. regions that are connected components of a thresholded image. Properties of extremal regions render the detector very robust to illumination change. Robustness to viewpoint change is achieved by using invariant descriptors and/or by modeling shape variations by the classifier. The approach is brought to bear on three problems: text detection, face segmentation and leopard skin detection. A high detection rate (92%) was obtained for unconstrained (i.e. brightness-, affine- and font-invariant) text detection with a reasonable false positive rate. The time complexity of the detection is approximately linear in the number of pixels, and a non-optimized implementation runs at about 1 frame per second for a 640×480 image on a high-end PC.
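The abstract's definition of an extremal region (a connected component of a thresholded image) and the idea of a classifier selecting a relevant subset can be illustrated with a small sketch. This is not the authors' implementation: the function names `extremal_regions` and `select_regions`, the 4-connectivity, the "intensity <= threshold" convention, and the toy size-based `is_relevant` predicate standing in for a trained classifier are all assumptions made for illustration. The sketch also re-thresholds the image for every threshold, so it does not reproduce the approximately linear running time reported in the abstract.

```python
from collections import deque


def extremal_regions(image, threshold):
    """Return the 4-connected components of pixels with intensity <= threshold.

    `image` is a list of rows of integer grey levels (an assumed toy format).
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r in range(height):
        for c in range(width):
            if seen[r][c] or image[r][c] > threshold:
                continue
            # Breadth-first flood fill from an unvisited below-threshold pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx]
                            and image[ny][nx] <= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def select_regions(image, thresholds, is_relevant):
    """Keep the regions accepted by `is_relevant`, a hypothetical stand-in
    for the trained classifier mentioned in the abstract."""
    selected = []
    for t in thresholds:
        for region in extremal_regions(image, t):
            if is_relevant(region):
                selected.append((t, region))
    return selected


# Toy image: two dark vertical strokes on a bright background.
image = [
    [9, 9, 9, 9, 9],
    [9, 1, 9, 2, 9],
    [9, 1, 9, 2, 9],
    [9, 9, 9, 9, 9],
]

# Toy "classifier": accept regions of exactly two pixels.
picked = select_regions(image, [1, 2, 9], lambda region: len(region) == 2)
```

Because the regions depend only on the ordering of pixel intensities, applying a monotonically increasing map to the image (for example `2 * v + 3`) and to the threshold yields the same pixel sets, which is the property behind the illumination robustness the abstract describes.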

Jiri Matas | Karel Zimmermann
