A latent model of discriminative aspect

Recognition using appearance features is confounded by phenomena that cause images of the same object to look different, or images of different objects to look the same. This may occur because the same object looks different from different viewing directions, or because two generally different objects have views from which they look similar. In this paper, we introduce the idea of discriminative aspect, a set of latent variables that encode these phenomena. Changes in view direction are one cause of changes in discriminative aspect, but others include changes in texture or lighting. However, images are not labelled with relevant discriminative aspect parameters. We describe a method to improve discrimination by inferring and then using latent discriminative aspect parameters. We apply our method to two parallel problems: object category recognition and human activity recognition. In each case, appearance features are powerful given appropriate training data, but traditionally fail badly under large changes in view. Our method can recognize an object quite reliably in a view for which it possesses no training example. Our method also reweights features to discount accidental similarities in appearance. We demonstrate that our method produces a significant improvement on the state of the art for both object and activity recognition.
