Fisher Kernels on Visual Vocabularies for Image Categorization

Within the field of pattern classification, the Fisher kernel is a powerful framework which combines the strengths of generative and discriminative approaches. The idea is to characterize a signal with a gradient vector derived from a generative probability model and to subsequently feed this representation to a discriminative classifier. We propose to apply this framework to image categorization where the input signals are images and where the underlying generative model is a visual vocabulary: a Gaussian mixture model which approximates the distribution of low-level features in images. We show that Fisher kernels can actually be understood as an extension of the popular bag-of-visterms. Our approach demonstrates excellent performance on two challenging databases: an in-house database of 19 object/scene categories and the recently released VOC 2006 database. It is also very practical: it has low computational needs both at training and test time and vocabularies trained on one set of categories can be applied to another set without any significant loss in performance.

[1]  David Haussler,et al.  Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.

[2]  Nello Cristianini,et al.  Advances in Kernel Methods - Support Vector Learning , 1999 .

[3]  B. Schölkopf,et al.  Advances in kernel methods: support vector learning , 1999 .

[4]  Pedro J. Moreno,et al.  Using the Fisher kernel method for Web audio classification , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[5]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[6]  Mark J. F. Gales,et al.  Maximum margin training of generative kernels , 2004 .

[7]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[8]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[9]  Antonio Criminisi,et al.  Object categorization by learned universal visual dictionary , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10]  Steve Renals,et al.  Speaker verification using sequence discriminant support vector machines , 2005, IEEE Transactions on Speech and Audio Processing.

[11]  S. Lazebnik,et al.  Local Features and Kernels for Classification of Texture and Object Categories: An In-Depth Study , 2005 .

[12]  John Shawe-Taylor,et al.  Improving "bag-of-keypoints" image categorisation: Generative Models and PDF-Kernels , 2005 .

[13]  Frédéric Jurie,et al.  Creating efficient codebooks for visual recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[14]  Lawrence Carin,et al.  Sparse multinomial logistic regression: fast algorithms and generalization bounds , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Jean-Marc Odobez,et al.  Constructing Visual Models with a Latent Space Approach , 2005, SLSFS.

[16]  Pietro Perona,et al.  Combining generative models and Fisher kernels for object recognition , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[18]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[19]  Frédéric Jurie,et al.  Fast Discriminative Visual Codebooks using Randomized Clustering Forests , 2006, NIPS.

[20]  Gabriela Csurka,et al.  Adapted Vocabularies for Generic Visual Categorization , 2006, ECCV.