Multi-feature pLSA for combining visual features in image annotation

We study in this paper the problem of combining low-level visual features for image region annotation. The problem is tackled with a novel method that combines texture and color features via a mixture model of their joint distribution. The structure of the presented model can be considered as an extension of the probabilistic latent semantic analysis (pLSA) in that it handles data from two different visual feature domains by attaching one more leaf node to the graphical structure of the original pLSA. Therefore, the proposed approach is referred to as multi-feature pLSA (MF-pLSA). The supervised paradigm is adopted to classify a new image region into one of a few pre-defined object categories using the MF-pLSA. To evaluate the performance, the VOC2009 and LabelMe databases were employed in our experiments, along with various experimental settings in terms of the number of visual words and mixture components. Evaluated based on the average recall and precision, the MF-pLSA is demonstrated superior to seven other approaches, including other schemes for visual feature combination.

[1]  Koen E. A. van de Sande,et al.  Evaluating Color Descriptors for Object and Scene Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[3]  Ling Guan,et al.  Multimodal information fusion for selected multimedia applications , 2010, Int. J. Multim. Intell. Secur..

[4]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[5]  Wei-Ying Ma,et al.  Annotating Images by Mining Image Search Results , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Rainer Lienhart,et al.  Multilayer pLSA for multimodal image retrieval , 2009, CIVR '09.

[7]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[8]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[9]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[10]  Andrew Zisserman,et al.  Scene Classification Using a Hybrid Generative/Discriminative Approach , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.