CMIB: Unsupervised Image Object Categorization in Multiple Visual Contexts

Object categorization in images is fundamental to various industrial areas, such as automated visual inspection, fast image retrieval, and intelligent surveillance. Most existing methods treat visual features (e.g., the scale-invariant feature transform, SIFT) as content information of the objects, while regarding image tags as their contextual information. However, image tags can hardly be acquired in completely unsupervised settings, especially when the image volume is too large to annotate manually. In this article, we propose a novel contextual multivariate information bottleneck (CMIB) method to conduct unsupervised image object categorization in multiple visual contexts. Rather than relying on manually provided contexts, the CMIB method first automatically generates a set of high-level basic clusterings from multiple global features. We define these clusterings as visual contexts, since they provide overall information about the target images. Object category discovery is then formulated as a data compression procedure in which the content and the multiple visual contexts are maximally preserved through a “bottleneck.” Specifically, two Bayesian networks are first built to characterize the relationship between data compression and information preservation. Finally, a novel sequential information-theoretic optimization is proposed to ensure the convergence of the CMIB objective function. Experimental results on seven real-world benchmark image datasets demonstrate that the CMIB method achieves better performance than the state-of-the-art baselines.
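To make the idea of preserving several visual contexts through a sequential optimization more concrete, the following is a minimal Python sketch under strong simplifying assumptions. It is not the CMIB algorithm itself: the exact objective, the content term over local features, and the Bayesian networks are not specified here and are omitted. Each visual context is assumed to be a list of basic-clustering labels (one per image), and the function names `mutual_information`, `objective`, and `sequential_reassign`, as well as the per-context `weights`, are hypothetical. The sketch only shows a sequential reassignment loop that accepts a move when it strictly increases a weighted sum of mutual-information terms, which is why the objective cannot decrease.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two label sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def objective(assign, contexts, weights):
    """Weighted sum of I(T; C_k) over all assumed visual contexts C_k."""
    return sum(w * mutual_information(assign, ctx)
               for ctx, w in zip(contexts, weights))


def sequential_reassign(assign, contexts, weights, n_clusters, max_sweeps=20):
    """Visit images one at a time and move each to its best cluster.

    A move is accepted only if it strictly improves the objective, so the
    objective value is non-decreasing across the whole run.
    """
    assign = list(assign)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(assign)):
            original = assign[i]
            best_t = original
            best_val = objective(assign, contexts, weights)
            for t in range(n_clusters):
                assign[i] = t  # tentatively place image i in cluster t
                val = objective(assign, contexts, weights)
                if val > best_val + 1e-12:
                    best_t, best_val = t, val
            assign[i] = best_t
            changed = changed or best_t != original
        if not changed:  # a full sweep with no moves: local optimum reached
            break
    return assign


# Toy data: six images, two hypothetical visual contexts (basic clusterings).
contexts = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1]]
weights = [1.0, 1.0]
start = [0, 1, 0, 1, 0, 1]
final = sequential_reassign(start, contexts, weights, n_clusters=2)
```

Recomputing the full objective for every tentative move keeps the sketch short but is quadratic in the number of images per sweep; an actual implementation would presumably update the relevant terms incrementally.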
