Fine-grained object recognition in underwater visual data

In this paper we investigate the fine-grained object categorization problem of determining fish species in low-quality visual data (images and videos) recorded in real-life settings. We first describe a new annotated dataset of about 35,000 fish images (the MA-35K dataset), derived from the Fish4Knowledge project and covering ten fish species from the Eastern Indo-Pacific bio-geographic zone. We then use a label propagation method to transfer the labels from MA-35K to a set of 20 million fish images, in order to capture greater variability in fish appearance. The resulting annotated dataset, containing over one million annotations (AA-1M), was then manually checked to remove false positives, as well as images in which fish occlude one another or are only partially visible. Finally, we randomly selected more than 30,000 fish images, distributed among ten fish species and extracted from about 400 ten-minute videos, and used these data (both images and videos) for the fish task of the LifeCLEF 2014 contest. Together with the release of this fine-grained visual dataset, we also present two approaches for fish species classification, one operating on still images and the other on videos. Both approaches achieved high object classification performance (for some fish species, precision and recall were close to one) and outperformed state-of-the-art methods. In addition, although the dataset is unbalanced in the number of images per species, both methods (especially the one operating on still images) appear to be rather robust against the long-tail curse of data, showing the best performance on the less populated object classes.
