MorphoCluster: Efficient Annotation of Plankton Images by Clustering

In this work, we present MorphoCluster, a software tool for data-driven, fast, and accurate annotation of large image data sets. The volume and complexity of marine data have already surpassed the annotation rate of human experts and will continue to increase in the coming years. Still, this data requires interpretation. MorphoCluster augments the human ability to discover patterns and perform object classification in large amounts of data by embedding unsupervised clustering in an interactive process. By aggregating similar images into clusters, our novel approach to image annotation increases consistency, multiplies the throughput of an annotator, and allows experts to adapt the granularity of their sorting scheme to the structure in the data. By sorting a set of 1.2 M objects into 280 data-driven classes in 71 h (16 k objects per hour), with 90% of these classes having a precision of 0.889 or higher, we show that MorphoCluster is at the same time fast, accurate, and consistent; provides a fine-grained and data-driven classification; and enables novelty detection.
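
The idea of annotating clusters rather than single images can be illustrated with a minimal sketch. It is not MorphoCluster's implementation: the greedy distance-threshold grouping is a deliberately simple stand-in for whatever clustering the tool uses, and the 2-D feature vectors, the `radius` parameter, the function names, and the class names are all hypothetical. The sketch only shows why one expert decision per cluster labels many objects at once.

```python
from math import dist


def greedy_cluster(features, radius):
    """Assign each vector to the first cluster centre within `radius`;
    open a new cluster when no existing centre is close enough.
    (Illustrative stand-in, not the clustering used by MorphoCluster.)"""
    centres = []
    assignment = []
    for vec in features:
        for idx, centre in enumerate(centres):
            if dist(vec, centre) <= radius:
                assignment.append(idx)
                break
        else:
            # No centre within reach: this object starts a new cluster.
            centres.append(vec)
            assignment.append(len(centres) - 1)
    return assignment


def propagate_labels(assignment, cluster_labels):
    """Copy the label given to each cluster onto all of its members."""
    return [cluster_labels[cluster] for cluster in assignment]


# Hypothetical 2-D image features (real features would be high-dimensional).
features = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
            (5.0, 5.1), (5.1, 4.9),
            (9.0, 0.0)]

assignment = greedy_cluster(features, radius=1.0)

# Hypothetical expert decisions: one label per cluster, not per image.
cluster_labels = {0: "copepod", 1: "diatom chain", 2: "unknown / novel"}
labels = propagate_labels(assignment, cluster_labels)

decisions = len(set(assignment))
print(f"{decisions} decisions labelled {len(labels)} objects")
```

In this toy run the annotator makes one decision per cluster, and an isolated object ends up in a cluster of its own, which is where a human reviewer might spot a novel class.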
