Improving plankton image classification using context metadata

Advances in both hardware and software are enabling rapid proliferation of in situ plankton imaging methods, requiring more effective machine learning approaches to image classification. Deep Learning methods, such as convolutional neural networks (CNNs), show marked improvement over traditional feature-based supervised machine learning algorithms, but require careful optimization of hyperparameters and adequate training sets. Here, we document some best practices in applying CNNs to zooplankton and marine snow images and note where our results differ from contemporary Deep Learning findings in other domains. We boost the performance of CNN classifiers by incorporating metadata of different types and illustrate how to assimilate metadata beyond simple concatenation. We utilize both geotemporal (e.g., sample depth, location, time of day) and hydrographic (e.g., temperature, salinity, chlorophyll a) metadata and show that either type by itself, or both combined, can substantially reduce error rates. Incorporation of context metadata also boosts performance of the feature-based classifiers we evaluated: Random Forest, Extremely Randomized Trees, Gradient Boosted Classifier, Support Vector Machines, and Multilayer Perceptron. For our assessments, we use an original data set of 350,000 in situ images (roughly 50% marine snow and 50% nonsnow sorted into 26 categories) from a novel in situ Zooglider. We document asymptotically increasing performance with more computationally intensive techniques, such as substantially deeper networks and artificially augmented data sets. Our best model achieves 92.3% accuracy with our 27-class data set. We provide guidance for further refinements that are likely to provide additional gains in classifier accuracy.

The burgeoning number of digital imaging methods available to aquatic ecologists, both in situ (Davis et al. 1992; Samson et al. 2001; Benfield et al. 2003; Watson 2004; Olson and Sosik 2007; Cowen and Guigand 2008; Picheral et al. 2010; Schulz et al. 2010; Thompson et al. 2012; Briseño-Avena et al. 2015; Ohman et al. 2018) and in the laboratory (Sieracki et al. 1998; Gorsky et al. 2010), is generating rapidly expanding libraries of digital images useful in a variety of scientific applications. However, the accumulation of large numbers of images increases the need for much more efficient machine learning methods to automate image classification, data extraction, and analysis.

Until recently, most automated image classification has employed methods we refer to as “feature-based,” in that they operate on a set of descriptive geometric features calculated from the digital images, such as area, shape, aspect ratio, fractal dimension, textures, and grayscale histograms (e.g., Peura and Iivarinen 1997). The feature-based algorithms then derive a mapping from the calculated values to labels corresponding to the type of organism. Ideally, this mapping will extrapolate to future images. Some of the feature-based algorithms that have been applied to classification of plankton images with varying degrees of success include Random Forest (Grosjean et al. 2004; Gorsky et al. 2010), support vector machines (SVMs) (Hu and Davis 2005; Sosik and Olson 2007; Ellen et al. 2015), and multilayer perceptron (MLP) (Wilkins et al. 1996), among others.

Since 2012, “Deep Learning” algorithms (Krizhevsky et al. 2012; LeCun and Ranzato 2013; LeCun et al. 2015) have outperformed feature-based classifiers in a variety of fields, including natural language processing (Socher et al. 2013), time series analysis (Graves et al. 2013), variational autoencoders (algorithms that learn to generate or alter existing data, such as image correction; Kingma and Welling 2013), plankton image analysis (Orenstein et al. 2015; Dai et al. 2016; Dieleman et al. 2016b; Graff and Ellen 2016; Wang et al. 2016; Zheng et al. 2017; Orenstein and Beijbom 2017; Luo et al. 2018), and others.
Multiple algorithms have been characterized as examples of Deep Learning, the commonality being the use of repetitive layers of algorithmic structure that operate on the prior layers rather than the

*Correspondence: Present address: Department of Computer Science, University of California, Irvine, Irvine, California

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.

