Optimizing the number of classes in automated zooplankton classification

Zooplankton biomass and abundance are routinely estimated from surveys or time-series. Automated or semi-automated image analysis, combined with machine-learning techniques for identifying plankton, has been proposed to assist in sample analysis. One difficulty in automated plankton recognition and classification systems is choosing the number of classes. This choice can be formulated as a trade-off between the number of classes identified (zooplankton taxa) and classification performance (accuracy, i.e. the proportion of correctly classified individuals). Here, a method is proposed to evaluate how the number of selected classes affects classification performance. Starting from a data set of classified zooplankton images, a machine-learning method suggests class groupings (mergers) that improve the performance of the automated classification. The end-user can accept or reject each merger, depending on its ecological value and the objectives of the research. Guided by the end-user, the method allows two objectives to be balanced: (i) maximizing the number of classes and (ii) maximizing performance.
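
The trade-off between class count and accuracy can be illustrated with a small sketch. This is not the method of the paper: it assumes only that a confusion matrix from some cross-validated classifier is available, and it ranks every pair of classes by the accuracy gained if the two were merged. The function names, class labels and counts are all hypothetical and chosen for illustration.

```python
from itertools import combinations


def accuracy(confusion):
    """Fraction of individuals on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def merge_classes(confusion, labels, a, b):
    """Return a new confusion matrix and label list with class b folded into a."""
    n = len(labels)
    keep = [i for i in range(n) if i != b]
    # Map each old index to its new index; b collapses onto a.
    new_index = {old: new for new, old in enumerate(keep)}
    new_index[b] = new_index[a]
    merged = [[0] * len(keep) for _ in keep]
    for i in range(n):
        for j in range(n):
            merged[new_index[i]][new_index[j]] += confusion[i][j]
    new_labels = [labels[i] for i in keep]
    new_labels[new_index[a]] = labels[a] + "+" + labels[b]
    return merged, new_labels


def suggest_mergers(confusion, labels):
    """List (gain, label_a, label_b) for every pair, largest accuracy gain first."""
    base = accuracy(confusion)
    suggestions = []
    for a, b in combinations(range(len(labels)), 2):
        merged, _ = merge_classes(confusion, labels, a, b)
        suggestions.append((accuracy(merged) - base, labels[a], labels[b]))
    suggestions.sort(reverse=True)
    return suggestions


# Invented example: rows are true classes, columns are predicted classes.
labels = ["copepod", "nauplius", "detritus"]
confusion = [
    [80, 15, 5],
    [20, 70, 10],
    [5, 5, 90],
]

for gain, first, second in suggest_mergers(confusion, labels):
    print(f"merge {first} + {second}: accuracy gain {gain:.3f}")
```

On a fixed confusion matrix, merging two classes can never lower accuracy, so the gain only ranks the candidates. Whether a suggested merger is ecologically acceptable remains the end-user's decision, and retraining the classifier on the merged classes may give a different result from this simple recount.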
