Comparative study on supervised learning methods for identifying phytoplankton species

Phytoplankton plays an important role in marine ecosystems and is used as a biological indicator for assessing marine water quality. Identifying phytoplankton species therefore has strong potential for monitoring environmental and climate change and for evaluating water quality. However, identification is not an easy task because of the variability and ambiguity among thousands of micro- and pico-plankton species. The aim of this paper is therefore to build a framework for identifying phytoplankton species and to compare different feature types and classifiers. We propose a new feature type extracted from the raw signals of phytoplankton species. We then analyze the performance of various classifiers on the proposed feature type and on two other feature types to find the most robust combination. Our experiments show that Random Forest using the proposed features gives the best classification results, with an average accuracy of up to 98.24%.
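The abstract describes a framework that turns raw signals into feature vectors and then compares classifiers by their average accuracy. The sketch below illustrates only that comparison loop, using k-fold cross-validation. Everything concrete in it is an assumption: the abstract does not specify the proposed feature type, so `signal_features` (mean, standard deviation, maximum, minimum) is a hypothetical stand-in. The two classifiers (1-nearest-neighbour and nearest centroid) are simple placeholders, not the paper's Random Forest. The data are synthetic signals for two made-up classes and do not come from the study.

```python
import random
import statistics
from typing import Callable, List, Sequence

Vector = List[float]
# A classifier takes training vectors, training labels and one query vector.
Classifier = Callable[[List[Vector], List[str], Vector], str]


def signal_features(signal: Sequence[float]) -> Vector:
    """Hypothetical feature type: summary statistics of one raw signal."""
    return [statistics.fmean(signal), statistics.pstdev(signal), max(signal), min(signal)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def one_nn(train_X: List[Vector], train_y: List[str], x: Vector) -> str:
    """Placeholder classifier: label of the single closest training vector."""
    best = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i], x))
    return train_y[best]


def nearest_centroid(train_X: List[Vector], train_y: List[str], x: Vector) -> str:
    """Placeholder classifier: label of the closest per-class mean vector."""
    centroids = {}
    for label in set(train_y):
        rows = [v for v, y in zip(train_X, train_y) if y == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return min(centroids, key=lambda lab: sq_dist(centroids[lab], x))


def cv_accuracy(X: List[Vector], y: List[str], clf: Classifier, k: int = 5, seed: int = 0) -> float:
    """Mean accuracy over k folds of a shuffled index list."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accs = []
    for fold in folds:
        test = set(fold)
        tr_X = [X[i] for i in idx if i not in test]
        tr_y = [y[i] for i in idx if i not in test]
        correct = sum(clf(tr_X, tr_y, X[i]) == y[i] for i in fold)
        accs.append(correct / len(fold))
    return statistics.fmean(accs)


def make_toy_signals(n_per_class: int = 20, length: int = 50, seed: int = 1):
    """Synthetic signals for two made-up classes (not real phytoplankton data)."""
    rng = random.Random(seed)
    signals, labels = [], []
    for label, (mu, sigma) in {"species_A": (0.0, 1.0), "species_B": (3.0, 0.5)}.items():
        for _ in range(n_per_class):
            signals.append([rng.gauss(mu, sigma) for _ in range(length)])
            labels.append(label)
    return signals, labels


if __name__ == "__main__":
    signals, labels = make_toy_signals()
    X = [signal_features(s) for s in signals]
    for name, clf in [("1-NN", one_nn), ("nearest centroid", nearest_centroid)]:
        print(f"{name}: mean CV accuracy = {cv_accuracy(X, labels, clf):.3f}")
```

To compare several feature types, as the paper does, the same `cv_accuracy` call would be repeated with a different feature function for each type. A Random Forest classifier from an existing library could be substituted for the placeholder classifiers.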
