Hierarchical classification of diatom images using ensembles of predictive clustering trees

Abstract This paper presents a hierarchical multi-label classification (HMC) system for diatom image classification. HMC is a variant of classification in which an instance may belong to multiple classes at the same time, and these classes (labels) are organized in a hierarchy. Our approach to HMC exploits the classification hierarchy by building a single predictive clustering tree (PCT) that simultaneously predicts all levels in the hierarchy of taxonomic ranks: genus, species, variety, and form. This makes PCTs efficient, because a single classifier covers the hierarchical classification scheme as a whole. To improve the predictive performance of the PCTs, we construct ensembles of PCTs. We evaluate our system on the ADIAC database of diatom images. We apply several feature extraction techniques suited to diatom images and investigate whether combining these techniques increases predictive performance. The results show that ensembles of PCTs have better predictive performance and are more efficient than support vector machines (SVMs). Furthermore, the proposed system outperforms the most widely used approaches for image annotation. Finally, we demonstrate how taxonomists can use the system to annotate new diatom images.
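To make the HMC setting concrete, the following minimal sketch shows one common way a taxonomic label can be expanded into a hierarchical multi-label target, and how the per-label outputs of several ensemble members might be averaged. It is an illustration under stated assumptions, not the system described in the paper: the taxon names (`GenusA`, `sp1`, ...), the helper functions, and the 0.5 threshold are all hypothetical, and the hard-coded "votes" merely stand in for the predictions of individual trees.

```python
from typing import Dict, List, Sequence


def expand_path(path: Sequence[str]) -> List[str]:
    """Return every prefix of a taxonomic path as its own label.

    ("GenusA", "sp2") -> ["GenusA", "GenusA/sp2"], so an instance labeled
    with a species is also labeled with the genus above it.
    """
    return ["/".join(path[:i]) for i in range(1, len(path) + 1)]


def encode(path: Sequence[str], label_index: Dict[str, int]) -> List[int]:
    """Encode one taxonomic path as a 0/1 vector over all hierarchy labels."""
    vec = [0] * len(label_index)
    for label in expand_path(path):
        vec[label_index[label]] = 1
    return vec


def average_votes(votes: Sequence[Sequence[float]]) -> List[float]:
    """Average per-label outputs of the ensemble members (assumed scheme)."""
    n = len(votes)
    return [sum(column) / n for column in zip(*votes)]


# Hypothetical two-genus hierarchy; the names are placeholders, not real taxa.
paths = [("GenusA", "sp1"), ("GenusA", "sp2"), ("GenusB", "sp1")]
labels = sorted({label for p in paths for label in expand_path(p)})
label_index = {label: i for i, label in enumerate(labels)}

target = encode(("GenusA", "sp2"), label_index)

# Stand-ins for the outputs of three ensemble members on one image.
votes = [
    encode(("GenusA", "sp2"), label_index),
    encode(("GenusA", "sp1"), label_index),
    encode(("GenusA", "sp2"), label_index),
]
scores = average_votes(votes)

# Keep every label whose averaged score exceeds an assumed threshold of 0.5.
predicted = [label for label, s in zip(labels, scores) if s > 0.5]
print(predicted)
```

In this encoding a single prediction vector carries all taxonomic ranks at once, which is the property the abstract attributes to a single PCT; the members here disagree on the species but agree on the genus, so the genus label receives the highest averaged score.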
