Visual Recognition Software for Binary Classification and Its Application to Spruce Pollen Identification

Discriminating between the pollen of black and white spruce (Picea mariana and Picea glauca) is a difficult palynological classification problem that, if solved, would provide valuable data for paleoclimate reconstructions. We developed open-source visual recognition software, ARLO (Automated Recognition with Layered Optimization), that differentiates between these two species with an accuracy on par with human experts. The system applies pattern recognition and machine learning to pollen images and discovers general-purpose image features, which are built from simple lines and grids of pixels sampled at different dimensions, sizes, spacings, and resolutions. It adapts to a given problem by searching for the most effective combination of feature representation and learning strategy, which yields a powerful and flexible framework for image classification. We worked with images acquired using an automated slide scanner. We first applied a hash-based “pollen spotting” model to segment pollen grains from the slide background. We next tested ARLO’s ability to reconstruct black-to-white spruce pollen ratios using artificially constructed slides of known ratios. We then developed a more scalable hash-based method of image analysis that distinguished between the pollen of black and white spruce with an estimated accuracy of 83.61%, comparable to human expert performance. Our results demonstrate that machine learning systems can automate challenging taxonomic classifications in pollen analysis, and our success with simple image representations suggests that our approach is generalizable to many other object recognition problems.
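
The idea of hashing simple pixel-grid features can be illustrated with a minimal sketch. This is not ARLO's actual algorithm, which the abstract does not specify: it assumes grayscale images stored as nested lists of 0-255 integers, quantizes small grids of sampled pixels into tuples, and uses those tuples as hash-table keys that accumulate class votes. All function names, parameters, and the synthetic images are hypothetical.

```python
from collections import Counter, defaultdict

def grid_feature(image, top, left, spacing, size, levels=4):
    """Quantized pixel values sampled on a size x size grid (hypothetical feature)."""
    return tuple(
        image[top + r * spacing][left + c * spacing] * levels // 256
        for r in range(size)
        for c in range(size)
    )

def all_features(image, spacing=2, size=2, levels=4):
    """Yield the grid feature at every position where the grid fits in the image."""
    span = spacing * (size - 1)
    height, width = len(image), len(image[0])
    for top in range(height - span):
        for left in range(width - span):
            yield grid_feature(image, top, left, spacing, size, levels)

def train(labeled_images):
    """Build a hash table mapping each feature key to per-class counts."""
    table = defaultdict(Counter)
    for image, label in labeled_images:
        for key in all_features(image):
            table[key][label] += 1
    return table

def classify(table, image):
    """Sum the class counts of every known feature; return the top class or None."""
    votes = Counter()
    for key in all_features(image):
        votes.update(table.get(key, {}))
    return votes.most_common(1)[0][0] if votes else None

def flat_image(value, side=6):
    """Synthetic stand-in for a pollen image: a uniform gray square."""
    return [[value] * side for _ in range(side)]

# Toy training set: uniformly dark vs. uniformly bright images (assumed data).
table = train([(flat_image(30), "mariana"), (flat_image(220), "glauca")])
predictions = [classify(table, flat_image(v)) for v in (40, 200, 35)]
ratio = Counter(predictions)  # per-class counts, from which a ratio can be read
```

Because both training and classification reduce to dictionary lookups, the cost per image grows with the number of sampled grids rather than with the size of the training set. Per-class counts such as `ratio` are the kind of quantity that a black-to-white ratio estimate would be built from, though a real estimate would also need to account for classifier error.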
