Comparison of classification-then-modelling and species-by-species modelling for predicting lake phytoplankton assemblages

Abstract
Species distribution models are used for a wide range of ecological applications, such as the assessment of ecological status. For many such assessments, predictions of entire communities are preferred. When entire community compositions are modelled, two options are available: (1) to model each of the community's species individually, or (2) to incorporate community information into the models. Here, we compared the accuracy of these two modelling approaches for predicting boreal lake phytoplankton assemblages, and their ability to detect human impact. Specifically, the approaches tested were classification-then-modelling (here a RIVPACS-type model, using random forest to predict biological group membership) and species-by-species modelling (a separate random forest model for each species). The species-by-species models performed better than the RIVPACS model according to the Bray-Curtis dissimilarity index (BC), the area under the receiver operating characteristic curve (AUC) and the proportion of true positives. In contrast, the taxonomic completeness index (O/E, observed/expected), commonly used in freshwater assessments, indicated that the RIVPACS model performed better. However, we believe that O/E overestimates model performance because the index omits false negative errors (i.e. species wrongly predicted as absent). No support was found for our hypothesis that rare species would be better modelled by the RIVPACS model. Instead, the RIVPACS model predicted common species significantly better than the species-by-species models, whilst the species-by-species models predicted rare species better than the RIVPACS model. Both modelling methods were able to separate impaired sites (acidified and eutrophic) from reference sites. We suggest that classification-then-modelling be further evaluated using data sets containing more potential biological interactions, e.g. combined phytoplankton, zooplankton and fish data.
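The RIVPACS-type prediction and the contrast between O/E and BC described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the group-membership probabilities would come from the random forest classifier, the per-group taxon frequencies from reference sites, the 0.5 probability-of-capture threshold for O/E is a conventional choice assumed here, and the BC formula (summed over all modelled taxa) is an assumed Bray-Curtis form. All function names and numbers are hypothetical.

```python
# Minimal sketch of a RIVPACS-type prediction and two site-level indices.
# Assumptions (hypothetical, for illustration only):
#   * group_probs: predicted group-membership probabilities for one site
#     (in the study these come from a random forest classifier);
#   * taxon_freq: frequency of each taxon among reference sites of each group;
#   * O/E uses only taxa with capture probability >= 0.5;
#   * observed taxa are a subset of the modelled taxa.


def capture_probabilities(group_probs, taxon_freq):
    """Probability of capture P_k = sum over groups g of p_g * f_gk."""
    return {
        taxon: sum(p * f for p, f in zip(group_probs, freqs))
        for taxon, freqs in taxon_freq.items()
    }


def observed_over_expected(observed, probs, threshold=0.5):
    """Taxonomic completeness O/E over taxa with P_k >= threshold.

    Taxa that are observed but have P_k < threshold (false negatives)
    enter neither O nor E.
    """
    expected_taxa = [t for t, p in probs.items() if p >= threshold]
    e = sum(probs[t] for t in expected_taxa)
    o = sum(1 for t in expected_taxa if t in observed)
    return o / e if e > 0 else float("nan")


def bray_curtis_oe(observed, probs):
    """Bray-Curtis-type dissimilarity between observed presence (0/1) and P_k.

    Summed over all modelled taxa, so false negatives do increase it.
    """
    obs = {t: (1.0 if t in observed else 0.0) for t in probs}
    num = sum(abs(obs[t] - p) for t, p in probs.items())
    den = sum(obs[t] + p for t, p in probs.items())
    return num / den if den > 0 else float("nan")


if __name__ == "__main__":
    group_probs = [0.7, 0.3]  # hypothetical membership in two groups
    taxon_freq = {"A": [1.0, 0.5], "B": [0.8, 0.2], "C": [0.1, 0.3]}
    probs = capture_probabilities(group_probs, taxon_freq)  # C has P of about 0.16

    site_1 = {"A", "B"}       # C absent: prediction matches
    site_2 = {"A", "B", "C"}  # C present: a false negative
    for name, obs in [("site_1", site_1), ("site_2", site_2)]:
        print(name, round(observed_over_expected(obs, probs), 3),
              round(bray_curtis_oe(obs, probs), 3))
```

In this toy example, taxon C is observed at site_2 but has a capture probability below the threshold. O/E is identical for both sites, whereas BC rises, which illustrates how O/E can overstate model performance when false negatives occur.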
We also suggest that AUC be used as a complement to taxonomic completeness (O/E) when evaluating models of reference-condition taxonomic composition.
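To see why AUC can complement O/E, a site-level AUC can be computed from the same kind of predicted probabilities. The sketch below is an illustrative assumption, not the study's procedure: it treats the modelled taxa at one site as the cases (AUC could equally be computed per species across sites; the abstract does not state which was used), computes AUC as the Mann-Whitney probability that an observed taxon has a higher predicted probability than an unobserved one (ties counting one half), and uses an assumed 0.5 threshold for O/E. Taxon names and probabilities are hypothetical.

```python
# Hypothetical sketch: site-level AUC versus O/E for the same predictions.


def site_auc(observed, probs):
    """Mann-Whitney estimate of AUC across the modelled taxa at one site.

    Each (present, absent) pair scores 1 if the present taxon has the higher
    predicted probability, 0.5 for a tie and 0 otherwise.
    """
    present = [p for t, p in probs.items() if t in observed]
    absent = [p for t, p in probs.items() if t not in observed]
    if not present or not absent:
        return float("nan")  # AUC is undefined without both classes
    score = 0.0
    for pp in present:
        for pa in absent:
            if pp > pa:
                score += 1.0
            elif pp == pa:
                score += 0.5
    return score / (len(present) * len(absent))


def observed_over_expected(observed, probs, threshold=0.5):
    """O/E over taxa with P_k >= threshold (assumed threshold 0.5)."""
    expected_taxa = [t for t, p in probs.items() if p >= threshold]
    e = sum(probs[t] for t in expected_taxa)
    o = sum(1 for t in expected_taxa if t in observed)
    return o / e if e > 0 else float("nan")


if __name__ == "__main__":
    # Hypothetical predicted probabilities of occurrence for four taxa.
    probs = {"A": 0.85, "B": 0.62, "C": 0.16, "D": 0.30}
    correct = {"A", "B"}        # observed taxa match the predictions
    missed = {"A", "B", "C"}    # C observed despite a low prediction
    for name, obs in [("correct", correct), ("missed", missed)]:
        print(name, observed_over_expected(obs, probs), site_auc(obs, probs))
```

Here O/E is unchanged by the missed taxon C, while AUC drops from 1.0 to about 0.67 because C now ranks below an unobserved taxon (D). A rank-based measure like AUC therefore registers the false negative that O/E ignores.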
