A comparison of artificial neural networks and random forests to predict native fish species richness in Mediterranean rivers

Machine learning (ML) techniques have become important tools for supporting decision making in the management and conservation of freshwater aquatic ecosystems. Given the large number of ML techniques available, and to improve understanding of their utility in ecology, comparative studies of these techniques are needed as a preparatory analysis for future model applications. The objectives of this study were (i) to compare the reliability and ecological relevance of two predictive models for fish richness, based on artificial neural networks (ANN) and random forests (RF), and (ii) to evaluate the conformity between the two modelling approaches in terms of the variables selected as important. The effectiveness of the models was evaluated using three performance metrics: the coefficient of determination (R²), the mean squared error (MSE) and the adjusted coefficient of determination (R²adj). Both models were developed using a k-fold cross-validation procedure. According to the results, both techniques had similar validation performance (R² = 68% for RF and R² = 66% for ANN). Although the two methods selected different subsets of input variables, both models demonstrated high ecological relevance for the conservation of native fish in the Mediterranean region. Moreover, this work shows how the use of different modelling methods can assist the critical analysis of predictions at a catchment scale.
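The three performance metrics and the k-fold partitioning mentioned above can be illustrated with a minimal sketch. It assumes the common textbook definitions (R² as one minus the ratio of residual to total sum of squares, and the usual adjustment for the number of predictors); the abstract does not state which exact formulas, number of folds, or software were used, so the function names, the `n_predictors` argument and the shuffling seed are all hypothetical.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def adjusted_r2(y_true, y_pred, n_predictors):
    """R² penalised for the number of input variables (assumed definition)."""
    n = len(y_true)
    if n - n_predictors - 1 <= 0:
        raise ValueError("need more samples than predictors + 1")
    return 1.0 - (1.0 - r2(y_true, y_pred)) * (n - 1) / (n - n_predictors - 1)


def kfold_indices(n_samples, k, seed=0):
    """Yield (train, test) index arrays for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]


if __name__ == "__main__":
    # Toy richness values, purely illustrative (not data from the study).
    observed = [1, 2, 3, 4, 5]
    predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
    print(mse(observed, predicted))
    print(r2(observed, predicted))
    print(adjusted_r2(observed, predicted, n_predictors=2))
```

In a comparison such as the one described, each model would be fitted on the training indices of every fold and scored on the held-out indices, so that the reported validation R² reflects data the model has not seen.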
