Illuminating the “black box”: a randomization approach for understanding variable contributions in artificial neural networks

Abstract: With the growth of statistical modeling in the ecological sciences, researchers are using more complex methods, such as artificial neural networks (ANNs), to address problems associated with pattern recognition and prediction. Although many studies have shown ANNs to exhibit superior predictive power compared to traditional approaches, they have also been labeled a “black box” because they provide little explanatory insight into the relative influence of the independent variables in the prediction process. This lack of explanatory power is a major concern to ecologists, since interpreting statistical models is desirable for gaining knowledge of the causal relationships driving ecological phenomena. In this study, we describe a number of methods for understanding the mechanics of ANNs (e.g. Neural Interpretation Diagram, Garson's algorithm, sensitivity analysis). Next, we propose and demonstrate a randomization approach for statistically assessing the importance of axon connection weights and the contribution of input variables in the neural network. This approach allows researchers to eliminate null-connections between neurons whose weights do not significantly influence the network output (i.e. predicted response variable), thus facilitating the interpretation of individual and interacting contributions of the input variables in the network. Furthermore, the randomization approach can identify variables that significantly contribute to network predictions, thereby providing a variable selection method for ANNs. We show that by extending randomization approaches to ANNs, the “black box” mechanics of ANNs can be greatly illuminated. Thus, by coupling this new explanatory power of neural networks with their strong predictive abilities, ANNs promise to be a valuable quantitative tool to evaluate, understand, and predict ecological phenomena.
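
The abstract does not spell out the steps of the randomization approach, so the following is only a minimal sketch of one way such a test could be set up, not the authors' implementation. It assumes a network with a single hidden layer, measures each input's contribution as the sum over hidden neurons of the input-hidden weight times the hidden-output weight, and builds a null distribution by refitting the network to randomly permuted responses from the same initial weights. The function names (`fit_ann`, `input_contributions`, `randomization_test`), the tanh activation, the training settings, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_ann(X, y, n_hidden=3, epochs=300, lr=0.1, init_seed=0):
    """Fit a one-hidden-layer network (tanh hidden units, linear output)
    by batch gradient descent on mean squared error."""
    rng = np.random.default_rng(init_seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden))  # input -> hidden
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden)                # hidden -> output
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                 # hidden activations
        err = H @ W2 + b2 - y                    # prediction error
        dH = np.outer(err, W2) * (1.0 - H ** 2)  # backpropagate through tanh
        gW2, gb2 = H.T @ err / n, err.mean()
        gW1, gb1 = X.T @ dH / n, dH.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return W1, W2


def input_contributions(W1, W2):
    """Overall contribution of each input: sum over hidden neurons of
    (input-hidden weight) * (hidden-output weight)."""
    return W1 @ W2


def randomization_test(X, y, n_perm=99, perm_seed=1, **fit_kwargs):
    """Compare observed contributions with those obtained after permuting y.

    Every refit uses the same initial weights (assumption of this sketch),
    so only the shuffled response differs between fits.
    """
    rng = np.random.default_rng(perm_seed)
    observed = input_contributions(*fit_ann(X, y, **fit_kwargs))
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        null = input_contributions(*fit_ann(X, rng.permutation(y), **fit_kwargs))
        exceed += np.abs(null) >= np.abs(observed)
    # Two-sided p-value; the observed value counts as one of the permutations.
    p_values = (exceed + 1) / (n_perm + 1)
    return observed, p_values


# Synthetic example (hypothetical data): only the first input drives the response.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(80, 3))
y = 2.0 * X[:, 0] + data_rng.normal(scale=0.3, size=80)

observed, p = randomization_test(X, y)
for j, (c, pv) in enumerate(zip(observed, p)):
    print(f"input {j}: contribution = {c:+.3f}, p = {pv:.3f}")
```

Inputs whose p-value falls below a chosen significance level would be retained as significant contributors. The same comparison could be applied to individual input-hidden-output weight products instead of their sum to flag null connections.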
