‘Batteries’ in Machine Learning: A First Experimental Assessment of Inference for Siberian Crane Breeding Grounds in the Russian High Arctic Based on ‘Shaving’ 74 Predictors

The Siberian crane (Leucogeranus leucogeranus) remains an elusive but highly regarded species of global conservation concern. It breeds in the Russian High Arctic, and two subpopulations are known. Here we present, for the first time, a machine-learning-based summer habitat analysis for the eastern population, using nesting data from its breeding grounds and predictive modeling with 74 GIS predictors. Parsimony is commonly sought to increase model interpretability, but findings generally show that it does not yield the greatest improvement in the model or in inference. ‘Batteries’ are a new concept in machine learning: sets of experiments that systematically test predictors and model selection. We run 28 such batteries and compare multiple modeling approaches, ranging from iteratively dropping the least or most important predictor (‘variable shaving’) to allowing all predictors to contribute. The generic ‘kitchen sink’ model built with TreeNet (an optimized boosting algorithm from Salford Systems Ltd.) performed best. While batteries remain widely underused in wildlife conservation management, shaving proved very useful for learning about the structure, role and impact of predictors and their spatial performance, supporting non-parsimonious work. Of particular interest is the finding that a bundle of low-ranked predictors performs almost as well as, or better than, the so-called top predictors; this is called ‘predictor swapping’. To date, this is the best and most detailed habitat study and prediction for the Siberian crane in summer. It is intended for conservation management and as a generic template for any species, particularly in the High Arctic, as both data availability and the environmental crisis grow.
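The variable-shaving battery described above can be illustrated with a small, model-agnostic loop. This is a minimal sketch under stated assumptions, not the authors' TreeNet workflow. TreeNet is proprietary, so the model is hidden behind a hypothetical `fit_and_score` callback that fits a model on a predictor subset and returns a performance score (higher is better) together with per-predictor importances. In practice such a callback might wrap a gradient boosting model, for example scikit-learn's `GradientBoostingClassifier` (its `feature_importances_` attribute) scored with cross-validated AUC. The function names `shave`, `run_battery` and `swap_check`, the predictor names, and the toy weights are all illustrative assumptions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback: fit a model on a predictor subset and return
# (score, {predictor: importance}). Higher scores are better.
FitScore = Callable[[Sequence[str]], Tuple[float, Dict[str, float]]]


def shave(predictors: Sequence[str], fit_and_score: FitScore,
          drop: str = "least", min_predictors: int = 1) -> List[Tuple[List[str], float]]:
    """Variable shaving: refit, record the score, then drop the least
    (or most) important predictor and repeat.

    Returns one (subset, score) record per step, starting with the full
    'kitchen sink' set.
    """
    if drop not in ("least", "most"):
        raise ValueError("drop must be 'least' or 'most'")
    pick = min if drop == "least" else max
    current = list(predictors)
    history: List[Tuple[List[str], float]] = []
    while True:
        score, importance = fit_and_score(current)
        history.append((list(current), score))
        if len(current) <= min_predictors:
            break
        # Importances are recomputed after every refit, so the ranking may change.
        victim = pick(current, key=lambda p: importance[p])
        current.remove(victim)
    return history


def run_battery(predictors: Sequence[str], fit_and_score: FitScore):
    """A small 'battery': run shaving in both directions and keep, per
    direction, the best-scoring subset (earliest step wins ties)."""
    results = {}
    for direction in ("least", "most"):
        history = shave(predictors, fit_and_score, drop=direction)
        best_subset, best_score = max(history, key=lambda rec: rec[1])
        results[direction] = (best_subset, best_score, history)
    return results


def swap_check(predictors: Sequence[str], fit_and_score: FitScore, k: int = 1):
    """Compare the top-k predictors with the bundle of all lower-ranked ones
    (a simple probe for 'predictor swapping')."""
    _, importance = fit_and_score(list(predictors))
    ranked = sorted(predictors, key=lambda p: importance[p], reverse=True)
    top_score, _ = fit_and_score(ranked[:k])
    bundle_score, _ = fit_and_score(ranked[k:])
    return top_score, bundle_score


# Hypothetical toy scorer for demonstration only: each predictor adds a
# fixed amount to the score, and its importance equals that amount.
WEIGHTS = {"elevation": 5, "ndvi": 3, "lake_distance": 2, "slope": 1}


def toy_fit_and_score(subset: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    score = sum(WEIGHTS[p] for p in subset)
    return score, {p: WEIGHTS[p] for p in subset}


if __name__ == "__main__":
    battery = run_battery(list(WEIGHTS), toy_fit_and_score)
    for direction, (subset, score, _) in battery.items():
        print(direction, subset, score)
    print("top-1 vs bundle:", swap_check(list(WEIGHTS), toy_fit_and_score, k=1))
```

With the toy scorer, the full predictor set scores best in both shaving directions, and the bundle of three lower-ranked predictors outscores the single top predictor. These numbers only show how the loop behaves; they are not results from the crane data.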
