Evaluating machine-learning techniques for recruitment forecasting of seven North East Atlantic fish species

The effect of different factors (spawning biomass, environmental conditions) on recruitment is of great importance for fisheries management, recovery plans and scenario exploration. In this study, recently proposed supervised classification techniques, tested by the machine-learning community, are applied to forecast the recruitment of seven fish species of the North East Atlantic (anchovy, sardine, mackerel, horse mackerel, hake, blue whiting and albacore), using spawning, environmental and climatic data. In addition, the probabilistic flexible naive Bayes classifier (FNBC) is proposed as a modelling approach in order to reduce uncertainty for fisheries management purposes. The aim of these improvements is to obtain better probability estimates for each possible outcome (low, medium and high recruitment) based on kernel density estimation, which is crucial for informed management decision making under high uncertainty. Finally, goodness-of-fit is compared with generalization power in order to assess the reliability of the final forecasting models. In most cases the proposed methodology provides useful information for management, whereas the case of horse mackerel illustrates the limitations of the approach. The proposed improvements allow a better probabilistic estimation of the different scenarios, i.e. they reduce the uncertainty in the provided forecasts.
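The idea of a flexible naive Bayes classifier can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes Gaussian kernels, a Silverman-style rule-of-thumb bandwidth, and integer class labels 0, 1, 2 standing for low, medium and high recruitment; the function names `fit_fnb` and `predict_proba` and the synthetic data are hypothetical. For each class, the density of every predictor is estimated with a kernel density estimate, the per-predictor densities are multiplied under the naive independence assumption, and the result is normalized into a probability for each recruitment level.

```python
import numpy as np

def fit_fnb(X, y):
    """Store, per class, the prior, the training points and one bandwidth per predictor."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        n = Xc.shape[0]
        std = Xc.std(axis=0, ddof=1) if n > 1 else np.ones(X.shape[1])
        # Assumption: rule-of-thumb bandwidth; other choices are possible.
        h = 1.06 * std * n ** (-1.0 / 5.0)
        h = np.where(h > 0, h, 1e-3)  # guard against constant predictors
        model[int(c)] = {"prior": n / len(y), "points": Xc, "h": h}
    return model

def predict_proba(model, X):
    """Return (classes, probabilities) with one row per sample, one column per class."""
    X = np.asarray(X, dtype=float)
    classes = sorted(model)
    log_post = np.empty((X.shape[0], len(classes)))
    for j, c in enumerate(classes):
        m = model[c]
        # z has shape (n_samples, n_train_in_class, n_predictors)
        z = (X[:, None, :] - m["points"][None, :, :]) / m["h"]
        log_k = -0.5 * z ** 2 - np.log(m["h"] * np.sqrt(2.0 * np.pi))
        # Log of the average kernel value per predictor, computed stably.
        mx = log_k.max(axis=1, keepdims=True)
        log_dens = mx[:, 0, :] + np.log(np.exp(log_k - mx).mean(axis=1))
        # Naive Bayes: sum log densities over predictors, add log prior.
        log_post[:, j] = np.log(m["prior"]) + log_dens.sum(axis=1)
    log_post -= log_post.max(axis=1, keepdims=True)
    p = np.exp(log_post)
    return classes, p / p.sum(axis=1, keepdims=True)

# Synthetic illustration only: two made-up predictors, three recruitment levels.
rng = np.random.default_rng(0)
centers = np.array([[-5.0, -5.0], [0.0, 0.0], [5.0, 5.0]])
X_train = np.vstack([rng.normal(c, 0.5, size=(20, 2)) for c in centers])
y_train = np.repeat([0, 1, 2], 20)  # 0 = low, 1 = medium, 2 = high

model = fit_fnb(X_train, y_train)
classes, proba = predict_proba(model, centers)
print(classes)
print(proba.round(3))
```

Each row of `proba` gives the estimated probability of low, medium and high recruitment for one sample, which is the kind of output the abstract describes as useful for decision making under uncertainty.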
