Effects of data prevalence on species distribution modelling using a genetic Takagi-Sugeno fuzzy system

Uncertainties originating from observation data and modelling approaches can affect model accuracy and thus the applicability and reliability of a model. This paper assesses the effects of data prevalence (i.e., the proportion of presences in the entire data set) on species distribution modelling and habitat preference evaluation using a 0-order genetic Takagi-Sugeno fuzzy model. The effects were evaluated based on model accuracy and habitat preference curves (HPCs). To avoid data uncertainty, virtual species data were generated using hypothetical HPCs under different assumptions about the interaction between habitat variables and the habitat preference of a virtual fish. In total, thirteen data sets were generated under three different interaction scenarios. The accuracy of the resulting models varied with data prevalence, and the trends differed between data sets generated under different interaction scenarios. Although the HPC shapes were similar across data sets, the HPCs differed according to data prevalence: a higher prevalence can result in a uniform HPC. This study demonstrates possible influences of data prevalence on species distribution modelling. Further study is needed to find a better way of coping with prevalence-related problems in ecological modelling.
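To make the notion of prevalence concrete, the following is a minimal Python sketch of how virtual presence/absence data could be drawn from a hypothetical HPC and then resampled to a chosen prevalence. It is not the procedure used in the paper: the triangular curve, the single habitat variable `depth`, and the function names `hypothetical_hpc`, `generate_virtual_records`, and `resample_to_prevalence` are all assumptions made for illustration.

```python
import random


def prevalence(observations):
    """Proportion of presences (1) in a presence/absence (1/0) sequence."""
    return sum(observations) / len(observations)


def hypothetical_hpc(depth):
    """Illustrative triangular preference curve on [0, 1], peaking at 0.5.

    Assumption: this shape is invented for the example, not taken from the paper.
    """
    return max(0.0, 1.0 - abs(depth - 0.5) / 0.5)


def generate_virtual_records(n, seed=0):
    """Draw n (depth, presence) records; presence is Bernoulli(HPC(depth))."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        depth = rng.random()
        present = 1 if rng.random() < hypothetical_hpc(depth) else 0
        records.append((depth, present))
    return records


def resample_to_prevalence(records, target, n_total, seed=0):
    """Subsample n_total records so that the presence fraction is close to target."""
    rng = random.Random(seed)
    presences = [r for r in records if r[1] == 1]
    absences = [r for r in records if r[1] == 0]
    n_pres = round(target * n_total)
    n_abs = n_total - n_pres
    if n_pres > len(presences) or n_abs > len(absences):
        raise ValueError("not enough presences or absences for this target")
    sample = rng.sample(presences, n_pres) + rng.sample(absences, n_abs)
    rng.shuffle(sample)
    return sample


if __name__ == "__main__":
    pool = generate_virtual_records(2000)
    for target in (0.1, 0.3, 0.5, 0.7, 0.9):
        subset = resample_to_prevalence(pool, target, 100)
        print(target, prevalence([p for _, p in subset]))
```

Sampling presences and absences separately fixes the prevalence of each data set while leaving the underlying preference curve unchanged, so any difference between fitted models can be attributed to prevalence rather than to the habitat relationship.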
