Deep Hurdle Networks for Zero-Inflated Multi-Target Regression: Application to Multiple Species Abundance Estimation

A key problem in computational sustainability is understanding how species are distributed across landscapes over time. This question gives rise to challenging large-scale prediction problems because (i) hundreds of species must be modeled simultaneously and (ii) survey data are usually zero-inflated, since many species are absent from a large number of sites. Tackling both issues at once, which we refer to as the zero-inflated multi-target regression problem, has not been addressed by previous methods in statistics and machine learning. In this paper, we propose a novel deep model for the zero-inflated multi-target regression problem. We first model the joint distribution of the multiple response variables as a multivariate probit model and then couple the positive outcomes with a multivariate log-normal distribution. By penalizing the difference between the two distributions' covariance matrices, we establish a link between them. The whole model is cast as an end-to-end learning framework, and we provide an efficient learning algorithm that can be fully implemented on GPUs. We show that our model outperforms existing state-of-the-art baselines on two challenging real-world species distribution datasets concerning bird and fish populations.
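To make the hurdle structure concrete, here is a minimal per-site loss sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the names `cholesky`, `probit_log_prob`, `mvn_log_pdf`, `hurdle_nll` and the weight `lam` are hypothetical; the presence probability of the multivariate probit part is estimated with a naive (non-differentiable) Monte Carlo count rather than the paper's GPU training procedure; the covariance penalty is assumed to be a squared Frobenius norm; and the means are passed in directly instead of being produced by a deep network from site features.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def probit_log_prob(z, mu, sigma, n_samples=20000, rng=None):
    """Monte Carlo estimate of log P(z) under a multivariate probit model:
    w ~ N(mu, sigma) and z_j = 1 exactly when w_j > 0."""
    rng = rng or random.Random(0)  # fixed seed keeps the estimate reproducible
    L = cholesky(sigma)
    n = len(mu)
    hits = 0
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        w = [mu[i] + sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(n)]
        if all((w[i] > 0) == bool(z[i]) for i in range(n)):
            hits += 1
    return math.log(max(hits, 1) / n_samples)  # clamp so log(0) never occurs


def mvn_log_pdf(x, mu, sigma):
    """Log density of a multivariate normal, computed via Cholesky."""
    n = len(x)
    L = cholesky(sigma)
    v = []  # forward substitution: solve L v = x - mu
    for i in range(n):
        s = x[i] - mu[i] - sum(L[i][k] * v[k] for k in range(i))
        v.append(s / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2 * math.pi) + log_det + sum(t * t for t in v))


def hurdle_nll(y, mu_p, sigma_p, mu_l, sigma_l, lam=1.0, rng=None):
    """Negative log-likelihood of one site's abundance vector y under a hurdle model:
    probit part for presence/absence, log-normal part for the positive entries,
    plus lam * ||sigma_p - sigma_l||_F^2 linking the two covariances (assumed form)."""
    z = [1 if yi > 0 else 0 for yi in y]
    nll = -probit_log_prob(z, mu_p, sigma_p, rng=rng)
    pos = [i for i, yi in enumerate(y) if yi > 0]
    if pos:
        log_y = [math.log(y[i]) for i in pos]
        mu_s = [mu_l[i] for i in pos]
        sig_s = [[sigma_l[i][j] for j in pos] for i in pos]
        # log-normal density = normal density of log y minus sum(log y) (Jacobian term)
        nll -= mvn_log_pdf(log_y, mu_s, sig_s) - sum(log_y)
    n = len(y)
    penalty = sum((sigma_p[i][j] - sigma_l[i][j]) ** 2 for i in range(n) for j in range(n))
    return nll + lam * penalty
```

For example, `hurdle_nll([0.0, 2.5], [0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` scores a site where the first species is absent and the second is observed with abundance 2.5. Summing this loss over sites gives a training objective; because the log-normal term only touches the observed positive species, zeros are handled entirely by the probit part, which is the point of a hurdle formulation.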
