Joint models for cause-of-death mortality in multiple populations

We investigate jointly modeling age-specific rates of various causes of death in a multinational setting. We apply Multi-Output Gaussian Processes (MOGP), a spatial machine learning method, to smooth and extrapolate multiple cause-of-death mortality rates across several countries and both genders. To maintain flexibility and scalability, we investigate MOGPs with Kronecker-structured kernels and latent factors. In particular, we develop a custom multi-level MOGP that leverages the gridded structure of mortality tables to efficiently capture heterogeneity and dependence across different factor inputs. Results are illustrated with datasets from the Human Cause-of-Death Database (HCD). We discuss a case study involving cancer variations in three European nations, and a US-based study that considers eight top-level causes and includes a comparison to all-cause analysis. Our models provide insights into the commonality of cause-specific mortality trends and demonstrate the corresponding opportunities for data fusion.

1 Background and motivation

In-depth modeling of the evolution of human mortality necessitates analysis of the prevalent causes of death. This is doubly so when forecasting mortality across different age groups, populations and genders. In this article we develop a methodology for probabilistic forecasting of cause-specific mortality in a multi-population (primarily interpreted as a multi-national) context. Thus, we simultaneously fit multiple cause-specific longevity surfaces via a spatio-temporal model that accounts for the complex dependencies across causes and countries and across the Age-Year dimensions. While there have been many works on modeling mortality across several populations (Dong et al. 2020, Enchev et al. 2017, Guibert et al. 2019, Hyndman et al. 2013, Kleinow 2015, Li and Lu 2017, Tsai and Zhang 2019), as well as an active literature on cause-of-death mortality, very few works do both simultaneously.
As we detail below, there are many natural reasons for building such a joint model, and this gap is arguably driven by the underlying “Big Data” methodological challenge. Indeed, with dozens of mortality datasets indexed by country, cause of death, gender, etc., developing a scalable approach is daunting. We demonstrate that this issue may be overcome by adapting machine learning approaches, specifically techniques from multi-task learning (Bonilla et al. 2008, Caruana 1997, Letham and Bakshy 2019, Williams et al. 2009). To this end, we employ multi-output Gaussian Processes (MOGP) combined with linear coregionalization. GP regression is a kernel-based, data-driven framework that translates mortality modeling into smoothing and extrapolating an input-output response surface based on noisy observations. It yields a full

Department of Statistics and Applied Probability, University of California at Santa Barbara, ludkovski@pstat.ucsb.edu

arXiv:2111.06631v1 [stat.AP] 12 Nov 2021
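To make the idea of a Kronecker-structured multi-output kernel concrete, the following is a minimal NumPy sketch, not the authors' model. It assumes the intrinsic coregionalization form (the single-kernel special case of linear coregionalization), squared-exponential kernels in Age and Year, and a toy grid. The function names `rbf_kernel` and `icm_covariance`, the loading matrix `A`, the lengthscales, and all numerical values are hypothetical choices made for illustration only.

```python
import numpy as np


def rbf_kernel(x1, x2, lengthscale):
    """Squared-exponential kernel between two 1-D input arrays."""
    diff = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def icm_covariance(ages, years, B, ell_age, ell_year):
    """Covariance over (output, age, year) as a Kronecker product.

    B is the output-by-output coregionalization matrix; the Age and
    Year kernels are shared by all outputs (an assumption of this sketch).
    """
    K_age = rbf_kernel(ages, ages, ell_age)
    K_year = rbf_kernel(years, years, ell_year)
    return np.kron(B, np.kron(K_age, K_year))


# Hypothetical toy grid: 5 ages and 4 calendar years.
ages = np.arange(50.0, 55.0)
years = np.arange(2000.0, 2004.0)

# Hypothetical loadings of 3 outputs (e.g. cause-country pairs) on 2 latent factors.
A = np.array([[1.0, 0.0],
              [0.8, 0.6],
              [0.5, 0.5]])
# Low-rank part plus a small diagonal keeps B positive definite.
B = A @ A.T + 1e-3 * np.eye(3)

K = icm_covariance(ages, years, B, ell_age=5.0, ell_year=3.0)
print(K.shape)  # 3 outputs x 5 ages x 4 years = 60 rows and columns
```

In this sketch only the small factor matrices (3x3, 5x5 and 4x4) are specified, and the full 60x60 covariance follows from them. The covariance between any two outputs `i` and `j` is the shared Age-Year kernel scaled by `B[i, j]`.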

[1] P. Ekamper, et al. Improving Overall Mortality Forecasts by Analysing Cause-of-Death, Period and Cohort Effects in Trends, 1999.

[2] H. M. Rosenberg, et al. Comparability of cause of death between ICD-9 and ICD-10: preliminary estimates, 2001, National vital statistics reports: from the Centers for Disease Control and Prevention, National Center for Health Statistics, National Vital Statistics System.

[3] Neil D. Lawrence, et al. Kernels for Vector-Valued Functions: a Review, 2011, Found. Trends Mach. Learn.

[4] Sethu Vijayakumar, et al. Multi-task Gaussian Process Learning of Robot Inverse Dynamics, 2008, NIPS.

[5] Yee Whye Teh, et al. Semiparametric latent factor models, 2005, AISTATS.

[6] Eytan Bakshy, et al. Bayesian Optimization for Policy Search via Online-Offline Experimentation, 2019, J. Mach. Learn. Res.

[7] Daniel H. Alai, et al. Mind the Gap: A Study of Cause-Specific Mortality by Socioeconomic Circumstances, 2018.

[8] Hong Li, et al. Coherent Forecasting of Mortality Rates: A Sparse Vector-Autoregression Approach, 2017.

[9] Edwin V. Bonilla, et al. Multi-task Gaussian Process Prediction, 2007, NIPS.

[10] J. Wilmoth, et al. Are mortality projections always more pessimistic when disaggregated by cause of death?, 1995, Mathematical population studies.

[11] M. Ludkovski, et al. Gaussian Process Models for Mortality Rates and Improvement Factors, 2016, ASTIN Bulletin.

[12] A. Rogers, et al. Forecasting cause-specific mortality using time series methods, 1992.

[13] J. Vaupel, et al. Coherent Forecasts of Mortality with Compositional Data Analysis, 2017.

[14] Rob J Hyndman, et al. Coherent Mortality Forecasting: The Product-Ratio Method With Functional Time Series Models, 2013, Demography.

[15] Ralf A. Wilke, et al. A copula model for dependent competing risks, 2009.

[16] Jiqiang Guo, et al. Stan: A Probabilistic Programming Language, 2017, Journal of statistical software.

[17] Eyal Oren, et al. The global, regional, and national burden of stomach cancer in 195 countries, 1990–2017: a systematic analysis for the Global Burden of Disease study 2017, 2019, The lancet. Gastroenterology & hepatology.

[18] S. Haberman, et al. Multi-population mortality forecasting using tensor decomposition, 2020.

[19] Elad Gilboa, et al. Scaling Multidimensional Inference for Structured Gaussian Processes, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20] Massimiliano Pontil, et al. Multi-task Learning, 2020, Transfer Learning.

[21] Shandian Zhe, et al. Scalable High-Order Gaussian Process Regression, 2019, AISTATS.

[22] Nhan Huynh, et al. Multi-output Gaussian processes for multi-population longevity modelling, 2020, Annals of Actuarial Science.

[23] Nan Li, et al. Coherent mortality forecasts for a group of populations: An extension of the Lee-Carter method, 2005, Demography.

[24] Y. Zhang. A multi-dimensional Bühlmann credibility approach to modeling multi-population mortality rates, 2019, Scandinavian Actuarial Journal.

[25] Alexander J. Smola, et al. Fast Kronecker Inference in Gaussian Processes with non-Gaussian Likelihoods, 2015, ICML.

[26] Christopher K. I. Williams, et al. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning), 2005.

[27] M. Sherris, et al. Forecasting Mortality Trends Allowing for Cause-of-Death Mortality Dependence, 2013.

[28] Torsten Kleinow, et al. A common age effect model for the mortality of multiple populations, 2015.

[29] O. Lopez, et al. Forecasting mortality rate improvements with a high-dimensional VAR, 2019, Insurance: Mathematics and Economics.

[30] J. Oeppen, et al. Forecasting causes of death by using compositional data analysis: the case of cancer deaths, 2018, Journal of the Royal Statistical Society: Series C (Applied Statistics).

[31] Vladimir K. Kaishev, et al. Dependent competing risks: Cause elimination and its impact on survival, 2013.

[32] Austin Carter, et al. Forecasting life expectancy, years of life lost, and all-cause and cause-specific mortality for 250 causes of death: reference and alternative scenarios for 2016–40 for 195 countries and territories, 2018, The Lancet.

[33] Hong Li, et al. Modeling cause-of-death mortality using hierarchical Archimedean copula, 2018, Scandinavian Actuarial Journal.

[34] Haitao Liu, et al. When Gaussian Process Meets Big Data: A Review of Scalable GPs, 2018, IEEE Transactions on Neural Networks and Learning Systems.

[35] D. Godlewski, et al. Predictions of cancer mortality in Poland in 2020, 2014.

[36] R. Mcnown, et al. Changing causes of death and the sex differential in the USA, 1993.

[37] Marco Marsili, et al. How Useful Are the Causes of Death When Extrapolating Mortality Trends. An Update, 2019, Demographic Research Monographs.

[38] T. Kleinow, et al. Multi-population mortality models: fitting, forecasting and comparisons, 2017.