Variational Disentanglement for Rare Event Modeling

The increasing availability and abundance of healthcare data, together with current advances in machine learning methods, have created renewed opportunities to improve clinical decision support systems. However, in healthcare risk prediction applications, the proportion of cases with the condition (label) of interest is often very low relative to the available sample size. Though especially prevalent in healthcare, such imbalanced classification settings are also common and challenging in many other scenarios. Motivated by this, we propose a variational disentanglement approach to semi-parametrically learn from rare events in heavily imbalanced classification problems. Specifically, we leverage extreme-distribution behavior imposed on a latent space to extract information from low-prevalence events, and develop a robust prediction arm that combines the merits of the generalized additive model and isotonic neural nets. Results on synthetic studies and diverse real-world datasets, including mortality prediction on a COVID-19 cohort, demonstrate that the proposed approach outperforms existing alternatives.
