Health Insurance Market Risk Assessment: Covariate Shift and k-Anonymity

Health insurance companies prefer to enter new markets in which the individuals likely to enroll in their plans have a low annual cost. When deciding which new markets to enter, however, insurers have no health cost data for those markets; such data are available only for their own enrolled members. To address this problem of assessing risk in new markets, i.e., estimating the cost of likely enrollees, we pose a regression problem with demographic data as predictors, combined with a novel three-population covariate shift. Because this application deals with health data protected by privacy laws, we cannot use the raw data of the insurance company's members directly to train the regression and the covariate shift adaptation. Therefore, to construct a full solution, we also develop a novel method for achieving k-anonymity while preserving the data distribution (the workload-driven quality criterion) through dithered quantization and Rosenblatt's transformation. We illustrate the efficacy of the solution using real-world, publicly available data.
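To make the privacy step concrete, here is a minimal two-feature sketch of distribution-preserving quantization. It is written under strong assumptions and is not the authors' algorithm. It assumes that two standardized demographic features follow a known bivariate standard normal distribution with correlation `rho`, so the conditional CDFs required by Rosenblatt's transformation have closed forms. Each record is mapped to the unit square, quantized on a uniform `m`-by-`m` grid, replaced by a uniformly random point of its cell (a simplified stand-in for dithered quantization), and then mapped back. The grid cell plays the role of the generalized record: the cell labels are k-anonymous when every occupied cell holds at least `k` records. All function names (`rosenblatt`, `inverse_rosenblatt`, `release`, `is_k_anonymous`, `pearson`) and parameters are hypothetical.

```python
import math
import random
from collections import Counter
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1); supplies cdf and inv_cdf
EPS = 1e-12                # keeps probabilities strictly inside (0, 1) for inv_cdf


def rosenblatt(x1, x2, rho):
    """Rosenblatt transform for an ASSUMED bivariate standard normal with
    correlation rho (|rho| < 1): u1 = F1(x1), u2 = F2|1(x2 | x1)."""
    s = math.sqrt(1.0 - rho * rho)
    return STD_NORMAL.cdf(x1), STD_NORMAL.cdf((x2 - rho * x1) / s)


def inverse_rosenblatt(u1, u2, rho):
    """Map a point of the unit square back to the original feature space."""
    s = math.sqrt(1.0 - rho * rho)
    u1 = min(max(u1, EPS), 1.0 - EPS)
    u2 = min(max(u2, EPS), 1.0 - EPS)
    x1 = STD_NORMAL.inv_cdf(u1)
    return x1, rho * x1 + s * STD_NORMAL.inv_cdf(u2)


def cell_index(u, m):
    """Uniform scalar quantizer: index of the width-1/m cell containing u."""
    return min(int(u * m), m - 1)


def release(records, rho, m, rng):
    """Quantize each record on an m-by-m grid in the Rosenblatt domain, replace
    it by a uniformly random point of its cell (simplified stand-in for
    dithered quantization), and map that point back.
    Returns (cell labels, released records)."""
    cells, released = [], []
    for x1, x2 in records:
        u1, u2 = rosenblatt(x1, x2, rho)
        c = (cell_index(u1, m), cell_index(u2, m))
        v1 = (c[0] + rng.random()) / m  # uniform draw inside the cell
        v2 = (c[1] + rng.random()) / m
        cells.append(c)
        released.append(inverse_rosenblatt(v1, v2, rho))
    return cells, released


def is_k_anonymous(cells, k):
    """True if every occupied cell (the generalized record) holds >= k records."""
    return min(Counter(cells).values()) >= k


def pearson(points):
    """Sample Pearson correlation of a list of (a, b) pairs."""
    n = len(points)
    ma = sum(a for a, _ in points) / n
    mb = sum(b for _, b in points) / n
    sab = sum((a - ma) * (b - mb) for a, b in points)
    saa = sum((a - ma) ** 2 for a, _ in points)
    sbb = sum((b - mb) ** 2 for _, b in points)
    return sab / math.sqrt(saa * sbb)


if __name__ == "__main__":
    rng = random.Random(0)
    rho, s = 0.6, math.sqrt(1.0 - 0.6 ** 2)
    data = []
    for _ in range(20000):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        data.append((z1, rho * z1 + s * z2))
    cells, out = release(data, rho, m=4, rng=rng)
    print("k=50 anonymous:", is_k_anonymous(cells, 50))
    print("correlation before/after:", round(pearson(data), 3), round(pearson(out), 3))
```

The sketch relies on one fact. Rosenblatt's transformation maps the assumed joint distribution to the uniform distribution on the unit square. A uniform draw inside a cell that is itself equally likely to be any of the `m * m` cells is again uniform, so inverting the transformation yields released records with the same joint distribution. The two printed correlations should therefore agree up to sampling noise. In practice the conditional CDFs would have to be estimated from data, and the grid resolution `m` trades the anonymity level against distortion.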
