Applying Triple-Matrix Masking for Privacy Preserving Data Collection and Sharing in HIV Studies.

BACKGROUND Many HIV research projects are plagued by high rates of missing self-reported information during data collection. In addition, because HIV research data are sensitive, privacy protection is a persistent concern when such data are shared.

METHODS This paper applies a data masking approach called triple-matrix masking [1] to HIV research, to protect privacy during both data collection and data sharing.

RESULTS Using a set of generated HIV patient data, we show step by step how the data are randomly transformed (masked) before they leave each patient's data collection device, so that nobody sees the actual data. We then show how the masked data are further transformed by a masking service provider and by a data collector. We demonstrate that the masked data retain the statistical utility of the original data. They yield exactly the same inference results as the original data in two planned analyses: the logistic regression of adherence to antiretroviral therapy on age, and the Cox proportional hazards model for the effect of age on time to viral load suppression.

CONCLUSION Privacy-preserving data collection methods may help resolve the privacy protection problem in HIV research. Individual sensitive data can be completely hidden, while the same inference results are still obtained from the masked data using common statistical analysis methods.
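The masking flow summarized in the abstract can be illustrated with a simplified sketch. The arrangement below is an assumption, not the paper's exact procedure. Each device post-multiplies its record by an invertible matrix `B`, which is assumed to be generated by the data collector and pushed to devices. The masking service provider pre-multiplies the stacked records by a random orthogonal matrix `A`. The data collector then removes `B`, which leaves `A @ D`. Because $A^\top A = I$, the released matrix has the same cross-products $D^\top D$ as the original, so ordinary least squares returns identical coefficients. This shows the random orthogonal matrix masking idea of [7] for a linear model only. It does not reproduce the paper's logistic regression or Cox model analyses. All function names and the generated records are hypothetical.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Draw a random n x n orthogonal matrix from the QR factorization of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs so the draw is uniformly (Haar) distributed.
    return q * np.sign(np.diag(r))


def device_mask(record, B):
    """Party 1 (hypothetical), the patient's device: post-multiply one record (a row) by B."""
    return record @ B


def provider_mask(stacked, A):
    """Party 2 (hypothetical), the masking service provider: pre-multiply all masked rows by A."""
    return A @ stacked


def collector_unmask(masked, B):
    """Party 3 (hypothetical), the data collector: remove B. The result A @ D still hides each record."""
    return masked @ np.linalg.inv(B)


def ols(X, y):
    """Ordinary least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def demo(n=50, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical generated records: [intercept, age, continuous outcome].
    age = rng.uniform(18, 65, n)
    outcome = 2.0 - 0.03 * age + rng.normal(0, 0.5, n)
    D = np.column_stack([np.ones(n), age, outcome])

    # Assumed right mask: almost surely invertible; the identity shift improves conditioning.
    B = rng.standard_normal((3, 3)) + 3 * np.eye(3)
    # Assumed left mask: orthogonal, known only to the masking service provider.
    A = random_orthogonal(n, rng)

    stacked = np.vstack([device_mask(row, B) for row in D])    # what the provider receives
    released = collector_unmask(provider_mask(stacked, A), B)  # equals A @ D

    coef_original = ols(D[:, :2], D[:, 2])
    coef_masked = ols(released[:, :2], released[:, 2])
    return D, released, coef_original, coef_masked


if __name__ == "__main__":
    D, released, c0, c1 = demo()
    print("original coefficients:", c0)
    print("masked coefficients:  ", c1)
```

Running the script prints two coefficient vectors that agree to floating-point precision. Each row of `released` is a linear combination of all the original records, so no released row corresponds to a single patient.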

[1] James P. Kelly, et al. Balancing Quality and Confidentiality for Multivariate Tabular Data. 2004, Privacy in Statistical Databases.

[2] D. F. Phillips, et al. Institutional Review Boards under stress: will they explode or change? 1996, JAMA.

[3] Kun Liu, et al. Random projection-based multiplicative data perturbation for privacy preserving distributed data mining. 2006, IEEE Transactions on Knowledge and Data Engineering.

[4] M. Antoni, et al. The importance of cognitive self-report in early HIV-1 infection: validation of a cognitive functional status subscale. 2002, AIDS.

[5] Jay-J. Kim. A Method for Limiting Disclosure in Microdata Based on Random Noise and … 2002.

[6] Deepalika Chakravarty, et al. Relationship characteristics associated with sexual risk behavior among MSM in committed relationships. 2012, AIDS Patient Care and STDs.

[7] Stephen E. Fienberg, et al. Random orthogonal matrix masking methodology for microdata release. 2008, Int. J. Inf. Comput. Secur.

[8] Ulf Böckenholt, et al. Item Randomized-Response Models for Measuring Noncompliance: Risk-Return Perceptions, Social Influences, and Self-Protective Responses. 2007.

[9] Ruiguang Song, et al. Direct and Unbiased Multiple Imputation Methods for Missing Values of Categorical Variables. 2021.

[10] Hoeteck Wee, et al. Toward Privacy in Public Databases. 2005, TCC.

[11] Evan Wood, et al. Young age predicts poor antiretroviral adherence and viral load suppression among injection drug users. 2012, AIDS Patient Care and STDs.

[12] Long Zhang, et al. A New Data Collection Technique for Preserving Privacy. 2018, J. Priv. Confidentiality.

[13] S. L. Warner, et al. Randomized response: a survey technique for eliminating evasive answer bias. 1965, Journal of the American Statistical Association.

[14] C. Winslow, et al. Sexual Behavior in the Human Male. 1948.

[15] M. Mimiaga, et al. A systematic review of behavioral and treatment outcome studies among HIV-infected men who have sex with men who abuse crystal methamphetamine. 2012, AIDS Patient Care and STDs.

[16] J. Bauermeister, et al. Individual and contextual factors of sexual risk behavior in youth perinatally infected with HIV. 2012, AIDS Patient Care and STDs.

[17] Michael Schomaker, et al. Mortality in Patients with HIV-1 Infection Starting Antiretroviral Therapy in South Africa, Europe, or North America: A Collaborative Analysis of Prospective Studies. 2014, PLoS Medicine.

[18] S. Freguia, et al. Sexual Behavior in the Human Female. 1955.

[19] Rathindra Sarathy, et al. Data Shuffling: A New Masking Approach for Numerical Data. 2006, Manag. Sci.

[20] Jim Burridge, et al. Information preserving statistical obfuscation. 2003, Stat. Comput.

[21] Ruiguang Song, et al. Risk Factor Redistribution of the National HIV/AIDS Surveillance Data: An Alternative Approach. 2008, Public Health Reports.

[22] R. Ness. Influence of the HIPAA Privacy Rule on health research. 2007, JAMA.

[23] Stefan Stremersch, et al. Analysis of sensitive questions across cultures: an application of multigroup item randomized response theory to sexual attitudes and behavior. 2012, Journal of Personality and Social Psychology.

[24] David M. Rindskopf, et al. Assessing the Consequences of Using Self-report Data to Determine the Correlates of HIV Status: Conditional and Marginal Approaches. 2003, Multivariate Behavioral Research.

[25] Ada Hamosh, et al. Problematic variation in local institutional review of a multicenter genetic epidemiology study. 2003, JAMA.