Erratum to: Using the Census Bureau’s surname list to improve estimates of race/ethnicity and associated disparities

Commercial health plans need member racial/ethnic information to address disparities, but often lack it. We incorporate the U.S. Census Bureau’s latest surname list into a previous Bayesian method that integrates surname and geocoded information to better impute self-reported race/ethnicity. We validate this approach with data from 1,921,133 enrollees of a national health plan. Overall, estimates from the new approach correlated highly with self-reported race/ethnicity (0.76), making it 19% more efficient than its predecessor (and 41% and 108% more efficient than single-source surname and address methods, respectively; P < 0.05 for all). The new approach has an overall concordance statistic (area under the receiver operating characteristic, or ROC, curve) of 0.93. The largest improvements were for the groups where prior performance was weakest (Blacks and Asians). The new Census surname list accounts for about three-fourths of the variance explained in the new estimates. Imputing Native American and multiracial identities from surname and residence remains challenging.
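
As a rough illustration of how a Bayesian method can integrate surname and geocoded information, the sketch below treats surname-based probabilities as a prior and updates them with the probability of living in a given area conditional on each group. This is a generic Bayes-rule sketch, not necessarily the exact procedure validated here; the function name, the four groups shown, and every number are hypothetical and chosen only for illustration.

```python
def bayes_update(surname_prior, geo_likelihood):
    """Combine P(group | surname) with P(area | group) via Bayes' rule.

    Both arguments are dicts keyed by group label. The result is
    P(group | surname, area), assuming surname and area are
    conditionally independent given group (an assumption of this sketch).
    """
    unnormalized = {
        group: surname_prior[group] * geo_likelihood[group]
        for group in surname_prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("all groups have zero posterior mass")
    return {group: value / total for group, value in unnormalized.items()}


# Hypothetical inputs: made-up probabilities, not Census figures.
surname_prior = {"Hispanic": 0.10, "Black": 0.30, "Asian": 0.05, "White": 0.55}
# Hypothetical share of each group's population living in one small area.
geo_likelihood = {"Hispanic": 0.0002, "Black": 0.0010, "Asian": 0.0001, "White": 0.0003}

posterior = bayes_update(surname_prior, geo_likelihood)
for group, prob in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{group}: {prob:.3f}")
```

In this made-up case the surname alone points weakly toward one group, while the area information shifts most of the probability to another, which is the kind of gain a combined method is meant to capture over single-source surname or address methods.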
