What's in a Name? A Method for Extracting Information about Ethnicity from Names

Questions about racial or ethnic group identity feature centrally in many social science theories, but detailed data on ethnic composition are often difficult to obtain, out of date, or otherwise unavailable. The proliferation of publicly available geocoded person names provides one potential source of such data, if researchers can effectively link names and group identity. This article examines that linkage and presents a methodology for estimating local ethnic or racial composition using the relationship between group membership and person names. Common approaches for linking names and identity groups perform poorly when estimating group proportions. I have developed a new method for estimating racial or ethnic composition from names that requires no classification of individual names. This method provides more accurate estimates than the standard approach and works in any context where person names contain information about group membership. Illustrations from two very different contexts are provided: the United States and the Republic of Kenya.
