Regional surnames and genetic structure in Great Britain

Following the increasing availability of DNA‐sequenced data, the genetic structure of populations can now be inferred and studied in unprecedented detail. Across the social sciences, this innovation is shaping new bio‐social research agendas and attracting substantial investment in the collection of genetic, biological and social data for large population samples. Yet genetic samples are special because the precise populations that they represent are uncertain and ill‐defined. Unlike most social surveys, a genetic sample's representativeness of the population cannot be established by conventional procedures of statistical inference, and the implications for population‐wide generalisations about bio‐social phenomena are poorly understood. In this paper, we seek to address these problems by linking surname data to a censored and geographically uneven sample of DNA scans collected for the People of the British Isles study. Based on a combination of global and local spatial correspondence measures, we identify eight regions that are most likely to represent the geography of genetic structure in Great Britain's long‐settled population. We discuss the implications of this regionalisation for bio‐social investigations. We conclude that, as the often highly selective collection of DNA and biomarkers becomes more common, geography is crucial to understanding variation in genetic information within diverse populations.
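The abstract does not say which correspondence measures were used. As a hedged illustration only, one widely used global measure of agreement between two partitions of the same items (for example, surname-based regions versus genetic clusters assigned to the same sampled individuals) is the adjusted Rand index, which corrects raw pairwise agreement for chance. The sketch below assumes each individual carries one label from each scheme; the function name and example labels are hypothetical and do not represent the authors' method or data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items.

    Hypothetical helper: labels_a[k] and labels_b[k] are the labels that two
    schemes (e.g. surname regions and genetic clusters) assign to item k.
    Returns 1.0 for identical partitions (up to relabeling), values near 0
    for chance-level agreement, and negative values for worse than chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        # Too few items to form any pair; treat as trivially identical.
        return 1.0

    # Contingency table: number of items in each (label_a, label_b) pair.
    pair_counts = Counter(zip(labels_a, labels_b))
    row_sums = Counter(labels_a)
    col_sums = Counter(labels_b)

    # Pairs of items placed together by both schemes.
    index = sum(comb(c, 2) for c in pair_counts.values())
    # Pairs placed together by each scheme separately.
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())

    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both schemes put every item in one group.
        return 1.0
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    # Hypothetical labels for six sampled individuals.
    surname_regions = ["N", "N", "S", "S", "W", "W"]
    genetic_clusters = [1, 1, 2, 2, 3, 3]
    print(adjusted_rand_index(surname_regions, genetic_clusters))
```

Because the index depends only on which items are grouped together, the two schemes may use entirely different label names. A local measure, by contrast, would assess agreement region by region rather than summarising it in a single number.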