Inferring the Most Likely Geographical Origin of mtDNA Sequence Profiles

In a number of practical cases it is important to determine the likely geographical origin of an individual or a biological sample. For example, only a dead body, old bones, or a semen sample may be available, and information on where the sample might come from can assist an investigation or research. The first part of this paper does not depend on any specific data structure. We formulate the task as a classification problem, in which Bayes' theorem allows different sources of information or data to be combined conveniently. The main part of the paper deals with high-dimensional data, for which simple, standard methods are unlikely to work properly. Mitochondrial DNA (mtDNA) data are a typical example. We propose a procedure with essentially two steps. First, principal component analysis is used to reduce the dimension of the data. Next, quadratic discriminant analysis performs the actual classification. A cross-validation procedure is implemented to select the optimal number of principal components. The importance of using separate data sets for model fitting and testing is emphasized. The method distinguishes well between individuals with a self-reported European (Icelandic or German) origin and SE Africans; in this case the error rate is 2.0%.
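
The two-step procedure described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the data are synthetic binary "site" vectors rather than mtDNA profiles, and the helper names (`fit_pca`, `fit_qda`, `cv_error`, `demo`), the small ridge term added to each covariance matrix, the five folds, and the candidate numbers of components are all assumptions made for illustration. Principal components are fitted on training data only, the class scores apply Bayes' theorem with Gaussian class densities, and the final error is measured on data not used for fitting or for selecting the number of components.

```python
import numpy as np

def fit_pca(X, k):
    """Return the training mean and the first k principal directions."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k].T

def fit_qda(Z, y, ridge=1e-6):
    """Estimate one Gaussian per class (mean, covariance) plus a prior."""
    params = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        # The ridge term is an assumption; it keeps the covariance invertible.
        cov = np.atleast_2d(np.cov(Zc, rowvar=False)) + ridge * np.eye(Z.shape[1])
        params[c] = (Zc.mean(axis=0), np.linalg.inv(cov),
                     np.linalg.slogdet(cov)[1], np.log(len(Zc) / len(Z)))
    return params

def predict_qda(params, Z):
    """Pick the class with the largest log posterior (up to a constant)."""
    classes = list(params)
    scores = []
    for c in classes:
        mu, prec, logdet, logprior = params[c]
        d = Z - mu
        maha = np.einsum("ij,jk,ik->i", d, prec, d)  # squared Mahalanobis distance
        scores.append(logprior - 0.5 * logdet - 0.5 * maha)
    return np.array(classes)[np.argmax(scores, axis=0)]

def error_rate(X_tr, y_tr, X_te, y_te, k):
    """Fit PCA and QDA on the training part, report error on the other part."""
    mean, W = fit_pca(X_tr, k)
    params = fit_qda((X_tr - mean) @ W, y_tr)
    pred = predict_qda(params, (X_te - mean) @ W)
    return float(np.mean(pred != y_te))

def cv_error(X, y, k, folds=5, seed=0):
    """Cross-validated error for a given number of principal components."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errs = []
    for held_out in np.array_split(idx, folds):
        train = np.setdiff1d(idx, held_out)
        errs.append(error_rate(X[train], y[train], X[held_out], y[held_out], k))
    return float(np.mean(errs))

def demo(seed=0):
    """Toy data: two groups that differ in frequency at ten of sixty sites."""
    rng = np.random.default_rng(seed)
    p, n = 60, 150
    freq_a = rng.uniform(0.05, 0.30, p)
    freq_b = freq_a.copy()
    freq_b[:10] = 0.80
    X = np.vstack([rng.random((n, p)) < freq_a,
                   rng.random((n, p)) < freq_b]).astype(float)
    y = np.repeat([0, 1], n)
    order = rng.permutation(2 * n)
    train, test = order[:200], order[200:]  # separate fitting and test sets
    candidates = (1, 2, 3, 5, 8)
    best_k = min(candidates, key=lambda k: cv_error(X[train], y[train], k))
    return best_k, error_rate(X[train], y[train], X[test], y[test], best_k)

if __name__ == "__main__":
    best_k, test_error = demo()
    print("selected components:", best_k, "test error:", test_error)
```

Selecting the number of components by cross-validation inside the training set, and only then scoring the held-out set, mirrors the abstract's point that model fitting and testing should use separate data.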
