Evaluation of computational programs to predict HLA genotypes from genomic sequencing data

Abstract

Motivation
Despite being essential for numerous clinical and research applications, high-resolution human leukocyte antigen (HLA) typing remains challenging, and laboratory tests are time-consuming and labour-intensive. With next-generation sequencing data becoming widely accessible, on-demand in silico HLA typing offers an economical and efficient alternative.

Results
In this study we evaluate the HLA typing accuracy and efficiency of five computational HLA typing methods. We compare their predictions against a curated set of more than 1000 published polymerase chain reaction (PCR)-derived HLA genotypes on three data sets: whole-genome sequencing, whole-exome sequencing and transcriptome sequencing (RNA-seq) data. At clinically relevant resolution (four digits), the highest accuracy we observe is 81%, achieved by PHLAT on RNA-seq data; when the evaluation is limited to Class I genes, OptiType reaches 99%. Resource consumption also varied between tools: average runtime per sample ranged from 5 h (HLAminer) to 7 min (seq2HLA), and memory usage per sample from 12.8 GB (HLA-VBSeq) to 0.46 GB (HLAminer). While a minimum coverage is required, other factors also determine prediction accuracy, and the results of different tools do not correlate well. Combining tools therefore offers the potential to develop a highly accurate ensemble method that delivers fast, economical HLA typing from existing sequencing data.
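The four-digit accuracy comparison and the suggested ensemble can be made concrete with a short sketch. The Python below is a minimal illustration under stated assumptions, not the scoring procedure used in the study. It assumes alleles are written in colon-delimited IPD-IMGT/HLA nomenclature (e.g. `A*02:01:01:01`), treats "four-digit" resolution as the first two fields, and scores accuracy as the fraction of reference alleles recovered at each locus regardless of allele order. It also implements a hypothetical genotype-level majority vote as the simplest possible ensemble. The helper names (`to_four_digit`, `matched_alleles`, `accuracy`, `majority_vote`) and the toy genotypes are illustrative only.

```python
from collections import Counter

# Assumption: colon-delimited allele names, e.g. "A*02:01:01:01".
Genotype = list[str]  # the two alleles called at one locus


def to_four_digit(allele: str) -> str:
    """Truncate an allele name to its first two fields (four-digit resolution)."""
    gene, sep, fields = allele.partition("*")
    if not sep:
        return allele  # not of the form GENE*fields; leave unchanged
    two_fields = ":".join(fields.split(":")[:2])
    return f"{gene}*{two_fields}"


def matched_alleles(predicted: Genotype, reference: Genotype) -> int:
    """Count reference alleles recovered by the prediction (0-2), ignoring order."""
    pred = Counter(to_four_digit(a) for a in predicted)
    ref = Counter(to_four_digit(a) for a in reference)
    return sum((pred & ref).values())  # multiset intersection keeps min counts


def accuracy(calls: dict, truth: dict) -> float:
    """Fraction of reference alleles recovered across all (sample, locus) keys.
    A locus with no call contributes zero matches."""
    total = sum(len(ref) for ref in truth.values())
    hits = sum(matched_alleles(calls.get(key, []), ref) for key, ref in truth.items())
    return hits / total if total else 0.0


def majority_vote(tool_calls: list[Genotype]) -> Genotype:
    """Hypothetical ensemble: most common unordered four-digit genotype across tools.
    Ties go to the genotype encountered first."""
    if not tool_calls:
        return []
    votes = Counter(tuple(sorted(to_four_digit(a) for a in g)) for g in tool_calls)
    return list(votes.most_common(1)[0][0])


# Toy example (not data from the study): one sample, two loci.
truth = {("S1", "A"): ["A*02:01:01:01", "A*24:02:01:01"],
         ("S1", "B"): ["B*07:02:01", "B*44:03:01"]}
calls = {("S1", "A"): ["A*24:02", "A*02:01"],
         ("S1", "B"): ["B*07:02", "B*44:02"]}
print(accuracy(calls, truth))  # 0.75 (3 of 4 reference alleles matched)

print(majority_vote([["A*02:01", "A*24:02"],
                     ["A*24:02:01", "A*02:01:01"],
                     ["A*02:01", "A*02:01"]]))  # ['A*02:01', 'A*24:02']
```

Voting on whole unordered genotypes, rather than on individual alleles, keeps homozygous calls intact. A vote of this kind only helps when tools make partly independent errors, which fits the observation that results between tools do not correlate well. A practical ensemble could also weight each tool by its accuracy for a given locus and data type.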
