Improving Single-Nucleotide Polymorphism-Based Fetal Fraction Estimation of Maternal Plasma Circulating Cell-Free DNA Using Bayesian Hierarchical Models

Recent advances in next-generation sequencing (NGS) technologies have enabled the development of effective high-throughput noninvasive prenatal screening (NIPS) assays for fetal genetic abnormalities using maternal circulating cell-free DNA (ccfDNA). An important NIPS quality assurance step is quantifying the fetal proportion of the sampled ccfDNA. For methods using allelic read count ratios from targeted sequencing of single-nucleotide polymorphisms (SNPs), systematic biases and errors may reduce accuracy and diminish assay performance. We collected ccfDNA NIPS MiSeq sequencing data from an amplicon-based 92-SNP panel, along with complementary low-depth whole-genome sequencing (WGS), on 243 pregnancies with normal male fetuses and on an additional 144 nonpregnant female donor samples. Using fetal fraction estimates based on X and Y chromosome WGS coverage as the gold standard, we compared an existing SNP-based approach, FetalQuant, with a more flexible Bayesian hierarchical modeling strategy that borrows information across interrogated SNPs to characterize SNP-level error rates and biases and thereby improve fetal fraction estimates. Posterior distributions for SNP-level model parameters indicate that most SNPs exhibited modest to moderate extrabinomial variation and a consistent underrepresentation of fetal alleles, with some extreme outliers in both regards. Fetal fraction estimates from FetalQuant, which is naive to these SNP properties, were relatively poor (R² = 0.14, root mean squared error [RMSE] = 0.050), particularly when the true fetal fraction was low (<5%). In contrast, by quantifying SNP-level biases and error rates, our proposed approach reduced both the bias and the variability of fetal fraction estimates (R² = 0.794, RMSE = 0.025). Using high-depth targeted SNP sequencing data, we identified a high degree of variability in distributional properties across SNP allelic read counts.
These results highlight the benefits of leveraging hierarchical modeling for SNP-based fetal quantification assays (FQAs) and the need to properly calibrate FQAs dependent on NGS allelic ratio data.
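
As a rough illustration of why SNP-level calibration matters, the following minimal Python sketch computes a pooled allelic-ratio estimate of fetal fraction and the RMSE used to score it. It is not the hierarchical model described above, nor FetalQuant. It assumes every SNP supplied is one where the mother is homozygous and the fetus heterozygous, so the expected minor-allele read fraction is f/2. The function names and the optional per-SNP `bias` factors (a multiplicative under- or overrepresentation of the fetal allele) are hypothetical and introduced only for this example.

```python
from math import sqrt


def pooled_fetal_fraction(minor_counts, total_counts, bias=None):
    """Method-of-moments fetal fraction estimate from allelic read counts.

    Assumes each SNP is maternal-homozygous / fetal-heterozygous, so the
    expected minor-allele read fraction at SNP i is bias[i] * f / 2.
    `bias` is a hypothetical per-SNP calibration factor (1.0 = unbiased);
    when omitted, the estimator is naive to SNP-level bias.
    """
    if len(minor_counts) != len(total_counts):
        raise ValueError("minor_counts and total_counts must have equal length")
    if bias is None:
        bias = [1.0] * len(total_counts)
    if len(bias) != len(total_counts):
        raise ValueError("bias must have one entry per SNP")
    # Expected minor reads = (f / 2) * sum(n_i * bias_i); solve for f.
    weighted_depth = sum(n * b for n, b in zip(total_counts, bias))
    if weighted_depth <= 0:
        raise ValueError("no usable read depth")
    return 2.0 * sum(minor_counts) / weighted_depth


def rmse(estimates, truths):
    """Root mean squared error between estimates and reference values."""
    if len(estimates) != len(truths) or not truths:
        raise ValueError("inputs must be non-empty and of equal length")
    return sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths))


# Two SNPs at depth 1000; the fetal allele is underrepresented by 20% at both.
minor = [40, 40]
depth = [1000, 1000]
naive = pooled_fetal_fraction(minor, depth)                      # ignores bias
calibrated = pooled_fetal_fraction(minor, depth, bias=[0.8, 0.8])
```

In this toy input the naive estimate is biased downward relative to the calibrated one, mirroring in miniature the underrepresentation of fetal alleles reported above. A full treatment would also model extrabinomial variation (for example with a beta-binomial likelihood) and estimate the SNP-level parameters jointly rather than supplying them by hand.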