Allelic bias when performing in-solution enrichment of ancient human DNA

Date

2023

Authors

Davidson, R.
Williams, M.P.
Roca-Rada, X.
Kassadjikova, K.
Tobler, R.
Fehren-Schmitz, L.
Llamas, B.

Type

Journal article

Citation

Molecular Ecology Resources, 2023; 23(8):1823-1840

Statement of Responsibility

Roberta Davidson, Matthew P. Williams, Xavier Roca-Rada, Kalina Kassadjikova, Raymond Tobler, Lars Fehren-Schmitz, Bastien Llamas

Abstract

In-solution hybridisation enrichment of genetic variation is a valuable methodology in human paleogenomics. It allows enrichment of endogenous DNA by targeting genetic markers that are comparable between sequencing libraries. Many studies have used the 1240k reagent (which enriches 1,237,207 genome-wide SNPs) since 2015, though access to it was restricted. In 2021, Twist Bioscience and Daicel Arbor Biosciences independently released commercial kits that enabled all researchers to perform enrichments for the same 1240k SNPs. We used the Daicel Arbor Biosciences Prime Plus kit to enrich 132 ancient samples from three continents. We identified a systematic assay bias that increases genetic similarity between enriched samples and that cannot be explained by batch effects. We present the impact of the bias on population genetic inferences (e.g. Principal Components Analysis, f-statistics) and on genetic relatedness estimation (READ). We compare the Prime Plus bias to that previously reported for the legacy 1240k enrichment assay. In f-statistics, we find that all Prime Plus-generated data exhibit artefactual excess shared drift, such that within-continent relationships cannot be correctly determined. The bias is more subtle in READ, though interpretation of the results can still be misleading in specific contexts. We expect the bias may also affect analyses we have not yet tested. Our observations support previously reported concerns about the integration of different data types in paleogenomics. We also caution that technological solutions to generate 1240k data necessitate a thorough validation process before their adoption in the paleogenomic community.

Description

First published: 15 September 2023

Rights

© 2023 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
