Detecting de novo Insertions of Transposable Elements in the Human Genome

Date

2024

Authors

McConnell, Joseph John

Advisors

Adelson, David
Kortschak, R Daniel

Type

Thesis

Abstract

Transposable elements (TEs) are genetic sequences capable of self-replication and insertion into new genomic locations, contributing to genetic diversity and evolution. First discovered by Barbara McClintock in maize, these elements comprise an estimated 45% of the human genome. Although most are repressed, approximately 100 TEs in the human genome remain potentially active and sporadically enter the germline. TE activity can disrupt gene function, create new regulatory elements, or cause chromosomal rearrangements. The advent of long-read sequencing has enabled renewed analysis of TE activity. This study aims to develop methods for detecting de novo TE insertions in the human genome from long reads. Our objectives are: 1. Generating insertion profiles of TEs. 2. Simulating TE insertions to evaluate detectability at various allelic ratios. 3. Assessing error correction for improving the detection of TEs. 4. Benchmarking TE detection using a truth set of known variants. 5. Developing methods to detect de novo TE insertions in parent-child trios. A pseudo-genome consisting of a pair of human CHM13 reference chromosomes was constructed, and a set of TEs with a range of profiles was inserted into it. Error-free and error-containing read sets were simulated with varied allelic ratios of heterozygous insertions. Comparing alternative aligners and variant callers on the simulated read sets revealed traits that helped guide the subsequent analysis of real data sets. Error correction of the error-containing read set improved assembly but yielded only marginal improvement in variant calling. Transitioning to a long-read human data set from the Genome in a Bottle Consortium increased the complexity of the results and their interpretation. Benchmarking variant calling against the manually curated data for this genome revealed that combining results improved detection accuracy, but at the cost of more false calls.
To conclude our study, we obtained three sets of parent-child trio long-read data and, building on our previous results, developed methods to detect de novo TE insertions; one probable candidate was found. The methodology identifies potential de novo TE insertions within parent-child trios, accounting for mosaic appearances in parental genomes and heterozygous or mosaic presentations in offspring. This approach advances methods for estimating TE propagation rates in human populations and further uncovers associated challenges. As additional trio genome data become available, our framework can serve as a basis for refining estimates of the rate at which transposable elements propagate in the human population. Moreover, our methodologies can be adapted for targeted variant calling and simulation configurations, providing a versatile framework for future genomic studies involving TEs.

School/Discipline

School of Biological Sciences

Dissertation Note

Thesis (MPhil) -- University of Adelaide, School of Biological Sciences, 2024

Provenance

This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
