K-mer Based Methods for Measuring and Predicting DNA-Binding Specificity of Transcription Factors

dc.contributor.advisor

Gordân, Raluca

dc.contributor.author

Mielko, Zachery

dc.date.accessioned

2023-06-08T18:23:33Z

dc.date.issued

2023

dc.department

Genetics and Genomics

dc.description.abstract

Transcription factors (TFs) are proteins that bind DNA based on its sequence and structure to regulate gene expression. They are fundamental components of genomic function, present in all known forms of life. Thus, understanding the conditions required for TF-DNA interactions is a longstanding and active field of study. With the advent of comprehensive k-mer based measurements using protein binding microarrays (PBMs), the binding profiles of hundreds of TFs have been measured. This dissertation addresses two major problems. First, the information from these comprehensive measurements is used to create simplistic models of binding that capture only the high-affinity range. In a biological context, weak binding sites are often the most important in developmental and regulatory processes and can be missed by models targeting high-affinity binding sites. Second, the vast majority of measurements are made on structurally unmodified DNA, whereas TF binding occurs in complex and dynamic systems where the DNA structure can be significantly altered by sources such as DNA damage. We first look at how DNA shape influences binding through the study of UV-induced photoproducts, DNA adducts formed by UV light exposure that distort the shape of pyrimidine dinucleotides. We developed a new k-mer based method for measuring TF binding to UV-irradiated DNA, UV-Bind. Using this technology, we find that the UV-induced changes in DNA structure from pyrimidine dinucleotide photoproducts can change the specificity of TFs. Using high-throughput k-mer measurements, we also found non-canonical sequences that show an increase in binding signal after UV irradiation. We then introduce a new algorithm for calling TF binding sites using k-mers, CtrlF-TF. CtrlF-TF takes high-throughput k-mer measurements from PBMs and outputs aligned, ranked consensus sites that can be searched in a genome. These sites compare favorably to traditional position weight matrix defined sites in both in vivo and in vitro benchmarks.

dc.identifier.uri

https://hdl.handle.net/10161/27723

dc.subject

Bioinformatics

dc.subject

Biochemistry

dc.subject

Genetics

dc.subject

K-mer

dc.subject

Photoproduct

dc.subject

Protein-DNA

dc.subject

Transcription factor

dc.subject

UV Damage

dc.title

K-mer Based Methods for Measuring and Predicting DNA-Binding Specificity of Transcription Factors

dc.type

Dissertation

duke.embargo.months

24

duke.embargo.release

2025-05-24T00:00:00Z
