Somatic Mutagenesis at Transcription Factor Binding Sites
Date
2024
Abstract
Whole-genome sequencing has provided maps of somatic mutations for thousands of human tumors across several tissues. These maps have revealed wide variation in somatic mutation rates across the genome, providing insights into mutagenic processes and tumor evolution. Accurate modeling of these processes and mutation rates is key to identifying potential driver events in cancer genomes. This is especially difficult in the non-coding genome, where the majority of mutations lie and where signals of selection are harder to identify. The non-coding genome contains regions that are recognized by transcription factor (TF) proteins, which bind specific DNA sequences to regulate gene expression. Recent work has reported that binding sites of several TFs show somatic hypermutation, which is not explained by selection processes in the tumor. This has led to the hypothesis that TF binding can be mutagenic, which is a potential confounder for models of selection and mutagenesis in the regulatory genome. Existing studies of hypermutation at TF binding sites have chiefly focused on UV-driven melanomas, where TF binding is thought to increase UV-induced damage and impede the repair of lesions created by this damage. In non-melanoma tumors, binding sites of certain TFs were also reported to show hypermutation. However, mutation patterns for the vast majority of human TFs have not been examined in these tumors. In addition, previous studies have focused primarily on reporting this trend and have not provided a pathway for further experimental studies to understand and model the underlying mutagenic mechanisms. In this work, we present novel computational and experimental analyses that seek to characterize mutation patterns in TF binding sites, focused on discerning whether TF binding has an associated mutagenic cost. In Chapter 2, we present a comprehensive, pan-cancer analysis of somatic mutation enrichment at TF binding sites.
This work profiles mutation patterns at hundreds of TF binding sites, using somatic mutation maps from 36 ICGC projects across 15 tissue types. We analyze mutation patterns in a context-specific manner, profiling these patterns separately for each TF-tumor pair examined. We model background mutation rates in binding sites as a Poisson-binomial process and use this model to systematically identify TFs that show greater than expected mutational load in their binding sites. Our work represents the most comprehensive analysis of hypermutation at TF binding sites to date. Consistent with previous work, we find pervasive hypermutation at TF binding sites in melanomas. However, hypermutation at binding sites is also seen across several other tumor types, expanding our current knowledge of the potential role TFs might play in mutagenesis in these tumors. Crucially, enrichment patterns at binding sites are highly tissue-specific and TF-specific, suggesting that systematic evaluation of this kind is necessary for understanding TF-induced mutagenesis. Using mutational signature decomposition, we then identify potential mechanisms for the observed enrichment. In addition to inhibition of UV repair, we observe enrichment resulting from diverse mutagenic sources, prominent among which are oxidative damage (SBS 17) in esophageal tumors and deamination (SBS 1), which is active in various tumors. Overall, our catalog enables the prioritization of specific TFs and DNA repair processes potentially responsible for hypermutation in regulatory DNA, a critical step in designing follow-up studies and mechanistic models of mutagenesis in the non-coding genome. In Chapter 3, we provide novel approaches that enable experimental characterization of mutagenesis at TF binding sites. While TF binding is proposed to impede the repair of lesioned DNA, it is not known if TFs can indeed bind DNA containing lesions, or if their binding can impede DNA recognition by repair enzymes.
We tackle this by studying DNA mismatches, which act as intermediates for mutations arising from replication errors, as well as from deamination (SBS 1). We develop a novel high-throughput approach that enables the measurement and quantification of TF binding affinities for DNA with mismatches. Strikingly, we find several mismatches in TF binding sites, and even in non-specific sequences, that enhance TF binding in ways that are not predictable from existing models of Watson-Crick DNA binding. We then design a library to study competition between TFs and MutSα, the repair enzyme involved in recognition of DNA mismatches. We show that TF binding to mismatched DNA is strong enough to compete with MutSα, leading to reduced repair efficiency. This reduction in repair efficiency translates to a higher rate of mutagenesis at TF-bound mismatches in vivo, thus providing experimental validation of TF-induced mutagenesis. However, our results show that such experimental characterization requires precise formulation of specific hypotheses that target positions within binding sites where lesion repair (or damage formation) is likely to be impacted. Our studies suggest that the impact of TF binding on DNA damage and repair is specific to the nature of the repair processes and TFs under consideration. We provide novel computational and experimental frameworks to characterize these interactions and identify potential mutagenic processes that can be affected by TF binding. This work will aid further studies of mutagenesis in TF binding sites, which is important for modeling selection and variation in the regulatory genome.
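As an illustration of the Poisson-binomial background model mentioned for Chapter 2, the following is a minimal sketch, not the dissertation's implementation. It assumes each binding-site position has its own independent background mutation probability and computes the probability of observing at least a given number of mutated positions. The function names (`poisson_binomial_pmf`, `upper_tail_pvalue`) and the example probabilities are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def poisson_binomial_pmf(probs: Sequence[float]) -> List[float]:
    """Return P(X = k) for k = 0..n, where X is the number of successes
    among n independent Bernoulli trials with success probabilities probs."""
    pmf = [1.0]  # with zero trials, X = 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this position is not mutated
            nxt[k + 1] += mass * p       # this position is mutated
        pmf = nxt
    return pmf


def upper_tail_pvalue(probs: Sequence[float], observed: int) -> float:
    """Return P(X >= observed) under the Poisson-binomial null."""
    pmf = poisson_binomial_pmf(probs)
    start = max(observed, 0)
    return min(1.0, sum(pmf[start:]))


if __name__ == "__main__":
    # Hypothetical per-position background mutation probabilities
    # (assumed values, not taken from the dissertation).
    background = [0.01, 0.02, 0.005, 0.03, 0.01, 0.015]
    observed_mutations = 2
    print(upper_tail_pvalue(background, observed_mutations))
```

A small upper-tail probability would suggest more mutations in the binding sites than the assumed background rates predict. In practice the per-position probabilities would have to come from a fitted background model, and p-values across many TF-tumor pairs would need multiple-testing correction.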
Citation
Sahay, Harshit (2024). Somatic Mutagenesis at Transcription Factor Binding Sites. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/30861.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.