High-throughput characterization of the mismatch binding specificity of DNA repair enzymes

Somatic DNA mutations play critical roles in human disease, especially during carcinogenesis and tumor development. Two major sources of mutations are the T-G mismatches resulting from spontaneous deamination of 5-methylcytosine (5meC) and the various types of mismatches arising from DNA replication errors. To maintain genome stability, DNA repair enzymes (REs) recognize these mismatches and initiate repair. For spontaneous deamination of 5meC, Thymine DNA Glycosylase (TDG) is one of the specialized enzymes that recognize the resulting T-G mismatches; it excises the thymine to create an abasic site (AP site) and initiates base excision repair (BER). For replication errors, various types of mismatches are recognized by the DNA mismatch repair protein MutS, which initiates the downstream mismatch repair (MMR) pathway. The DNA sequence flanking a mismatch (i.e., its context) is known to have an important effect on the specificities of both TDG and MutS, and consequently can influence the specificity of repair. However, the sequence context effects for both REs are poorly understood.
To address this gap, we developed high-throughput in vitro assays to quantitatively measure the repair enzymes’ binding and excision activity for DNA mismatches in tens of thousands of different sequence contexts, in a cell-free system. We found that two base pairs 5' and three base pairs 3' of the mismatch have significant effects on TDG binding and activity, whereas four base pairs 5' and three base pairs 3' of the mismatch are important for MutS binding specificity. These results are consistent with structural data for both REs. Moreover, we show that predictive modeling enables high-accuracy predictions of RE binding to any new DNA sequence.
The sequence context specificity of both REs is highly relevant in living cells. For TDG, the DNA sequences that are most methylated in the genome, and thus most prone to deamination and subsequent mutation, are also the ones where T-G mismatches are best recognized by TDG. For MutS, we show that its specificity shapes the mutation landscape by influencing the genetic mutation spectra. More surprisingly, we found that mutations observed in late-replicating regions in S-phase are more likely to come from lower-affinity mismatches than mutations observed in early-replicating regions. These results indicate that the high mutation rates in late-replicating regions are due not only to the insufficient time available to repair replication errors in these regions, but also to the relatively long time required for MutS to identify and bind low-affinity mismatches in order to initiate their repair. Therefore, the sequence specificity of both REs is directly relevant to the in vivo genomic landscape of somatic mutations.






Hou, Yuze (2022). High-throughput characterization of the mismatch binding specificity of DNA repair enzymes. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/26817.


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.