Browsing by Author "Gordân, Raluca"
Item Open Access A nucleosome-guided map of transcription factor binding sites in yeast. (PLoS Comput Biol, 2007-11) Narlikar, Leelavati; Gordân, Raluca; Hartemink, Alexander J
Finding functional DNA binding sites of transcription factors (TFs) throughout the genome is a crucial step in understanding transcriptional regulation. Unfortunately, these binding sites are typically short and degenerate, posing a significant statistical challenge: many more matches to known TF motifs occur in the genome than are actually functional. However, information about chromatin structure may help to identify the functional sites. In particular, it has been shown that active regulatory regions are usually depleted of nucleosomes, thereby enabling TFs to bind DNA in those regions. Here, we describe a novel motif discovery algorithm that employs an informative prior over DNA sequence positions based on a discriminative view of nucleosome occupancy. When a Gibbs sampling algorithm is applied to yeast sequence-sets identified by ChIP-chip, the correct motif is found in 52% more cases with our informative prior than with the commonly used uniform prior. This is the first demonstration that nucleosome occupancy information can be used to improve motif discovery. The improvement is dramatic, even though we are using only a statistical model to predict nucleosome occupancy; we expect our results to improve further as high-resolution genome-wide experimental nucleosome occupancy data becomes increasingly available.
Item Open Access Deciphering the Quantitative Effects of Cooperativity and Mutations on Transcription Factor Binding (2022) Martin, Vincentius
Transcription factor (TF) proteins bind to DNA in a sequence-specific manner to regulate gene expression. The binding affinity of TFs for individual sites is well characterized and can be represented using DNA motif models such as position weight matrices.
However, there are many factors influencing TF-DNA recognition in the cell, leading to complexities that cannot be captured by motif models alone. Here, we present our studies on two factors: cooperative TF binding and alterations in TF binding due to DNA mutations. Both factors require quantitative and rigorous approaches to distinguish real effects from random noise.
First, we present a new method for characterizing cooperative binding of TFs to DNA. This method addresses the issue that TF binding sites located in close proximity, which occurs frequently across the human genome, are not necessarily bound cooperatively. To distinguish between cooperative and independent binding, we developed a high-throughput on-chip binding assay designed specifically to measure TF binding to neighboring sites. Using the experimental data from our assay, we trained machine learning models to differentiate between cooperative and independent binding of TFs. This method enabled us to reveal molecular mechanisms used by TFs to bind DNA cooperatively.
Second, we introduce QBiC-Pred (Quantitative Predictions of TF Binding Changes Due to Sequence Variants), an ordinary least squares-based method to predict the magnitude of the effect of DNA mutations on TF-DNA recognition. We implemented QBiC-Pred as a web service (qbic.genome.duke.edu) that allows users to run our models through a user-friendly web interface. We used this method to identify non-recurring putative regulatory driver mutations in cancer. Our approach is novel because we prioritize mutations based on their effects on transcription factor (TF) binding, instead of relying on the recurrence of the mutations among tumor samples, a criterion that is often difficult to apply because individual non-coding mutations are rarely seen in more than one donor. Focusing on the functional effects of non-coding mutations across regulatory regions, we identified dozens of genes whose regulation in tumor cells is likely to be significantly perturbed by non-coding mutations.
Item Open Access DNA mismatches reveal conformational penalties in protein-DNA recognition. (Nature, 2020-11) Afek, Ariel; Shi, Honglue; Rangadurai, Atul; Sahay, Harshit; Senitzki, Alon; Xhani, Suela; Fang, Mimi; Salinas, Raul; Mielko, Zachery; Pufall, Miles A; Poon, Gregory MK; Haran, Tali E; Schumacher, Maria A; Al-Hashimi, Hashim M; Gordân, Raluca
Transcription factors recognize specific genomic sequences to regulate complex gene-expression programs. Although it is well-established that transcription factors bind to specific DNA sequences using a combination of base readout and shape recognition, some fundamental aspects of protein-DNA binding remain poorly understood [1,2]. Many DNA-binding proteins induce changes in the structure of the DNA outside the intrinsic B-DNA envelope. However, how the energetic cost that is associated with distorting the DNA contributes to recognition has proven difficult to study, because the distorted DNA exists in low abundance in the unbound ensemble [3-9]. Here we use a high-throughput assay that we term SaMBA (saturation mismatch-binding assay) to investigate the role of DNA conformational penalties in transcription factor-DNA recognition. In SaMBA, mismatched base pairs are introduced to pre-induce structural distortions in the DNA that are much larger than those induced by changes in the Watson-Crick sequence. Notably, approximately 10% of mismatches increased transcription factor binding, and for each of the 22 transcription factors that were examined, at least one mismatch was found that increased the binding affinity. Mismatches also converted non-specific sites into high-affinity sites, and high-affinity sites into 'super sites' that exhibit stronger affinity than any known canonical binding site.
Determination of high-resolution X-ray structures, combined with nuclear magnetic resonance measurements and structural analyses, showed that many of the DNA mismatches that increase binding induce distortions that are similar to those induced by protein binding, thus prepaying some of the energetic cost incurred from deforming the DNA. Our work indicates that conformational penalties are a major determinant of protein-DNA recognition, and reveals mechanisms by which mismatches can recruit transcription factors and thus modulate replication and repair activities in the cell [10,11].
Item Open Access Finding regulatory DNA motifs using alignment-free evolutionary conservation information. (Nucleic Acids Res, 2010-04) Gordân, Raluca; Narlikar, Leelavati; Hartemink, Alexander J
As an increasing number of eukaryotic genomes are being sequenced, comparative studies aimed at detecting regulatory elements in intergenic sequences are becoming more prevalent. Most comparative methods for transcription factor (TF) binding site discovery make use of global or local alignments of orthologous regulatory regions to assess whether a particular DNA site is conserved across related organisms, and thus more likely to be functional. Since binding sites are usually short, sometimes degenerate, and often independent of orientation, alignment algorithms may not align them correctly. Here, we present a novel, alignment-free approach for using conservation information for TF binding site discovery. We relax the definition of conserved sites: we consider a DNA site within a regulatory region to be conserved in an orthologous sequence if it occurs anywhere in that sequence, irrespective of orientation. We use this definition to derive informative priors over DNA sequence positions, and incorporate these priors into a Gibbs sampling algorithm for motif discovery. Our approach is simple and fast.
It requires neither sequence alignments nor the phylogenetic relationships between the orthologous sequences, yet it is more effective on real biological data than methods that do.
Item Open Access HDAC inhibitors cause site-specific chromatin remodeling at PU.1-bound enhancers in K562 cells. (Epigenetics Chromatin, 2016) Frank, Christopher L; Manandhar, Dinesh; Gordân, Raluca; Crawford, Gregory E
BACKGROUND: Small molecule inhibitors of histone deacetylases (HDACi) hold promise as anticancer agents for particular malignancies. However, clinical use is often confounded by toxicity, perhaps due to indiscriminate hyperacetylation of cellular proteins. Therefore, elucidating the mechanisms by which HDACi trigger differentiation, cell cycle arrest, or apoptosis of cancer cells could inform development of more targeted therapies. We used the myelogenous leukemia line K562 as a model of HDACi-induced differentiation to investigate chromatin accessibility (DNase-seq) and expression (RNA-seq) changes associated with this process. RESULTS: We identified several thousand specific regulatory elements [~10 % of total DNase I-hypersensitive (DHS) sites] that become significantly more or less accessible with sodium butyrate or suberanilohydroxamic acid treatment. Most of the differential DHS sites display hallmarks of enhancers, including being enriched for non-promoter regions, associating with nearby gene expression changes, and increasing luciferase reporter expression in K562 cells. Differential DHS sites were enriched for key hematopoietic lineage transcription factor motifs, including SPI1 (PU.1), a known pioneer factor. We found PU.1 increases binding at opened DHS sites with HDACi treatment by ChIP-seq, but PU.1 knockdown by shRNA fails to block the chromatin accessibility and expression changes. A machine-learning approach indicates H3K27me3 initially marks PU.1-bound sites that open with HDACi treatment, suggesting these sites are epigenetically poised.
CONCLUSIONS: We find HDACi treatment of K562 cells results in site-specific chromatin remodeling at epigenetically poised regulatory elements. PU.1 shows evidence of a pioneer role in this process by marking poised enhancers but is not required for transcriptional activation.
Item Open Access High-throughput characterization of the mismatch binding specificity of DNA repair enzymes (2022) Hou, Yuze
Somatic DNA mutations play critical roles in human disease, especially during carcinogenesis and tumor development. Two major sources of mutations are the T-G mismatches resulting from spontaneous deamination of 5-methylcytosine (5meC) and various types of mismatches owing to DNA replication errors. To maintain genome stability, DNA repair enzymes (REs) are responsible for recognizing the mismatches and initiating repair. For spontaneous deamination of 5meC, the Thymine DNA Glycosylase (TDG) is one of the specialized enzymes that recognizes the resulting T-G mismatches, excises the thymine to create an abasic site (AP site), and initiates base excision repair (BER). For replication errors, various types of mismatches are recognized by the DNA mismatch repair protein MutS, which initiates the downstream mismatch repair (MMR) pathway. The DNA sequence flanking the mismatches (i.e. their context) is known to have an important effect on the specificities of both TDG and MutS, and consequently can influence the specificity of repair. However, the sequence context effects for both REs are poorly understood.
To address this gap, we developed high-throughput in vitro assays to quantitatively measure the repair enzymes’ binding and excision activity for DNA mismatches in tens of thousands of different sequence contexts, in a cell-free system. We found that two base-pairs 5' and three base-pairs 3' of the mismatch have significant effects on TDG binding and activity, whereas four base-pairs 5' and three base-pairs 3' of the mismatch are important for MutS binding specificity.
The results are consistent with structural data for both REs. Moreover, we show that predictive modeling can be used to enable high accuracy predictions of RE binding to any new DNA sequence. The sequence context specificity of both REs is highly relevant in living cells. For TDG, DNA sequences that are most methylated in the genome, and thus most prone to deamination and subsequent mutation, are also the ones where T-G mismatches are best recognized by TDG. For MutS, we show that its specificity shapes the mutation landscape by influencing the genetic mutation spectra. More surprisingly, we found that mutations observed in late replicating regions in S-phase are more likely to come from lower-affinity mismatches than mutations observed in early replicating regions. The results indicate that the high mutation rates in late replicating regions are due not only to the insufficient time to repair replication errors in these regions, but also to the relatively long time required for MutS to identify and bind to low affinity mismatches in order to initiate their repair. Therefore, the sequence specificity for both REs is directly relevant to the in vivo genomic landscape of somatic mutations.
Item Open Access Human-chimpanzee differences in a FZD8 enhancer alter cell-cycle dynamics in the developing neocortex. (Curr Biol, 2015-03-16) Boyd, J Lomax; Skove, Stephanie L; Rouanet, Jeremy P; Pilaz, Louis-Jan; Bepler, Tristan; Gordân, Raluca; Wray, Gregory A; Silver, Debra L
The human neocortex differs from that of other great apes in several notable regards, including altered cell cycle, prolonged corticogenesis, and increased size [1-5]. Although these evolutionary changes most likely contributed to the origin of distinctively human cognitive faculties, their genetic basis remains almost entirely unknown. Highly conserved non-coding regions showing rapid sequence changes along the human lineage are candidate loci for the development and evolution of uniquely human traits. Several studies have identified human-accelerated enhancers [6-14], but none have linked an expression difference to a specific organismal trait. Here we report the discovery of a human-accelerated regulatory enhancer (HARE5) of FZD8, a receptor of the Wnt pathway implicated in brain development and size [15, 16]. Using transgenic mice, we demonstrate dramatic differences in human and chimpanzee HARE5 activity, with human HARE5 driving early and robust expression at the onset of corticogenesis. Similar to HARE5 activity, FZD8 is expressed in neural progenitors of the developing neocortex [17-19]. Chromosome conformation capture assays reveal that HARE5 physically and specifically contacts the core Fzd8 promoter in the mouse embryonic neocortex. To assess the phenotypic consequences of HARE5 activity, we generated transgenic mice in which Fzd8 expression is under control of orthologous enhancers (Pt-HARE5::Fzd8 and Hs-HARE5::Fzd8). In comparison to Pt-HARE5::Fzd8, Hs-HARE5::Fzd8 mice showed marked acceleration of neural progenitor cell cycle and increased brain size.
Changes in HARE5 function unique to humans thus alter the cell-cycle dynamics of a critical population of stem cells during corticogenesis and may underlie some distinctive anatomical features of the human brain.
Item Embargo K-mer Based Methods for Measuring and Predicting DNA-Binding Specificity of Transcription Factors (2023) Mielko, Zachery
Transcription factors (TFs) are proteins that bind DNA based on the sequence and structure to regulate gene expression. They are fundamental components of genomic function, present in all known forms of life. Thus, understanding the conditions required for TF-DNA interactions is a longstanding and active field of study. With the advent of comprehensive k-mer based measurements using protein binding microarrays, the binding profiles of hundreds of TFs have been measured. This dissertation addresses two major problems. First, the information from these comprehensive measurements is used to create simplistic models of binding that capture only the high affinity range. In a biological context, weak binding sites are often the most important in developmental and regulatory processes and can be missed by models targeting high affinity binding sites. Second, the vast majority of measurements are on structurally unmodified DNA. TF binding occurs in complex and dynamic systems where the DNA structure can be significantly altered due to sources such as DNA damage. First, we look at how DNA shape influences binding through the study of UV induced photoproducts, DNA adducts formed from UV light exposure that distort the shape of pyrimidine dinucleotides. We developed a new k-mer based method for measuring TF binding to UV-irradiated DNA, UV-Bind. Using this technology, we find that the UV-induced changes in DNA structure from pyrimidine dinucleotide photoproducts can change the specificity of TFs.
Using high-throughput k-mer measurements, we also found non-canonical sequences that show an increase in binding signal after UV-irradiation. We then introduce a new algorithm for calling TF binding sites using k-mers, CtrlF-TF. CtrlF-TF takes high-throughput k-mer measurements from PBMs and outputs aligned, ranked consensus sites that can be searched in a genome. These sites compare favorably to traditional position weight matrix defined sites via in vivo and in vitro benchmarks.
Item Open Access Methods for Comparative Analysis of Chromatin Accessibility and Gene Expression, With Applications to Cellular Reprogramming (2018) Manandhar, Dinesh
Cellular reprogramming processes remain poorly characterized at the level of genome-wide chromatin and gene expression changes. Specifically, the extent to which reprogrammed cells differ quantitatively from both the starting cells and the target cells is unknown for most reprogramming systems. In addition, direct comparisons between the genome-wide reprogramming efficiencies in systems driven by the overexpression of endogenous versus exogenous master regulator(s) are rarely performed. This thesis presents methods for comparative analyses of genome-wide gene expression and chromatin accessibility data, applied to myogenic reprogramming systems in order to assess reprogramming efficiency and generate testable hypotheses for improving the reprogramming process. First, gene expression and chromatin accessibility profiles of MyoD-induced transdifferentiated primary human skin fibroblasts are compared to fibroblasts and myoblasts. Second, similar genome-wide changes are assessed for myogenic conversion of iPS cells driven by overexpression of endogenous MyoD versus exogenous MyoD. Both studies show that (i) while many muscle marker genes are reprogrammed after MyoD overexpression, the genome-wide accessibility and gene expression profiles are still different from those of primary myoblast or myotube cells; (ii) MyoD induces a continuum of changes in chromatin accessibility, with only a fraction of myogenic chromatin sites gaining a completely reprogrammed accessibility status; and (iii) chromatin-remodeling deficiencies are strongly correlated with incomplete gene expression reprogramming.
Classification analyses comparing reprogrammed and non-reprogrammed genes or chromatin sites revealed discriminatory genetic and epigenetic features, suggesting ways to potentially improve the reprogramming efficiency. Genomic analysis of transgene MyoD overexpression in iPS cells, compared to endogenous MyoD activation, also showed that MyoD is more “aggressive” in its chromatin opening behavior, showing a large number of off-target chromatin opening events. To further investigate the effects of chromatin remodeling events on gene expression in reprogramming studies, a novel cross-cell type gene expression prediction framework (CPGex) is also developed. By integrating and modeling the non-linear combinatorial effects of chromatin accessibility as well as the expression levels of regulatory TFs, CPGex is able to weigh the importance of regulatory sites or factors for downstream targeted reprogramming of specific gene(s). The methods described in this thesis can be applied to any cellular reprogramming system in order to quantitatively assess the efficiency of reprogramming at the chromatin accessibility and gene expression levels, as well as to generate testable hypotheses for improved genome-wide reprogramming.
Item Open Access Mutational processes in cancer preferentially affect binding of particular transcription factors. (Scientific reports, 2021-02-08) Liu, Mo; Boot, Arnoud; Ng, Alvin WT; Gordân, Raluca; Rozen, Steven G
Protein binding microarrays provide comprehensive information about the DNA binding specificities of transcription factors (TFs), and can be used to quantitatively predict the effects of DNA sequence variation on TF binding. There has also been substantial progress in dissecting the patterns of mutations, i.e., the "mutational signatures", generated by different mutational processes. By combining these two layers of information we can investigate whether certain mutational processes tend to preferentially affect binding of particular classes of TFs. Such preferential alterations of binding might predispose to particular oncogenic pathways. We developed and implemented a method, termed "Signature-QBiC", that integrates protein binding microarray data with the signatures of mutational processes, with the aim of predicting which TFs' binding profiles are preferentially perturbed by particular mutational processes. We used Signature-QBiC to predict the effects of 47 signatures of mutational processes on 582 human TFs. Pathway analysis showed that binding of TFs involved in NOTCH1 signaling is strongly affected by the signatures of several mutational processes, including exposure to ultraviolet radiation. Additionally, toll-like-receptor signaling pathways are also vulnerable to disruption by this exposure.
This study provides a novel overview of the effects of mutational processes on TF binding and the potential of these processes to activate oncogenic pathways through mutating TF binding sites.
Item Open Access Predicting In Vivo Transcription Factor Occupancy from In Vitro Binding (2014) Stamatov, Rumen
The spatial pattern of transcription factor (TF) binding and the level of TF occupancy at individual sites across the genome determine how a TF regulates its targets. Consequently, predicting the location and level of TF binding genome-wide is of great importance and has received much attention recently. Protein-binding microarray (PBM) technology has become the gold standard for studying TF-DNA interactions in vitro, while Chromatin Immunoprecipitation followed by DNA Sequencing (ChIP-seq) is the standard method for inferring TF binding in vivo. However, direct interpretation of in vitro results in an in vivo context is challenging and to date remains scarce. In this study, we focus on the E2F family of paralogous TFs, whose mode of binding to DNA has been controversial. Previous studies have shown that E2F factors bind to the TTTSSCGCG motif, where S can be a C or a G. Still, only a small fraction of in vivo targets are reported to contain this motif, hinting at indirect recruitment of the protein. We observed that genomic occupancy of E2F factors directly correlates with their in vitro binding affinities. By using data from universal PBM experiments, we show that E2F factors likely bind to DNA through direct sequence recognition and not through cofactor interaction. Furthermore, we developed a kinetic binding model using the PBM data to describe competition between different members of the E2F family and successfully distinguished between their unique targets.
Overall, these results demonstrate how the straightforward and simple in vitro PBM experiments can be used for inferring the complex in vivo landscape of TF binding and elucidate the mechanism of E2F-DNA interaction.
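The consensus motif quoted in the abstract above (TTTSSCGCG, where S is C or G) can be made concrete with a short scan. The following is only an illustrative sketch, not the method used in the thesis: the function names and the example sequences are hypothetical, and exact consensus matching is precisely the approach the abstract describes as capturing only a small fraction of in vivo targets.

```python
import re

# IUPAC code S stands for C or G, as stated in the abstract.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "[CG]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif="TTTSSCGCG"):
    """Return sorted (start, strand, site) hits of the motif on both strands.

    `start` is always a 0-based coordinate on the forward strand.
    """
    # A lookahead is used so that overlapping matches are also reported.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif) + "))")
    seq = seq.upper()
    n, k = len(seq), len(motif)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        # Convert the reverse-strand offset to a forward-strand start.
        hits.append((n - m.start() - k, "-", m.group(1)))
    return sorted(hits)


# Hypothetical example sequences, chosen only for illustration.
print(find_motif("AATTTCGCGCGAA"))
print(find_motif("CGCGCCAAA"))
```

Scanning both strands matters here because a site on the reverse strand is equally available to the protein; a quantitative model trained on PBM data, as in the thesis, would replace the yes/no match with a predicted affinity.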
Item Open Access Punctuated evolution and transitional hybrid network in an ancestral cell cycle of fungi. (Elife, 2016-05-10) Medina, Edgar M; Turner, Jonathan J; Gordân, Raluca; Skotheim, Jan M; Buchler, Nicolas E
Although cell cycle control is an ancient, conserved, and essential process, some core animal and fungal cell cycle regulators share no more sequence identity than non-homologous proteins. Here, we show that evolution along the fungal lineage was punctuated by the early acquisition and entrainment of the SBF transcription factor through horizontal gene transfer. Cell cycle evolution in the fungal ancestor then proceeded through a hybrid network containing both SBF and its ancestral animal counterpart E2F, which is still maintained in many basal fungi. We hypothesize that a virally-derived SBF may have initially hijacked cell cycle control by activating transcription via the cis-regulatory elements targeted by the ancestral cell cycle regulator E2F, much like extant viral oncogenes. Consistent with this hypothesis, we show that SBF can regulate promoters with E2F binding sites in budding yeast.
Item Open Access Sequence and Structural Determinants of Specificity Differences between Paralogous Transcription Factors (2016) Shen, Ning
Transcription factors (TFs) control the temporal and spatial expression of target genes by interacting with DNA in a sequence-specific manner. Recent advances in high throughput experiments that measure TF-DNA interactions in vitro and in vivo have facilitated the identification of DNA binding sites for thousands of TFs. However, it remains unclear how each individual TF achieves its specificity, especially in the case of paralogous TFs that recognize distinct target genomic sites despite sharing very similar DNA binding motifs.
In my work, I used a combination of high throughput in vitro protein-DNA binding assays and machine-learning algorithms to characterize and model the binding specificity of 11 paralogous TFs from 4 distinct structural families. My work proves that even very closely related paralogous TFs, with indistinguishable DNA binding motifs, oftentimes exhibit differential binding specificity for their genomic target sites, especially for sites with moderate binding affinity. Importantly, the differences I identify in vitro and through computational modeling help explain, at least in part, the differential in vivo genomic targeting by paralogous TFs. Future work will focus on in vivo factors that might also be important for specificity differences between paralogous TFs, such as DNA methylation, interactions with protein cofactors, or the chromatin environment. In this larger context, my work emphasizes the importance of intrinsic DNA binding specificity in targeting of paralogous TFs to the genome.
Item Open Access Stability selection for regression-based models of transcription factor-DNA binding specificity. (Bioinformatics, 2013-07-01) Mordelet, Fantine; Horton, John; Hartemink, Alexander J; Engelhardt, Barbara E; Gordân, Raluca
MOTIVATION: The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret. RESULTS: We propose novel regression-based models of TF-DNA binding specificity, trained using high resolution in vitro data from custom protein-binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max and Mad2) in their native genomic context. These high-throughput quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF-DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TFs with highly similar position weight matrices, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step toward better sequence-based models of individual TF-DNA binding specificity.
AVAILABILITY: Our code is available at http://genome.duke.edu/labs/gordan/ISMB2013. The PBM data used in this article are available in the Gene Expression Omnibus under accession number GSE47026.
Item Open Access Transcription Factor-Centric Approaches to Identify Regulatory Driver Mutations in Cancer (2020) Zhao, Jingkang
Most previous efforts to identify cancer driver mutations have focused on protein-coding genes. In recent years, the decreasing costs of DNA sequencing have enabled whole-genome sequencing (WGS) studies of thousands of tumor samples, making it possible to systematically survey non-coding regions for potential driver events. From these studies, millions of somatic mutations in cancer have been identified, the majority of which are non-coding. However, driver identification remains a far greater challenge in non-coding regions than in coding genes, primarily due to the incomplete annotation of the non-coding genome and the unknown functional impact of non-coding mutations.
In this work, we present new approaches to identify putative regulatory driver mutations in cancer, based on new methodology for predicting the quantitative effects of single nucleotide variants on transcription factor (TF) binding. Unlike most of the previous work on driver identification, our method does not require the driver mutations to be highly recurrent; instead, we assess the mutations’ significance by testing if they cause larger TF binding changes than expected in the case of completely random mutations. Since gene regulation relies on the cooperation of multiple regulatory elements, we have devised a way to combine the effects of all regulatory mutations of a gene in order to identify genes whose regulation is likely to be significantly perturbed by the mutations observed in their regulatory elements, through changes in TF binding.
We have applied our TF-centric approaches to analyze single nucleotide variants identified in a liver cancer data set from the International Cancer Genome Consortium (ICGC), and identified potentially dysregulated genes whose regulatory mutations could trigger significant TF binding changes. Notably, the genes identified by us are different from the ones prioritized by recurrence-based approaches. However, most of the potentially dysregulated genes we have identified have large changes in gene expression and/or are cancer prognostic genes. Our results suggest that regulatory mutations should be investigated further, not just by their recurrence, but also by their functional effects such as TF binding changes, to uncover dysregulated genes that may drive tumorigenesis.
Item Open Access Transcription Factors as Competitors in Gene Regulation and DNA Damage Repair (2022) Zhang, Yuning
Transcription factors (TFs) bind genomic DNA to regulate gene expression. In the cell, the genome is decorated with numerous proteins, including nucleosomes and proteins involved in processes such as DNA repair and replication, which could compete with TFs. While the competition with nucleosomes is well studied, TFs can also compete with other DNA-binding proteins (e.g. other TFs, DNA repair enzymes, polymerases). The rules and the impact of such competition remain largely unknown. Here, we investigate how TFs compete with each other and with repair enzymes, and we reveal the significant role TFs play as competitors in multiple pathways.
To capture the binding profiles of competing TFs, we designed a quantitative cell-free assay that we applied to study Cbf1-Pho4 competition in yeast and MYC-MAD competition in human. We found that TFs greatly influence each other’s occupancy, in a way that is dictated by the proteins’ divergence in DNA-binding specificity. Analyses of ChIP-seq data confirmed that the patterns of TF-TF competition, as observed in vitro, are preserved in the nuclear environment. Furthermore, gene expression data suggests that Cbf1-Pho4 competition plays a critical role in the specific activation of target genes in the cell. In the MYC-MAD system, we found that quantitative in vitro knowledge facilitates the interpretation of in vivo ChIP-seq data and reveals subtle signals in gene regulatory networks, demonstrating the advantage of combining in vitro quantification with in vivo detection. Next, we adapted our assay to study the competition between TFs and DNA repair enzymes. Recently (Afek et al. 2020) we showed that TFs bind with high affinity to mismatches, which can result from replication errors.
We thus hypothesized that TFs can compete with TDG, the glycosylase that recognizes T-G mismatches and initiates base excision repair, and MutS, the mismatch-binding enzyme that initiates mismatch repair. Our high-throughput competition assay showed that, as predicted, the binding of both repair enzymes to DNA decreases significantly in the presence of TFs. In addition, the magnitude of the decreases in repair enzyme binding correlates well with the TF binding levels, indicating specific competition. This suggests that, in the cell, TFs bound to mismatches may affect repair and lead to increased mutagenesis at regulatory sites. Overall, our study proposes an approach for studying competition between DNA-binding proteins in a quantitative and high-throughput manner, and highlights the significance of this competition not only for gene regulation (where TFs are known to play an important role), but also in DNA repair.