Browsing by Subject "Transcription factor"
Item Open Access Computational Methods For Functional Motif Identification and Approximate Dimension Reduction in Genomic Data (2011) Georgiev, Stoyan
Uncovering the DNA regulatory logic in complex organisms has been one of the important goals of modern biology in the post-genomic era. The sequencing of multiple genomes in combination with the advent of DNA microarrays and, more recently, of massively parallel high-throughput sequencing technologies has made possible the adoption of a global perspective to the inference of the regulatory rules governing the context-specific interpretation of the genetic code that complements the more focused classical experimental approaches. Extracting useful information and managing the complexity resulting from the sheer volume and the high-dimensionality of the data produced by these genomic assays has emerged as a major challenge which we attempt to address in this work by developing computational methods and tools, specifically designed for the study of the gene regulatory processes in this new global genomic context.
First, we focus on the genome-wide discovery of physical interactions between regulatory sequence regions and their cognate proteins at both the DNA and RNA level. We present a motif analysis framework that leverages the genome-wide evidence for sequence-specific interactions between trans-acting factors and their preferred cis-acting regulatory regions. The utility of the proposed framework is demonstrated on DNA and RNA cross-linking high-throughput data.
A second goal of this thesis is the development of scalable approaches to dimension reduction based on spectral decomposition and their application to the study of population structure in massive high-dimensional genetic data sets. We have developed computational tools and have performed theoretical and empirical analyses of their statistical properties with particular emphasis on the analysis of the individual genetic variation measured by Single Nucleotide Polymorphism (SNP) microarrays.
Item Open Access Engineering Transcription Factors to Program Cell Fate Decisions (2015) Kabadi, Ami Meda
Technologies for engineering new functions into proteins are advancing biological research, biotechnology, and medicine at an astounding rate. Building on fundamental research of natural protein structure and function, scientists are identifying new protein domains with previously undescribed properties and engineering new proteins with expanded functionalities. Such tools are enabling the precise study of fundamental aspects of cellular behavior and the development of a new class of gene therapies that manipulate the expression of endogenous genes. The applications of these gene regulation technologies include but are not limited to controlling cell fate decisions, reprogramming cell lineage commitment, monitoring cellular states, and stimulating expression of therapeutic factors.
While the field has come a long way in the past 20 years, there are still many limitations. Historically, gene therapy and gene replacement therapies have relied on over-expression of natural transcription factors that activate specific endogenous gene networks. However, natural transcription factors are often inadequate for generating efficient, fast, and homogeneous cellular responses. Furthermore, most natural transcription factors have complex structures and functions that are difficult to improve or alter by rational design. This thesis presents three novel and widely applicable methods for engineering transcription factors for programming cell fate decisions in primary human cells. MyoD is the master transcription factor defining the myogenic lineage. Expression of MyoD in certain non-myogenic lineages induces a coordinated change in differentiation state. We use MyoD as a model for developing our protein engineering techniques because myogenesis is a well-studied pathway that is characterized by an easily detected change in phenotype from mono-nucleated to multinucleated cells. Furthermore, efficient generation of myocytes in vitro presents an attractive patient-specific method by which to treat muscle-wasting diseases such as muscular dystrophy.
We first demonstrate that we can improve the ability of MyoD to convert human dermal fibroblasts and human adipose-derived stem cells into myocyte-like cells. By fusing potent modular activation domains to the MyoD protein, we increased myogenic gene expression, myofiber formation, cell fusion, and global reprogramming of the myogenic gene network. The engineered MyoD transcription factor induced myogenesis in as little as ten days, a process that takes three or more weeks with the natural MyoD protein.
While increasing the potency of transcriptional activation is one mechanism by which to improve transcription factor function, there are many other possible routes such as increasing DNA-binding affinity, increasing protein stability, altering interactions with co-factors, or inducing post-translational modifications. Endogenous regulatory pathways are complex, and it is difficult to predict specific amino acid changes that will produce the desired outcome. Therefore, we designed and implemented a high-throughput directed evolution system in mammalian cells that allowed us to enrich for MyoD variants that are successful at inducing expression of the myogenic gene network. Directed evolution presents a well-established yet currently unexplored approach for uncovering amino acid substitutions that improve the intrinsic properties of transcription factors themselves without any prior knowledge. After ten rounds of selection, we identified amino acid substitutions in MyoD that increase expression of a subset of myogenic gene markers in primary human cells.
Rather than guide cell fate decisions by expressing an exogenous factor, it may be beneficial to activate expression of the endogenous gene locus. In comparison to delivering the transcription factor cDNA, expression from the endogenous locus may induce chromatin remodeling and activation of positive feedback loops to stimulate autologous expression more quickly. Recent discoveries of the principles of protein-DNA interactions in various species and systems have guided the development of methods for engineering designer enzymes that can be targeted to any DNA target site. We make use of the RNA-guided Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)/Cas9 system to induce expression of the endogenous MyoD gene in human induced pluripotent stem cells (iPSCs). Through complementary base pairing, chimeric guide RNAs (gRNAs) direct a Cas9 transcriptional activator to a target DNA sequence, leading to endogenous gene expression. A current limitation of CRISPR/Cas9-based gene regulation is the potency of transcriptional activation and delivery of the CRISPR/Cas9 components. To address these limitations, we first developed a platform to express Cas9 and up to four gRNAs from a single lentiviral vector. We then optimized the gRNAs and Cas9 transcriptional activator to induce endogenous MyoD expression and differentiate iPSCs into myocyte-like cells.
In summary, the objective of this work is to develop protein engineering techniques to improve both natural and synthetic transcription factor function for programming cell fate decisions in primary human cells. While we focus on myogenesis, each method can be easily adapted to other transcription factors and gene networks. Engineered transcription factors that induce fast and efficient remodeling of gene networks have widespread applications in the fields of biotechnology and regenerative medicine. Continuing to develop these tools for modulating gene expression will lead to an expanded number of disease models and eventually the efficient generation of patient-specific cellular therapies.
Item Embargo Interactions between the microbiota and host transcription factor HNF4A in the intestinal epithelium regulate intestinal inflammation throughout the lifespan (2023) Kelly, Cecelia
The inflammatory bowel diseases (IBD) occur in genetically susceptible individuals who mount inappropriate immune responses to their microbiota, leading to chronic intestinal inflammation. The natural history of IBD progression includes early subclinical stages of disease occurring before disease is diagnosed in the clinic. There is evidence in first degree relatives of IBD patients and members of the general population who go on to develop IBD that these stages are characterized by increased gut barrier permeability, increased levels of inflammation biomarkers, detection of microbiota-specific antibodies in sera, and changes in microbiota composition. Mouse models can be a useful tool in studying disease dynamics during these early stages. The transcription factor Hepatocyte nuclear factor 4 alpha (HNF4A) has been associated with human IBD, and deletion of Hnf4a in intestinal epithelial cells (IEC) in mice (Hnf4aΔIEC) leads to spontaneous colonic inflammation by 6-12 months of age. However, early stages of disease in this mouse model were not well defined, and the role of microbiota in promoting disease was also unclear. Here I tested if pathology in Hnf4aΔIEC mice begins earlier in life and if microbiota contribute to that process, as well as later inflammatory stages of disease. Longitudinal analysis revealed that Hnf4aΔIEC mice reared in specific pathogen-free (SPF) conditions develop episodically elevated fecal lipocalin 2 (Lcn2) and episodic loose stools beginning by 4-5 weeks of age. Lifetime cumulative Lcn2 levels correlated with histopathological features of colitis at 12 months of age. Antibiotic and gnotobiotic tests showed that these phenotypes in Hnf4aΔIEC mice were dependent on microbiota.
Fecal 16S rRNA gene sequencing in SPF Hnf4aΔIEC and control mice disclosed that genotype significantly contributed to differences in microbiota composition by 12 months, and longitudinal analysis of the Hnf4aΔIEC mice with the highest lifetime cumulative Lcn2 revealed that microbial community differences emerged early in life when elevated fecal Lcn2 was first detected. These microbiota differences included enrichment of a novel phylogroup of Akkermansia muciniphila in Hnf4aΔIEC mice. I conclude that HNF4A functions in IEC to shape composition of the gut microbiota, and protect against episodic inflammation induced by microbiota throughout the lifespan. Lastly, I discuss future directions for this work, including using single cell RNA sequencing of the colonic epithelium to identify genes regulated by HNF4A in distinct colonic epithelial cell types, gnotobiotic studies using the strain of Akkermansia muciniphila we isolated to test the hypothesis that it can promote disease in Hnf4aΔIEC mice, testing clinically relevant disease triggers using this mouse IBD model, and further immune cell profiling.
Item Embargo K-mer Based Methods for Measuring and Predicting DNA-Binding Specificity of Transcription Factors (2023) Mielko, Zachery
Transcription factors (TFs) are proteins that bind DNA based on the sequence and structure to regulate gene expression. They are fundamental components of genomic function, present in all known forms of life. Thus, understanding the conditions required for TF-DNA interactions is a longstanding and active field of study. With the advent of comprehensive k-mer based measurements using protein binding microarrays (PBMs), the binding profiles of hundreds of TFs have been measured. This dissertation addresses two major problems. First, the information from these comprehensive measurements is used to create simplistic models of binding that capture only the high affinity range. In a biological context, weak binding sites are often the most important in developmental and regulatory processes and can be missed by models targeting high affinity binding sites. Second, the vast majority of measurements are on structurally unmodified DNA. TF binding occurs in complex and dynamic systems where the DNA structure can be significantly altered due to sources such as DNA damage. First, we look at how DNA shape influences binding through the study of UV induced photoproducts, DNA adducts formed from UV light exposure that distort the shape of pyrimidine dinucleotides. We developed a new k-mer based method for measuring TF binding to UV-irradiated DNA, UV-Bind. Using this technology, we find that the UV-induced changes in DNA structure from pyrimidine dinucleotide photoproducts can change the specificity of TFs. Using high-throughput k-mer measurements, we also found non-canonical sequences that show an increase in binding signal after UV-irradiation. We then introduce a new algorithm for calling TF binding sites using k-mers, CtrlF-TF.
CtrlF-TF takes high-throughput k-mer measurements from PBMs and outputs aligned, ranked consensus sites that can be searched in a genome. These sites compare favorably to traditional position weight matrix defined sites via in vivo and in vitro benchmarks.
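To make the idea of searching a sequence with k-mer level measurements concrete, here is a minimal sketch in Python. It is not the CtrlF-TF algorithm, whose details the abstract does not give; the `KMER_SCORES` table, the function names, and the threshold are all hypothetical and chosen only for illustration. The sketch slides a window along a sequence, looks up each k-mer on both strands, and reports windows whose score passes a cutoff.

```python
# Hypothetical k-mer scores (PBM-style enrichment values); illustrative only.
KMER_SCORES = {"CACGTG": 0.48, "CACGTT": 0.31, "GACGTG": 0.22}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_score(kmer, scores):
    """Look up a k-mer on either strand; unmeasured k-mers score 0.0."""
    return max(scores.get(kmer, 0.0),
               scores.get(reverse_complement(kmer), 0.0))


def call_sites(sequence, scores, k=6, threshold=0.3):
    """Return (position, k-mer, score) for windows scoring >= threshold,
    ranked from the highest to the lowest score."""
    sequence = sequence.upper()
    hits = []
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        score = kmer_score(window, scores)
        if score >= threshold:
            hits.append((i, window, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

Because every k-mer keeps its own score, a lookup of this kind can retain weaker sites that a single high-affinity consensus would miss, which is the motivation the abstract gives for k-mer based models.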
Item Open Access Novel Protein Regulators of Heat Shock Transcription Factor 1 During Stress and Disease (2019) Burchfiel, Eileen Therese Malloy
Heat Shock Transcription Factor 1 (HSF1) is a critical regulator of transcription that facilitates cellular stress protection in response to protein misfolding, rapid cell proliferation, and other stressful conditions. Defective HSF1 regulation is observed in cellular and animal models of cancer, where hyperactive and dysregulated HSF1 supports cancer survival, and in neurodegenerative disease, where HSF1 function is compromised, further exacerbating protein misfolding. HSF1 is tightly regulated through intramolecular interactions, post-translational modifications, and protein-protein interactions; however, little is known about how HSF1 regulation differs in response to stresses such as acute or chronic protein misfolding.
We identified one mechanism that contributes to the diminution of HSF1 in chronic protein misfolding in the context of Huntington’s Disease involving inappropriate interactions of HSF1 with CK2α’ and FBXW7 E3 ligase. We found these protein-protein interactions coordinate the abnormal phosphorylation-dependent degradation of HSF1. Importantly, inhibition of this aberrant HSF1 degradation attenuates the biochemical defects and protein misfolding in Huntington’s Disease. To further elucidate how HSF1-interacting proteins regulate HSF1 in acute and chronic stress, we carried out quantitative proteomics studies of the HSF1 interactome under control, acute heat shock, and in a cell model of Huntington’s Disease. We recapitulated many previously described interaction partners of HSF1 and identified several novel HSF1-interacting proteins that encompass a wide variety of cellular functions, including roles in DNA repair, mRNA processing, and regulation of RNA polymerase II. We further report on the interaction of HSF1 with CCCTC binding factor (CTCF), which modulates target gene activation and repression function of HSF1 by facilitating DNA binding at CTCF and HSF1 co-regulated loci. Given the role and elevated expression of both pro-inflammatory proteins and Tau in Huntington’s Disease, and their defective repression by HSF1, understanding the mechanisms of HSF1 repression is of great interest. The studies presented in this thesis expand our understanding of HSF1-mediated gene activation and repression, and the regulation of HSF1 via protein-protein interactions.
Item Open Access Sequence and Structural Determinants of Specificity Differences between Paralogous Transcription Factors (2016) Shen, Ning
Transcription factors (TFs) control the temporal and spatial expression of target genes by interacting with DNA in a sequence-specific manner. Recent advances in high throughput experiments that measure TF-DNA interactions in vitro and in vivo have facilitated the identification of DNA binding sites for thousands of TFs. However, it remains unclear how each individual TF achieves its specificity, especially in the case of paralogous TFs that recognize distinct target genomic sites despite sharing very similar DNA binding motifs. In my work, I used a combination of high throughput in vitro protein-DNA binding assays and machine-learning algorithms to characterize and model the binding specificity of 11 paralogous TFs from 4 distinct structural families. My work proves that even very closely related paralogous TFs, with indistinguishable DNA binding motifs, oftentimes exhibit differential binding specificity for their genomic target sites, especially for sites with moderate binding affinity. Importantly, the differences I identify in vitro and through computational modeling help explain, at least in part, the differential in vivo genomic targeting by paralogous TFs. Future work will focus on in vivo factors that might also be important for specificity differences between paralogous TFs, such as DNA methylation, interactions with protein cofactors, or the chromatin environment. In this larger context, my work emphasizes the importance of intrinsic DNA binding specificity in targeting of paralogous TFs to the genome.
Item Open Access Systematic Examination of Epigenomic Regulation of Neuronal Plasticity (2022) Minto, Melyssa S
The epigenome underlies cell type and state, and in post-mitotic neurons it regulates the ability to respond rapidly to activity. Since neurons exit the cell cycle early in development and are long lived, remodeling of brain function requires that neurons show transcriptional plasticity to let them change in function in response to stimuli including psychostimulants and developmental cues. This response is driven by epigenomic regulation in a cell-type-specific manner. Many studies assessing experience-driven genomic responses have been carried out in bulk tissues, so cell-type-specific genomic responses to stimuli that drive neuronal plasticity remain poorly understood. To understand the epigenomic and transcriptomic mechanisms driving neuronal plasticity, here we study multi-omic genomic data from two contexts in the mouse brain: 1) psychostimulant responses in the nucleus accumbens and 2) the postnatal and postmitotic maturation of developing cerebellar granule neurons (CGNs). In both systems, I implemented integrative bioinformatic approaches to predict transcription factor (TF) activity in regulating the transcriptome. I elucidated cell-type-specific amphetamine-induced transcriptomic responses, identified canonical activity-regulated transcription factors regulating those responses, and determined collaborators and developmental targets of the Zic family TFs, revealing novel roles of Zics regulating migration and synaptic maturation in CGN development. The studies reveal novel mechanistic insights into neuronal plasticity in different neuronal cell types by using integrative computational approaches to model chromatin topology, chromatin accessibility, gene expression, and TF binding.
Item Open Access The role of HEB and E2A in the regulation of T Lymphocyte development and proliferation (2007) Wojciechowski, Jason
Thymocyte development is a complex process that requires precise regulation of differentiation and proliferation. Basic helix-loop-helix (bHLH) transcription factors have been shown to be crucial for proper T cell development. HEB and E2A are structurally and functionally related E proteins of the bHLH family. These proteins directly regulate the expression of a number of genes essential for lymphocyte development in a lineage- and stage-specific manner. Abrogation or compromise of their function results in the manifestation of B and T cell developmental defects. Genetic and biochemical studies have provided evidence of a significant degree of functional redundancy among E proteins. The existence of compensational abilities among different E proteins has hampered the investigation and elucidation of E protein function. As such, single gene knockouts demonstrate only limited defects in lymphocyte development. Double E2A-HEB knockouts that could eliminate E protein redundancy are embryonic lethal. In addition, conventional gene knockouts are not well-suited for discerning between intrinsic and extrinsic defects caused by E protein disruption. To eliminate functional compensation and to test the T cell intrinsic roles of E proteins during thymocyte development, we developed a conditional HEB-E2A double knockout. Specifically, we employed a loxP/Lck-Cre recombinase system to drive E protein deletion during early thymocyte development. Using this approach, we were able to reveal overlapping roles for HEB and E2A in thymocyte development that had been obscured in previous single gene knockout studies. We find that simultaneous deletion of HEB and E2A results in a severe block in thymocyte development at the DN to DP stage transition.
This developmental block is accompanied by a dramatic decrease in total thymic cellularity, an increase in apoptosis, and a reduction of pTα expression. These developmentally arrested thymocytes exhibit increased proliferation in vivo and dramatic expansion ex vivo in response to IL-7 signaling. Our findings suggest that E2A and HEB are not only critical for the regulation of T cell differentiation but are also necessary to retain developing thymocytes in cell cycle arrest prior to pre-TCR expression. Together, these results imply that E proteins are required to coordinate thymocyte differentiation and proliferation.
Item Open Access Tracking Transcription Factors on the Genome by their DNase-seq Footprints (2014) Yardimci, Galip Gurkan
Transcription factors control numerous vital processes in the cell through their ability to control gene expression. Dysfunctional regulation by transcription factors leads to disorders and disease. Transcription factors regulate gene expression by binding to DNA sequences (motifs) on the genome and altering chromatin. DNase-seq footprinting is a well-established assay for identification of DNA sequences that bind to transcription factors. We developed computational techniques to analyze footprints and predict transcription factor binding. These transcription factor specific predictive models are able to correct for DNase sequence bias and characterize variation in DNA binding sequence. We found that DNase-seq footprints are able to identify cell-type or condition specific transcription factor activity and may offer information about the type of the interaction between DNA and transcription factor. Our DNase-seq footprint model is able to accurately discover high confidence transcription factor binding sites and discover alternative interactions between transcription factors and DNA. DNase-seq footprints can be used with ChIP-seq data to discover true binding sites and better understand transcription regulation.
Item Open Access Transcription Factor-Centric Approaches to Identify Regulatory Driver Mutations in Cancer (2020) Zhao, Jingkang
Most previous efforts to identify cancer driver mutations have focused on protein-coding genes. In recent years, the decreasing costs of DNA sequencing have enabled whole-genome sequencing (WGS) studies of thousands of tumor samples, making it possible to systematically survey non-coding regions for potential driver events. From these studies, millions of somatic mutations in cancer have been identified, the majority of which are non-coding. However, driver identification remains a far greater challenge in non-coding regions than in coding genes, primarily due to the incomplete annotation of the non-coding genome and the unknown functional impact of non-coding mutations.
In this work, we present new approaches to identify putative regulatory driver mutations in cancer, based on new methodology for predicting the quantitative effects of single nucleotide variants on transcription factor (TF) binding. Unlike most of the previous work on driver identification, our method does not require the driver mutations to be highly recurrent; instead, we assess the mutations’ significance by testing if they cause larger TF binding changes than expected in the case of completely random mutations. Since gene regulation relies on the cooperation of multiple regulatory elements, we have devised a way to combine the effects of all regulatory mutations of a gene in order to identify genes whose regulation is likely to be significantly perturbed by the mutations observed in their regulatory elements, through changes in TF binding.
We have applied our TF-centric approaches to analyze single nucleotide variants identified in a liver cancer data set from the International Cancer Genome Consortium (ICGC), and identified potentially dysregulated genes whose regulatory mutations could trigger significant TF binding changes. Notably, the genes identified by us are different from the ones prioritized by recurrence-based approaches. However, most of the potentially dysregulated genes we have identified have large changes in gene expression and/or are cancer prognostic genes. Our results suggest that regulatory mutations should be investigated further, not just by their recurrence, but also by their functional effects such as TF binding changes, to uncover dysregulated genes that may drive tumorigenesis.
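The idea of testing whether a mutation changes TF binding more than a random mutation would can be sketched as follows. This is only an illustration under stated assumptions, not the method of the thesis: the abstract does not specify the binding model or the test, so the toy position weight matrix `PWM`, the best-window binding proxy, and the exhaustive background of all possible single nucleotide substitutions in the region are all hypothetical choices.

```python
# Hypothetical 4-position weight matrix (log-odds per base); illustrative only.
PWM = [
    {"A": 1.2, "C": -0.8, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.5, "G": -0.7, "T": -0.9},
    {"A": -0.6, "C": -1.1, "G": 1.4, "T": -0.8},
    {"A": -0.9, "C": -0.7, "G": -1.2, "T": 1.3},
]


def best_site_score(seq, pwm):
    """Binding proxy: the best PWM match over all windows of the sequence."""
    width = len(pwm)
    return max(
        sum(pwm[j][seq[i + j]] for j in range(width))
        for i in range(len(seq) - width + 1)
    )


def binding_change(seq, pos, alt, pwm):
    """Predicted change in the binding proxy when seq[pos] becomes alt."""
    mutated = seq[:pos] + alt + seq[pos + 1:]
    return best_site_score(mutated, pwm) - best_site_score(seq, pwm)


def empirical_p_value(seq, pos, alt, pwm):
    """Fraction of all possible single nucleotide substitutions in the
    region whose absolute binding change is at least the observed one."""
    observed = abs(binding_change(seq, pos, alt, pwm))
    background = [
        abs(binding_change(seq, p, base, pwm))
        for p in range(len(seq))
        for base in "ACGT"
        if base != seq[p]
    ]
    as_extreme = sum(1 for delta in background if delta >= observed)
    return as_extreme / len(background)
```

A call such as `empirical_p_value("TTACGTTT", 3, "A", PWM)` scores a substitution inside the toy motif against every other possible substitution in the same region. Note that nothing in this test depends on how often the mutation recurs across tumors, which mirrors the contrast with recurrence-based approaches drawn in the abstract.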