Browsing by Subject "Gene regulation"
Item Open Access A Semi-Supervised Predictive Model to Link Regulatory Regions to Their Target Genes (2015) Hafez, Dina Mohamed
Next generation sequencing technologies have provided us with a wealth of data profiling a diverse range of biological processes. In an effort to better understand the process of gene regulation, two predictive machine learning models specifically tailored for analyzing gene transcription and polyadenylation are presented.
Transcriptional enhancers are specific DNA sequences that act as "information integration hubs" to confer regulatory requirements on a given cell. These non-coding DNA sequences can regulate genes from long distances, or across chromosomes, and their relationships with their target genes are not limited to one-to-one. With thousands of putative enhancers and fewer than 14,000 protein-coding genes, detecting enhancer-gene pairs becomes a very complex machine learning and data analysis challenge.
In order to predict these specific sequences and link them to the genes they regulate, we developed McEnhancer. Using DNase I sensitivity data and annotated in situ hybridization gene expression clusters, McEnhancer builds interpolated Markov models to learn enriched sequence content of known enhancer-gene pairs and predicts unknown interactions in a semi-supervised learning algorithm. Classification of predicted relationships was 73-98% accurate for gene sets with varying levels of initial known examples. Predicted interactions showed a great overlap when compared to Hi-C identified interactions. Enrichment of known functionally related TF binding motifs and enhancer-associated histone modification marks, along with the corresponding developmental time point, was highly evident.
On the other hand, pre-mRNA cleavage and polyadenylation is an essential step for 3'-end maturation and subsequent stability and degradation of mRNAs. This process is highly controlled by cis-regulatory elements surrounding the cleavage site (polyA site), which are frequently constrained by sequence content and position. More than 50% of human transcripts have multiple functional polyA sites, and the specific use of alternative polyA sites (APA) results in isoforms with variable 3'-UTRs, thus potentially affecting gene regulation. Elucidating the regulatory mechanisms underlying differential polyA preferences in multiple cell types has been hindered by the lack of appropriate tests for determining APAs with significant differences across multiple libraries.
We specified a linear effects regression model to identify tissue-specific biases indicating regulated APA; the significance of differences between tissue types was assessed by an appropriately designed permutation test. This combination allowed us to identify highly specific subsets of APA events in the individual tissue types. Predictive kernel-based SVM models successfully distinguished constitutive polyA sites from a biologically relevant background (auROC = 99.6%), as well as tissue-specific regulated sets from each other. The main cis-regulatory elements described for polyadenylation were found to be a strong, and highly informative, hallmark for constitutive sites only. Tissue-specific regulated sites were found to contain other regulatory motifs, with the canonical PAS signal being nearly absent at brain-specific sites. We applied this model to data on SRp20, an RNA-binding protein that might be involved in oncogene activation, and obtained interesting insights.
Together, these two models contribute to the understanding of enhancers and the key role they play in regulating tissue-specific expression patterns during development, as well as provide a better understanding of the diversity of post-transcriptional gene regulation in multiple tissue types.
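The abstract above mentions assessing differences between tissue types with a permutation test but does not describe its design. As a rough illustration only, the sketch below shows a generic two-sided permutation test on a difference in group means. The function name, the example values, and the choice of test statistic are all assumptions made for illustration and are not taken from the dissertation.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the absolute difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Shuffle the pooled values to break any link between value and label.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical usage fractions of one polyA site in two tissues (made-up values).
brain = [0.82, 0.79, 0.88, 0.91, 0.85]
liver = [0.41, 0.38, 0.47, 0.52, 0.44]
p_value = permutation_test(brain, liver)
print(f"estimated p-value: {p_value:.4f}")
```

Shuffling the pooled values simulates the null hypothesis that tissue labels are exchangeable, so the returned fraction estimates how often a difference at least as large as the observed one arises by chance.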
Item Open Access Elucidation of the Molecular Mechanisms Underlying Estrogen-Mediated Estrogen Receptor Activation (2017) Coons, Laurel Aubrie
Every cell in our body contains the same genetic material, but what differentiates one cell type from another is the way in which that material is interpreted. Hidden in the 98% of non-coding DNA sequences, once referred to as “junk DNA,” are the instructions for how to turn genes on and off (i.e., the operating system). As in any other language, decoding the instructions between DNA and gene expression is the key for understanding transcriptional regulation. Without understanding the grammar of transcriptional regulation, we cannot tell which sequence changes affect gene expression and how.
In this study, we have focused on defining the mechanisms mediating gene expression in response to steroid hormones (predominantly 17β-estradiol) as a model system for other non-steroid transcription systems that can be exploited to define general principles governing steroid responsiveness of target genes. In particular, we have demonstrated that DNA sequence constraints define the functionally active steroid nuclear receptor (sNR) gene regulatory elements in the genome, and this functionality is restricted to elements that vary from the consensus palindromic elements by one or two nucleotides, named nuclear receptor functional enhancers (NRFEs). At NRFEs, the chromatin binding of steroid nuclear receptors is not only correlated with active eRNA production, but also RNAPII occupancy as well as hormone-dependent coregulator and transcription factor (TF) recruitment. Moreover, steroid nuclear receptors with mutated DNA binding domains (DBDs) were shown to still interact with chromatin, yet lack hormone-dependent transcriptional activity, highlighting the fact that steroid nuclear receptors can interact with chromatin in a transcriptionally inactive state (i.e., the majority of sNR chromatin interacting events identified in ChIP studies are not linked directly to transcriptional events) (Chapter 2). We further demonstrate that the palindromic architecture of the regulatory element is the underlying mechanism that governs chromatin interaction by steroid nuclear receptors, and the non-functional chromatin interacting sites (non-NRFEs) observed in ChIP-seq studies are subject to the same rules and constraints as NRFEs. Thus, NRFE vs. non-NRFE binding is dictated at the individual nucleotide level, and the residence time (or strength of binding) is determined by the number and locations of variants within the consensus element. The basis of these rules and DNA constraints follows specific algebraic relationships.
These findings quantitatively define how steroid nuclear receptors select the ‘appropriate’ regulatory targets out of a very large number of highly similar sequences in the genome, thus eliciting a specific cellular response (Chapter 3).
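The idea of an element differing from a consensus palindrome by a small number of nucleotides can be made concrete with a mismatch count. The sketch below is illustrative only. It assumes the commonly cited estrogen response element consensus GGTCAnnnTGACC, which the abstract does not state, and the function names, the example sites, and the "at most two mismatches" threshold are hypothetical. The abstract does not say whether an exact consensus match also counts, so the threshold is left as a parameter.

```python
# Assumed consensus estrogen response element (ERE): two 5-bp half-sites
# separated by a 3-bp spacer. "N" marks spacer positions that are not scored.
CONSENSUS_ERE = "GGTCANNNTGACC"


def count_mismatches(site: str, consensus: str = CONSENSUS_ERE) -> int:
    """Count scored positions where `site` differs from `consensus`."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(
        1
        for base, ref in zip(site.upper(), consensus)
        if ref != "N" and base != ref
    )


def is_near_consensus(site: str, max_mismatches: int = 2) -> bool:
    """True when the site is within `max_mismatches` of the consensus."""
    return count_mismatches(site) <= max_mismatches


# Hypothetical candidate sites, not taken from the dissertation.
candidates = ["GGTCACAGTGACC", "GGTCACAGTGACT", "GGTTACAGTGCCT"]
for site in candidates:
    print(site, count_mismatches(site), is_near_consensus(site))
```

A per-position count like this ignores which positions vary, whereas the abstract notes that the locations of variants also matter, so a fuller model would weight positions differently.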
Estradiol is a potent mitogen in the mouse uterus, a well-characterized tissue used to study the underlying mechanisms of estrogen-mediated transcriptional regulation. As previously shown, estradiol-mediated transcriptional regulation in the mouse uterus is biphasic and can be divided into initial (early) phase and subsequent (late) phase transcriptional events. In this study, we demonstrate that late phase estradiol-mediated transcription requires the early phase transcripts and that low-affinity estrogen receptor α (ERα) ligands cannot sustain late phase hormone-mediated transcriptional events. In addition, the interaction of ERα with chromatin (1) is immediate, (2) does not change locations after initial contact, (3) is retained the longest at NRFEs, and (4) is depleted prior to the late phase transcriptional events. Collectively, this indicates that estradiol-mediated late phase transcripts are regulated secondary to early induced transcripts (i.e., the early induced transcripts activate other transcription factors which are responsible for producing the late phase transcripts). Furthermore, the AF2 coactivator surface of ERα is not required for hormone-dependent ERα recruitment to NRFEs; estrogen-independent basal transcription requires ERα binding at NRFEs; and the growth factor insulin-like growth factor 1 (IGF-1) activates ERα by recruitment of ERα to NRFEs (Chapter 4).
Our understanding of the physiology and transcriptional regulation of steroid hormones was significantly advanced following the generation of mutant mouse models possessing disruptions (knockouts) of the steroid nuclear receptor genes. In this study, we identified the molecular defects caused by a homozygous missense mutation in ERα identified in an 18-year-old woman with complete estrogen insensitivity syndrome, a clinical presentation similar to that of ERα knockout (αERKO) female mice. From these studies, we identified a potential therapeutic, diethylstilbestrol (DES), for treating this estrogen insensitivity condition. Treatment of this patient with DES is underway and we remain involved as collaborators in this clinical study. Our studies also characterize the molecular defects caused by a different homozygous mutation in ERα identified in two sisters and a brother who likewise exhibit complete estrogen insensitivity (Chapter 5).
Hormone-dependent transcriptional regulation requires the recruitment of coregulators to the regulatory regions of target genes. This recruitment is determined by the overall surface topography of the sNR. Phage display is a widely used research technique for screening highly diverse peptide libraries to enrich for sNR-binding clones. It requires two primary components for affinity selection: (1) a phage display cDNA library, and (2) purified recombinant sNR protein. High-level expression of soluble, biologically active sNR protein is particularly challenging due to its largely hydrophobic ligand binding domain. In this study, we overcame this challenge by constructing a protein expression system that provides the factors responsible for protein folding of sNRs (Hsp90, Hsp40, Hsp70, Hop and p23) at levels comparable with the amount of overexpressed sNR (Chapter 6). Affinity selection in phage display involves panning of a phage library to enrich for sNR-binding clones followed by their amplification. This amplification step enriches for clones that have a growth advantage, introducing bias into the selection that favors faster growing clones regardless of the selection pressure. To eliminate this bias, individual phage must be separated into different growth chambers so they cannot compete for bacterial hosts. To do this, we used microfluidic flow-focusing technology (MFFT) to generate monodisperse droplet-based compartments to encapsulate individual phage clones and achieve non-competitive amplification of millions of phage clones having different growth characteristics. The elimination of growth-based competition ensures that selection of binding clones is driven only by the binding strength of each clone for the sNR. The successful development of an MFFT platform and a proof-of-principle demonstration allowed us to then implement a high throughput MFFT droplet system.
This project was the first application to introduce this novel technology to the National Institute of Environmental Health Sciences (NIEHS) (Chapter 6).
Transcriptional regulation takes place at the level of single cells. However, many traditional techniques involve homogenizing tissue samples composed of millions of cells, and thus can only deal with population averages. Single-cell sequencing reveals the inherent properties of a single cell from the large scale of the genome, information critical for understanding cellular heterogeneity in cancer and response/resistance to therapy. Using our new high throughput MFFT droplet system, genome-wide gene expression profiling of individual cells can be done by separating thousands of individual cells into nanoliter-sized aqueous droplets, associating a different barcode with each cell’s RNAs, and sequencing them all together. This results in transcripts from thousands of individual cells that are all identified by their cell of origin. Here, we establish a droplet microfluidic method to sequence genomes of single cells from dissociated, complex tissues using a custom fluorosurfactant (i.e., a triblock copolymer consisting of a polyethylene glycol (PEG) center block covalently bound to two perfluorinated polyether (PFPE) blocks by amide linking groups), to address two of the major challenges in performing biological, drop-based assays: to stabilize aqueous droplets in fluorocarbon oils and to make the droplets compatible with biological molecules and cells (Chapter 6).
Post-translational modification by SUMO is an important mechanism to regulate transcription. Tamoxifen is used in the treatment and prevention of ER positive breast cancer. Tamoxifen is metabolized predominantly by the cytochrome P450 system to several primary and secondary metabolites, some of which exhibit stronger antiestrogenic effects than tamoxifen itself. In this study, we demonstrate that the stronger antiestrogenic effect of endoxifen versus tamoxifen is due to post-translational modification by SUMO and that inhibition of SUMO derepresses endoxifen’s antiestrogenic activity. This mechanism of transcriptional repression was also demonstrated in other antiestrogens including fulvestrant, raloxifene, bazedoxifene, idoxifene and lasofoxifene (Chapter 7).
Item Open Access Evolution and Mechanisms of Plasticity in Wild Baboons (Papio cynocephalus) (2017) Lea, Amanda Jeanne
In many species, early life experiences have striking effects on health, reproduction, and survival in adulthood. Thus, early life conditions shape a range of evolutionarily relevant traits, and in doing so alter the genotype-phenotype relationship and the phenotypic distribution on which selection acts. Because of the key role early life effects play in generating variation in fitness-related traits, understanding their evolution and mechanistic basis is crucial. To gain traction on these topics, my dissertation draws on ecological, demographic, and genomic data from a long-term study population of wild baboons in Amboseli, Kenya to address three major themes: (i) the adaptive significance of early life effects, (ii) the molecular mechanisms that connect early life experiences with later life traits, and (iii) the development of laboratory tools for understanding the role of one particular mechanism, DNA methylation, in translating environmental inputs into phenotypic variation. In chapter one, I empirically test two competing explanations for how early life effects evolve, providing novel insight into the evolution of developmental plasticity in long-lived species. In chapter two, I address the degree to which ecological effects on fitness-related traits are potentially mediated by changes in DNA methylation. Finally, in chapter three, I develop a high-throughput assay to improve our knowledge of the phenotypic relevance of changes in the epigenome. Together, this work provides some of the first empirical data on the genes and mechanisms involved in sensing and responding to environmental variation in wild mammals, and more generally addresses several critical gaps in our understanding of how early experiences affect evolutionarily relevant traits.
Item Open Access Evolution of Floral Color Patterning in Chilean Mimulus (2008-12-05) Cooley, Arielle Marie
Evolution can be studied at many levels, from phenotypic to molecular, and from a variety of disciplines. An integrative approach can help provide a more complete understanding of the complexities of evolutionary change. This dissertation examines the ecology, genetics, and molecular mechanisms of the evolution of floral anthocyanin pigmentation in four species of Mimulus native to central Chile. Anthocyanins, which create red and purple colors in many plants, are a valuable model for studying evolutionary processes. They are ecologically important and highly variable both within and between species, and the underlying biosynthetic pathway is well characterized. The focus of this dissertation is dramatic diversification in anthocyanin coloration, in four taxa that are closely related to the genomic model system M. guttatus. I posed three primary questions: (1) Is floral diversification associated with pollinator divergence? (2) What is the genetic basis of the floral diversification? (3) What is the molecular mechanism of the increased production of anthocyanin pigment? The first question was addressed by evaluating patterns of pollinator visitation in natural populations of all four study taxa. The second question was explored using segregation analysis for a series of inter- and intraspecific crosses. One trait, increased petal anthocyanins in M. cupreus, was further dissected at the molecular level, using candidate gene testing and quantitative gene expression analysis. Pollinator studies showed little effect of flower color on pollinator behavior, implying that pollinator preference probably did not drive pigment evolution in this group. However, segregation analyses revealed that petal anthocyanin pigmentation has evolved three times independently in the study taxa, suggesting an adaptive origin.
In addition to pollinator attraction, anthocyanins and their biochemical precursors protect against a variety of environmental stressors, and selection may have acted on these additional functions. Molecular analysis of petal anthocyanins in M. cupreus revealed that this single-locus trait maps to a transcription factor, McAn1, which is differentially expressed in high- versus low-pigmented flowers. Expression of the anthocyanin structural genes is tightly correlated with McAn1 expression. The results suggest that M. cupreus pigmentation evolved by a mutation cis to McAn1 that alters the intensity of anthocyanin biosynthesis.
Item Open Access Evolution of Gene Regulation in Papio Baboons (2019) Vilgalys, Tauras Patrick
Changes in gene regulation are thought to play an important role in primate evolution and divergence. In support of this hypothesis, comparative evidence clearly demonstrates that gene expression patterns differ between closely related species and tend to evolve under selective constraint. However, we know little about the evolutionary forces that shape gene regulation across primates, particularly outside of humans and the other great apes. To address this gap, my dissertation draws on population and functional genomic variation between baboon species and within an admixed wild baboon population to address two themes: (i) how is gene regulatory divergence related to genetic divergence? and (ii) to what extent has natural selection shaped regulatory variation? Using interspecific comparative approaches, I show that changes in DNA methylation accumulate with increasing sequence divergence. While most changes in methylation can be explained by genetic drift, a subset are likely to have evolved under positive selection. Then, using genomic data from admixed baboons, I show that interspecific changes in DNA methylation are linked to genetic effects on DNA methylation (i.e., methylation quantitative trait loci, meQTL) and differences in allele frequency between baboon species. I also show that changes in DNA methylation are associated with changes in gene expression. Finally, I identify genomic evidence for selection against admixture in baboons, especially near genes that are differentially expressed between species. Together, my work highlights the close relationship between genetic and gene regulatory divergence in baboons. It also emphasizes the importance of natural selection in shaping genetic and regulatory variation throughout primate evolution, including in a living model for admixture in our own lineage.
Item Open Access Genome Engineering Tools to Dissect Gene Regulation (2019) Kocak, Daniel Dewran
Over the past several years, genome and epigenome engineering have been propelled forward by CRISPR-Cas technologies. These prokaryotic defense systems work well in mammalian cells in a manner that is remarkably robust: they are non-toxic, fold into a catalytically active state, localize to targeted cellular compartments, and act on the eukaryotic genome, which is heavily compacted in chromatin. While all of this is true, CRISPR-Cas nucleases did not evolve to function as highly specific genome engineering tools. Thus, the major goals of the work presented herein are to i) refine the specificity of CRISPR-Cas enzymes, ii) develop methods that facilitate genome engineering in human cells, and iii) apply these technologies toward outstanding problems in human gene regulation. With regard to the first goal, we set out to develop a method that could be easily applied to increase the specificity of diverse CRISPR systems. Adopting RNA engineering to achieve this goal, we modulate the kinetics of DNA strand invasion to increase the specificity of Cas enzymes. Since the guide RNA is a feature that is common across all CRISPR systems, we expect that this new method to tune the activity and specificity of Cas enzymes will be broadly useful. To address the second goal, we set out to develop an experimental pipeline for the high throughput, precise modification of mammalian genomes. Specifically, we modify the C-termini of genes to include an epitope tag for the genome-wide profiling of transcription factor binding sites. We apply this method to over 30 genes, encoding a variety of transcription factors, chromatin modifying enzymes, and gene regulatory proteins. Out of this large number of genes, we focus particularly on members of the AP-1 transcription factor family and nuclear receptor co-activator and co-repressor families.
Using this ChIP-seq data, which profiles genome-wide binding, and integrating a variety of other genomic information, including chromatin modifications, chromatin accessibility, other TF binding, and inherent regulatory activity, we investigate the dimerization preferences of AP-1 subunits, their genomic binding patterns, and the regulatory potential of these subunits. Toward addressing the third goal, we decided to focus on the glucocorticoid receptor (GR). The dual activating and repressive function of the GR is incompletely understood, and this duality is a property of many other stimulus-responsive transcriptional responses (e.g. NFKB signaling). Thus, how one transcription factor is biochemically endowed with the ability to both activate and repress gene expression is an outstanding problem in gene regulation. It is hypothesized that the GR recruits a variety of distinct protein complexes in order to mediate its diverse functions. We used CRISPR-based loss-of-function screening in order to discover new GR cofactors. Using this method, we find a number of cofactors, both canonical and novel, that regulate this response in A549 cells. Ongoing work investigates how general these cofactors are across the transcriptome and whether they provide an avenue to decouple GR’s dual function, which has been a major goal in drug development. Through these studies we have found a way to make CRISPR systems more specific, developed and applied a CRISPR-based method to define AP-1 binding and function, and used unbiased CRISPR-based screens to discover novel regulators of the glucocorticoid drug response.
Chapter 1 broadly introduces this work, its motivations, and the aims of the research presented herein.
Chapter 2 provides an introduction to both genome engineering and gene regulation. Specifically, it describes the development and application of CRISPR-Cas tools and details outstanding problems in gene regulation through the lens of nuclear receptors.
Chapter 3 describes the purification of Cas9 protein and its characterization biochemically. Specifically, we use AFM to determine the DNA binding properties of Cas9 in vitro.
Chapter 4 introduces a new method to modulate the specificity of CRISPR systems in human cells. Therein we show that RNA secondary structure can be applied to diverse CRISPR systems to tune their activity.
Chapter 5 details a method for the high throughput tagging of transcription factors. It specifically investigates members of the AP-1 transcription factor complex.
Chapter 6 is an investigation of the glucocorticoid receptor and its cofactors. We apply a variety of genome engineering and genomic methods to characterize known co-factors and discover new ones.
Chapter 7 is an outlook on the fields of genome-engineering and gene regulation. It describes key questions that are still unanswered and possible lines of attack to address them.
Item Open Access Genomic Basis for a Developmental Life History Switch in the Sea Urchin Heliocidaris erythrogramma (2021) Davidson, Phillip Luke
Lecithotrophic (non-feeding) larval development has independently evolved numerous times in marine invertebrates from an ancestral, planktotrophic (feeding) larval state. The evolution of this developmental mode in a species is accompanied by dramatic changes in ecology and development, including lower fecundity, higher maternal investment per offspring, changes in egg composition, alteration of embryonic fate specification, morphologically simple larvae, and reduced time to metamorphosis. Thus, the evolutionary switch between lecithotrophy and planktotrophy serves as an exemplary system for investigating the effect of changing ecological pressures on the evolution of novel developmental phenotypes. The sea urchin genus Heliocidaris represents one of the best studied examples of this switch, in which H. erythrogramma evolved lecithotrophy around five million years ago. Over the past several decades, previous work has documented phenotypes distinguishing development of this species from the ancestral, planktotrophic condition. These phenotypes range from increased sperm size and hypertrophy of lipid deposition in the egg, to changes in embryonic axis determination, delayed blastomere specification, and alterations to spatial and temporal expression of key developmental network genes. Although much is known about what phenotypes are associated with the evolution of lecithotrophy in this species, much less is known of the regulatory mechanisms for how these changes arose in the first place. This gap in knowledge is the subject of my thesis: to gain a better understanding of the genomic and molecular basis for the evolution of lecithotrophy in H. erythrogramma. To accomplish this, I carried out a set of physiological and genomic comparisons between H. erythrogramma, a closely related planktotrophic congener H. tuberculata, and a distantly related planktotroph Lytechinus variegatus in order to identify specific molecules and genomic loci underlying lecithotrophic development.
In Chapter 1, I analyzed lipid and protein content of eggs and larvae from these three species using mass spectrometry to characterize metabolic differences in egg provisioning and embryogenesis in H. erythrogramma. In Chapter 2, I present a chromosome-level assembly of L. variegatus, highlighting a genome assembly and annotation method that will be applied to the two Heliocidaris species and the utility of a high-quality genome assembly for functional genomic analysis. In Chapter 3, I compare the genome assemblies of H. erythrogramma and H. tuberculata to show that a conserved developmental network controlling sea urchin development has been dramatically modified in H. erythrogramma through genic and non-coding modifications. In Chapter 4, I compare the chromatin landscapes of these three species through development using ATAC-seq to assess how cis-regulatory mechanisms have evolved during the acquisition of lecithotrophic development. From this work, I found that the enormous lipid provisioning of H. erythrogramma eggs is composed primarily of diacylglycerol ether lipids and that these lipids are not metabolized for pre-metamorphic development, but instead provisioned to promote post-metamorphic survivorship of juvenile individuals. Instead, upregulated glycolysis proteins suggest this pathway may be driving rapid pre-metamorphic development. Comparative genomic analyses demonstrate positive selection and changes to chromatin accessibility have modified the regulatory genome of H. erythrogramma, especially near developmental network genes, and that these changes are associated with temporal and spatial differences in embryonic gene expression.
Furthermore, the Pmar1 transcription factor family has likely lost its ancestral function in specifying the primary mesenchyme lineage in this species, a cell type responsible for larval skeletal development and patterning of the embryo. Finally, development has one of the largest effects on changes in chromatin accessibility in each species, but, particularly near developmental genes, embryonic chromatin dynamics are highly associated with the life history strategy of each species. Future work identifying examples of convergent or novel pathways driving evolution of lecithotrophy in other echinoids will provide valuable insight into general principles governing how derived developmental phenotypes can evolve at short evolutionary timescales.
Item Open Access Glucocorticoid-Mediated Transcriptional Regulation in the Human Genome (2021) Seo, Jungkyun
Glucocorticoids (GCs) are a class of steroid hormones released from the adrenal gland to mediate multiple physiological processes including immune responses, cognitive functions and development. Glucocorticoids exert their gene regulatory effects through a ligand-activated transcription factor, the glucocorticoid receptor (GR). Upon GC activation, GRs are particularly recruited to promoter-distal regulatory elements enriched with other transcription factors (TFs) and co-regulators including activator protein 1 (AP-1). AP-1 is a heterodimeric TF potentially composed of subunits belonging to the JUN, FOS and activating transcription factor (ATF) families. While the genomic function of AP-1 in response to GCs is well studied, the effect of specific configurations of AP-1 subunits on GR-mediated transcription remains unknown. In chapter 1, I introduce various regulatory components for transcriptional regulation. In chapter 2, I demonstrate that AP-1 subunits may not form preferential dimers between specific subunits, but rather bind each other promiscuously. I further show that the convergence of AP-1 subunits to enhancers is a key determinant for GR-mediated transcription and, by extension, cell-type specific environmental responses. GR binds DNA both directly and indirectly. While genome-wide binding activity of GR can be effectively characterized by ChIP-seq, the binding mode (i.e. direct vs. indirect) at a specific site can’t be directly inferred. In chapter 3, I describe a machine learning approach to predict direct and indirect GR-DNA interactions using Protein Binding Microarray data. I demonstrate that motif-directed GR binding remains persistent after stimulus whereas indirect GR binding is likely transient.
I further illustrate that robust transcriptional activation requires persistent GR binding and that direct GR binding has higher regulatory potential than indirect GR binding. GR activation represses certain genes. Along with the regulatory actions of GR cofactors, histone deacetylation is thought to regulate gene expression, especially gene repression. Therefore, I hypothesize that limiting histone deacetylase (HDAC) activity promotes robust gene activation. To test this hypothesis, in chapter 4, I delve into the GC-mediated transcriptional change after inhibiting the activity of histone deacetylases by HDACi to determine the HDAC effect on GC-mediated transcriptional outputs. I demonstrate that the inhibition of HDAC activity reduces the magnitude of GC-mediated repression as well as activation in transcription. I also show that HDACi x GR-mediated intronic changes quantified from RNA-seq are minimally confounded by mRNA half-life linked to exonic changes, thereby accurately capturing transcriptional activity. Throughout the dissertation, I investigate GR-mediated transcriptional regulation by integrative analyses of numerous functional genomic datasets and by predictive modeling of differential GR-DNA binding modes. In particular, the dissertation demonstrates that TF cooperativity, especially from subunits of the AP-1 TF family, is a key determinant that drives the control of transcriptional output in a cell-type-specific manner. The dissertation further shows the potential for the generality of this regulatory mechanism beyond glucocorticoid stimulus and human cell types, suggesting that TF convergence to a site, especially from the same TF family, may determine their functional specificity in a given cellular context.
Item Open Access Mechanisms of Eukaryotic Copper Homeostasis (2010) Wood, Lawrence Kent
Copper (Cu) is a co-factor that is essential for oxidative phosphorylation, protection from oxidative stress, angiogenesis, signaling, iron acquisition, peptide hormone maturation, and a number of other cellular processes. However, excess copper can lead to membrane damage, protein oxidation, and DNA cleavage. To balance the need for copper with the necessity to prevent accumulation to toxic levels, cells have evolved sophisticated mechanisms to regulate copper acquisition, distribution, and storage. The basic components of these regulatory systems are remarkably conserved in most eukaryotes, and this has allowed the use of a variety of model organisms to further our understanding of how Cu is taken into the cell and utilized.
While the components involved in Cu uptake, distribution, and storage are similar in many eukaryotes, evolution has led to differences in how these processes are regulated. For instance, fungi regulate the components involved in Cu uptake and detoxification primarily at the level of transcription while mammals employ a host of post-translational homeostatic mechanisms. In Saccharomyces cerevisiae, transcriptional responses to copper deficiency are mediated by the copper-responsive transcription factor Mac1. Although Mac1 activates the transcription of genes involved in high affinity copper uptake during periods of deficiency, little is known about the mechanisms by which Mac1 senses or responds to reduced copper availability. In the first part of this work, we show that the copper-dependent enzyme Sod1 (Cu,Zn superoxide dismutase) and its intracellular copper chaperone Ccs1 function in the activation of Mac1 in response to an external copper deficiency. Genetic ablation of either CCS1 or SOD1 results in a severe defect in the ability of yeast cells to activate the transcription of Mac1 target genes. The catalytic activity of Sod1 is essential for Mac1 activation and promotes a regulated increase in binding of Mac1 to copper response elements in the promoter regions of genomic Mac1 target genes. Although there is precedent for additional roles of Sod1 beyond protection of the cell from oxygen radicals, the involvement of this protein in copper-responsive transcriptional regulation has not previously been observed.
Higher eukaryotes including mice and humans regulate Cu uptake predominately by means of post-translational control of the localization and stability of the Cu transport proteins. One of these proteins, Ctr1, is the primary means of Cu uptake into the cell, and members of the highly conserved Ctr family of Cu ion channels have been shown to mediate high affinity Cu(I) uptake into cells. In yeast and cultured human cells, Ctr1 functions as a homo-trimer with each monomer harboring an amino-terminal extracellular domain, three membrane spanning domains, a cytoplasmic loop, and a cytoplasmic tail. In addition to the highly conserved Ctr1 Cu ion importer, the baker's yeast S. cerevisiae expresses a related protein called Ctr2. Experimental evidence demonstrates that unlike yeast and mammalian Ctr1, yeast Ctr2 is localized to the vacuolar membrane where it mobilizes Cu stores to the cytoplasm under conditions of Cu limitation.
In mice and humans a gene encoding a protein with significant similarity to the Ctr family has been identified, denoted Ctr2. Publications from others suggest that mammalian Ctr2 may either be a low affinity Cu importer at the plasma membrane or, similar to yeast Ctr2, may mobilize Cu from intracellular organelles such as the lysosome to the cytosol. In agreement with a previous report we found that a fraction of mouse Ctr2 is localized to the plasma membrane and that its membrane topology is the same as Ctr1. Interestingly, over-expression of Ctr2 by stable transfection results in decreased intracellular bioavailable Cu. To begin to understand the physiological role of Ctr2, mice bearing a systemic deletion of the Ctr2 gene were generated. The Ctr2-/- mice are viable but hyper-accumulate Cu in all tissues analyzed. Moreover, protein levels of the Ctr1 Cu importer are dramatically altered in tissues from the Ctr2 knock out mice, and over-expression of Ctr2 in cultured mammalian cells enhances processing of the Ctr1 protein into a less active form. Taken together these results suggest that mammalian Ctr2 functions in the cell as a negative regulator of Cu import via Ctr1.
Item Open Access Mechanistic Modeling and Experiments on Cell Fate Specification in the Sea Urchin Embryo (2012) Cheng, Xianrui
During embryogenesis, a single zygote gives rise to a multicellular embryo with distinct spatial territories marked by differential gene expression. How is this patterning process organized? How robust is this function to perturbations? Experiments that examine normal and regulative development will provide direct evidence for reasoning out the answers to these fundamental questions. Recent advances in technology have led to experimental determinations of increasingly complex gene regulatory networks (GRNs) underlying embryonic development. These GRNs offer a window into systems level properties of the developmental process, but at the same time present the challenge of characterizing their behavior. A suitable modeling framework for developmental systems is needed to help gain insights into embryonic development. Such models should contain enough detail to capture features of interest to developmental biologists, while staying simple enough to be computationally tractable and amenable to conceptual analysis. Combining experiments with the complementary modeling framework, we can grasp a systems level understanding of the regulatory program not readily visible by focusing on individual genes or pathways.
This dissertation addresses both modeling and experimental challenges. First, we present the autonomous Boolean network modeling framework and show that it is a suitable approach for developmental regulatory systems. We show that important timing information associated with the regulatory interactions can be faithfully represented in autonomous Boolean models in which binary variables representing expression levels are updated in continuous time, and that such models can provide direct insight into features that are difficult to extract from ordinary differential equation (ODE) models. As an application, we model the experimentally well-studied network controlling fly body segmentation. The Boolean model successfully generates the patterns formed in normal and genetically perturbed fly embryos, permits the derivation of constraints on the time delay parameters, clarifies the logic associated with different ODE parameter sets, and provides a platform for studying connectivity and robustness in parameter space. By elucidating the role of regulatory time delays in pattern formation, the results suggest new types of experimental measurements in early embryonic development. We then use this framework to model the much more complicated sea urchin endomesoderm specification system and describe our recent progress on this long term effort.
Second, we present experimental results on developmental plasticity of the sea urchin embryo. The sea urchin embryo has the remarkable ability to replace surgically removed tissues by reprogramming the presumptive fate of remaining tissues, a process known as transfating, which in turn is a form of regulative development. We show that regulative development requires cellular competence, and that competence is lost early on but can be regained after further differentiation. We demonstrate that regulative replacement of missing tissues can induce distal germ layers to participate in reprogramming, leading to a complete re-patterning in the remainder of the embryo. To understand the molecular mechanism of cell fate reprogramming, we examined the transfating of non-skeletogenic mesoderm (NSM) induced by micromere depletion. We found that the skeletogenic program was greatly compressed in time in this case, and that, as in another case of NSM transfating, the transfating cells went through a hybrid regulatory state in which NSM and skeletogenic marker genes were co-expressed.
Item Open Access MicroRNA Target Prediction via Duplex Formation Features and Direct Binding Evidence (2012) Lekprasert, Parawee
MicroRNAs (miRNAs) are small RNAs that have important roles in post-transcriptional gene regulation in a wide range of species. This regulation occurs when miRNAs bind directly to a target messenger RNA (mRNA), causing it to be destabilized and degraded, or translationally repressed. Identifying miRNA targets has been a major focus of study; however, the general lack of high-throughput experiments to validate direct miRNA targeting has been a limiting factor. To overcome these limitations, computational methods have become crucial for understanding and predicting miRNA-gene target interactions.
While a variety of computational tools exist for predicting miRNA targets, many of them rely on a similar feature set for their predictions. The most commonly used features are complementarity to the 5' seed of miRNAs and evolutionary conservation. Unfortunately, not all miRNA target sites are conserved or adhere to canonical seed complementarity. Seeking to address these limitations, several studies have included energy features of mRNA:miRNA duplex formation as alternative features. However, independent evaluations have reported conflicting results on the reliability of energy-based predictions. Here, we reassess the usefulness of energy features for mammalian target prediction, aiming to relax or eliminate the need for perfect seed matches and conservation requirements.
We detect significant differences in energy features at experimentally supported human miRNA target sites and at genome-wide sites of interaction with Argonaute (AGO) protein family members, which are essential parts of the miRNA machinery complex. This trend is confirmed on data sets that assay the effect of miRNAs on mRNA and protein expression changes, where a statistically significant change in expression is noted when compared to the control. Furthermore, our method also allows for prediction of strictly imperfect sites, as well as non-conserved targets.
Recently, new methods for identifying direct miRNA binding have been developed, which provide us with additional sources of information for miRNA target prediction. While some computational target prediction tools have begun to incorporate this information, they still rely on the presence of a seed match in the AGO-bound windows without accounting for the possibility of variations.
We investigate the usefulness of the site level direct binding evidence in miRNA target identification and propose a model that incorporates multiple different features along with the AGO-interaction data. Our method outperforms both an ad hoc strategy of seed match searches as well as an existing target prediction tool, while still allowing for predictions of sites other than a long perfect seed match. Additionally, we show supporting evidence for a class of non-canonical sites as bound targets. Our model can be extended to predict additional types of imperfect sites, and can also be readily modified to include additional features that may produce additional improvements.
Item Open Access Modeling Nuclease Digestion Data to Predict the Dynamics of Genome-wide Transcription Factor Occupancy (2016) Luo, Kaixuan
Identifying and deciphering the complex regulatory information embedded in the genome is critical to our understanding of biology and the etiology of complex diseases. The regulation of gene expression is governed largely by the occupancy of transcription factors (TFs) at various cognate binding sites. Characterizing TF binding is particularly challenging since TF occupancy is not just complex but also dynamic. Current genome-wide surveys of TF binding sites typically use chromatin immunoprecipitation (ChIP), which is limited to measuring one TF at a time and is thus less scalable for profiling the dynamics of TF occupancy across cell types or conditions. This dissertation develops novel computational frameworks to model sequencing data from DNase and/or MNase nuclease digestion assays, which allow multiple TFs to be surveyed in a single experiment, in both human and yeast. We predicted occupancy landscapes and constructed a cell-type specificity map for many TFs across human cell types, revealed novel relationships between TF occupancy and TF expression, and monitored the occupancy dynamics of various TFs in response to androgen and estrogen hormone stimulations. The TF/cell type occupancy matrix generated from our model expands the total output of the ENCODE ChIP-seq efforts by a factor of nearly 200. These computational frameworks serve as an innovative and cost-effective strategy that enables efficient, high-throughput profiling of TF occupancy landscapes across different cell types or dynamic conditions.
Item Open Access Organ-Level Communication During Heart Regeneration In Zebrafish. (2022) Sun, Fei
Tissue regeneration has been primarily investigated as local remodeling events in response to tissue damage or loss. Recent studies, however, indicate that uninjured structures can respond to distant tissue trauma and, in some cases, regulate tissue regeneration. One key unanswered question in the field is how animals simultaneously exert customized control of local and remote injury responses during regeneration. Taking advantage of the genetic cardiomyocyte ablation system developed in adult zebrafish, we explored the responses of the uninjured brain and kidney to heart regeneration. This dissertation identified a transcription factor gene, cebpd, through transcriptomic profiling of the uninjured brain and kidney during zebrafish heart regeneration. The expression of cebpd is induced both locally in the epicardial tissue of regenerating hearts and distantly in the brain ependymal layer and renal tubules. Knocking out cebpd using the CRISPR system, we found that cebpd is required for tissue repair adjacent to an injury event, as well as for the physiological sequelae of fluid regulation encompassing remote tissues. By profiling and molecular genetics in zebrafish, we identified a novel class of remote tissue regenerative enhancer elements (r-TREEs) responsible for remote gene activation during tissue regeneration. Interestingly, removing the cebpd-associated enhancer element CEN abolished gene activation only in the remote uninjured brain and kidney, but not in local regenerating hearts. We further demonstrated that corticosteroid receptor activities are sufficient and required for CEN-dependent regulation of gene expression in remote tissues during regeneration. Loss of CEN perturbed fluid regulation in zebrafish during heart regeneration.
My findings suggest a novel concept in tissue regeneration, in which r-TREEs segregate local and remote responses and stratify regeneration and physiological functions of key regulatory genes to achieve whole-organism coordination during regeneration.
Item Embargo Orthogonal screens to decode human T cell state and function (2024) McCutcheon, Sean R
In the last decade, the paradigm for cancer therapy has incrementally transitioned away from non-specific cytotoxic therapies (radiation, chemotherapy) and targeted therapies (small molecules, biologics) and towards immune cell-based therapies. Immune cell-based therapies such as adoptive T cell therapy (ACT) harness the intrinsic ‘sense and respond’ functions of immune cells to selectively target and eliminate cancer cells. Nevertheless, more than half of cancer patients either do not respond to existing ACTs or relapse after treatment. Several studies have defined specific transcriptional and epigenetic signatures of the infused T cell product associated with clinical response, indicating that T cell state and fitness are linked to ACT efficacy. Thus, epigenetically reprogramming T cells with enhanced potency and durability has the potential to improve ACT. However, this potential has yet to be fully realized due to technical challenges of adapting CRISPR-based epigenome editing technologies for applications in primary human T cells. To overcome these challenges, we developed and rigorously characterized compact and robust CRISPR repressors and activators for endogenous gene regulation. Next, we leveraged these technologies to systematically interrogate the effects of >100 transcriptional and epigenetic regulators on human CD8+ T cell state and function through complementary CRISPR interference (CRISPRi) and activation (CRISPRa) screens. These CRISPRi/a screens converged on basic leucine zipper ATF-like transcription factor 3 (BATF3). Subsequent assays revealed that BATF3 overexpression promotes specific features of memory T cells (such as increased expression of IL7R and glycolysis), counters T cell exhaustion, and enhances CAR T cell potency in both in vitro and in vivo tumor models.
In addition, BATF3 programs a transcriptional profile strongly associated with positive clinical response to CD19 CAR T cell therapy. Given that BATF3 is a compact transcription factor (TF) without any transactivation or epigenetic domains, we speculated that BATF3 achieves its widespread effects by interacting with other TFs. To identify these factors, we conducted parallel CRISPR knockout screens targeting all TFs with or without BATF3 overexpression. Using IL7R expression as a proxy for BATF3 activity, we identified both BATF3-independent and dependent transcriptional regulators of IL7R expression. For example, JUNB and IRF4 were uniquely enriched in the low IL7R population in the screen with BATF3 overexpression, suggesting BATF3 heterodimerizes with JUNB and interacts with IRF4 to regulate gene expression. Finally, these CRISPR knockout screens illuminated other candidate therapeutic targets for future exploration and characterization. Overall, we have developed a widely applicable synthetic biology toolkit of orthogonal epigenome editors, which we used to systematically identify regulators of human CD8+ T cell state and function. This catalogue of regulators could serve as the basis for engineering next generation T cell therapies for cancer.
Item Open Access Quantifying Eukaryotic Gene Regulation in Hormone Response and Disease. (2016) Vockley, Christopher Vockley
Quantifying the function of mammalian enhancers at the genome or population scale has been a longstanding challenge in the field of gene regulation. Studies of individual enhancers have provided anecdotal evidence on which many foundational assumptions in the field are based. Genome-scale studies have revealed that the number of sites bound by a given transcription factor far exceeds the number of genes that the factor regulates. In this dissertation we describe a new method, chromatin immune-enriched reporter assays (ChIP-reporters), and use that approach to comprehensively test the enhancer activity of genomic loci bound by the glucocorticoid receptor (GR). Integrative genomics analyses of our ChIP-reporter data revealed an unexpected mechanism of glucocorticoid (GC)-induced gene regulation. In that mechanism, only a minority of GR-bound sites act as GC-inducible enhancers. Many non-GC-inducible GR binding sites interact with GC-induced sites via chromatin looping. These interactions can increase the activity of GC-induced enhancers. Finally, we describe a method that enables the detection and characterization of the functional effects of non-coding genetic variation on enhancer activity at the population scale. Taken together, these studies yield both mechanistic and genetic evidence that informs our understanding of the effects of multiple enhancer variants on gene expression.
Item Open Access Regulatory Elements and Gene Expression in Primates and Diverse Human Cell-types (2013) Sheffield, Nathan
After finishing a human genome reference sequence in 2002, the genomics community has turned to the task of interpreting it. A primary focus is to identify and characterize not only protein-coding genes, but all functional elements in the genome. The effort has identified millions of regulatory elements across species and in hundreds of human cell-types. Nearly all identified regulatory elements are found in non-coding DNA, suggesting a function for previously unannotated sequence. The ability to identify regulatory DNA genome-wide provides a new opportunity to understand gene regulation and to ask fundamental questions in diverse areas of biology.
One such area is the aim to understand the molecular basis for phenotypic differences between humans and other primates. These phenotypic differences are partially driven by mutations in non-coding regulatory DNA that alter gene expression. This hypothesis has been supported by differential gene expression analyses in general, but we have not yet identified specific regulatory variants responsible for differences in transcription and phenotype. I have worked to identify regulatory differences in the same cell-type isolated from human, chimpanzee, and macaque. Most regulatory elements were conserved among all three species, as expected based on their central role in regulating transcription. However, several hundred regulatory elements were gained or lost on the lineages leading to modern human and chimpanzee. Species-specific regulatory elements are enriched near differentially expressed genes, are positively correlated with increased transcription, show evidence of branch-specific positive selection, and overlap with active chromatin marks. Species-specific sequence differences in transcription factor motifs found within this regulatory DNA are linked with species-specific changes in chromatin accessibility. Together, these findings indicate that species-specific regulatory elements contribute to transcriptional and phenotypic differences among primate species.
Another fundamental function of regulatory elements is to define different cell-types in multicellular organisms. Regulatory elements recruit transcription factors that modulate gene expression distinctly across cell-types. In a study of 112 human cell-types, I classified regulatory elements into clusters based on regulatory signal tissue specificity. I then used these to uncover distinct associations between regulatory elements and promoters, CpG islands, conserved elements, and transcription factor motif enrichment. Motif analysis identified known and novel transcription factor binding motifs in cell-type-specific and ubiquitous regulatory elements. I also developed a classifier that accurately predicts cell-type lineage based on only 43 regulatory elements and evaluated the tissue of origin for cancer cell-types. By correlating regulatory signal and gene expression, I predicted target genes for more than 500k regulatory elements. Finally, I introduced a web resource to enable researchers to explore these regulatory patterns and better understand how expression is modulated within and across human cell-types.
Regulation of gene expression is fundamental to life. This dissertation uses identified regulatory DNA to better understand regulatory systems. In the context of either evolutionary or developmental biology, understanding how differences in regulatory DNA contribute to phenotype will be central to completely understanding human biology.
Item Open Access Roles of CTCF and YY1 in T Cell Receptor Gene Rearrangement And T Cell Development (2016) Chen, Liang
Diversity of T cell receptors (TCR) and immunoglobulins (Ig) is generated by V(D)J recombination of antigen receptor (AgR) loci. The Tcra-Tcrd locus is of particular interest because it displays a nested organization of Tcrd and Tcra gene segments and V(D)J recombination follows an intricate developmental program to assemble both TCRδ and TCRα repertoires. However, the mechanisms that dictate the developmental regulation of V(D)J recombination of the Tcra-Tcrd locus remain unclear.
We have previously shown that CCCTC-binding factor (CTCF) regulates Tcra gene transcription and rearrangement through organizing chromatin looping between CTCF-binding elements (CBEs). This study is one of many showing that CTCF functions as a chromatin organizer and transcriptional regulator genome-wide. However, detailed understanding of the impact of specific CBEs is needed to fully comprehend the biological function of CTCF and how CTCF influences the generation of the TCR repertoire during thymocyte development. Thus, we generated several mouse models with genetically modified CBEs to gain insight into the CTCF-dependent regulation of the Tcra-Tcrd locus. We revealed a CTCF-dependent chromatin interaction network at the Tcra-Tcrd locus in double-negative thymocytes. Disruption of a discrete chromatin loop encompassing Dδ, Jδ and Cδ gene segments allowed a single Vδ segment to frequently contact and rearrange to diversity and joining gene segments and dominate the adult TCRδ repertoire. Disruption of this loop also narrowed the TCRα repertoire, which, we believe, followed as a consequence of the restricted TCRδ repertoire. Hence, a single CTCF-mediated chromatin loop directly regulates TCRδ diversity and indirectly regulates TCRα diversity. In addition, we showed that insertion of an ectopic CBE can modify chromatin interactions and disrupt the rearrangement of particular Vδ gene segments. Finally, we investigated the role of YY1 in early T cell development by conditionally deleting YY1 in developing thymocytes. We found that early ablation of YY1 caused severe developmental defects in the DN compartment due to a dramatic increase in DN thymocyte apoptosis. Furthermore, late ablation of YY1 resulted in increased apoptosis of DP thymocytes and a restricted TCRα repertoire. Mechanistically, we showed that p53 was upregulated in both DN and DP YY1-deficient thymocytes.
Eliminating p53 in YY1-deficient thymocytes rescued the survival and developmental defects, indicating that these YY1-dependent defects were p53-mediated. We conclude that YY1 is required to maintain cell viability during thymocyte development by thwarting the accumulation of p53.
Overall, this thesis work has shown that CTCF-dependent looping provides a central framework for lineage- and developmental stage-specific regulation of Tcra-Tcrd gene expression and rearrangements. In addition, we identified YY1 as a novel regulator of thymocyte viability.
Item Open Access Studies into Location-specific cis-Regulatory Motifs (2010) Yokoyama, Ken Daigoro
Gene expression and regulation are major determinants of phenotypic traits displayed across species. Although the DNA sequence elements that control gene expression play a crucial role in determining species morphology, predicting cis-regulatory elements through sequence analysis alone remains a difficult task. A few regulatory elements, such as the TATA-box and Initiator sequence, have been known to exhibit overrepresentation at specific locations within the proximal promoter. However, the extent to which this occurs among cis-regulatory elements is not well understood. Here, we take a genome-wide approach towards detecting such functional sequence elements, using location-specific overrepresentation as a criterion for regulatory function. We provide evidence that a surprisingly large number of regulatory elements exhibit locational overrepresentation with respect to the transcription start site. We then utilize this characteristic to predict novel cis-regulatory elements overrepresented at particular locations within the proximal promoter.
Transcriptional regulation is most often controlled not by single protein factors acting in isolation, but instead by multiple transcription factors acting together within multi-protein complexes. As protein-protein interactions are largely determined through protein structure, we would expect to see patterns of spatial preference between motif-pairs binding interacting factors. However, in the absence of methods to predict such spatial preferences between motifs, comprehensive assessments of such inter-relationships have not been previously conducted. As our model provides a general tool for detecting positional specificities of a motif relative to a given reference point, we expanded our model to measure distance preferences between pairs of motifs on a genome-wide scale. We show that there often exist patterns of spatial dependencies between pairs of sequence elements that bind interacting protein factors. We find that regulatory motifs binding interacting proteins often have multiple inter-motif distances at which they preferentially occur, and we show that the intervals between preferred distances are highly consistent across motif-pairs. This distance preference 'phasing' was empirically found to occur at consistent intervals of ~8-10 bp, corresponding to approximately the number of nucleotides within a single turn of the DNA double-helix. This finding suggests a tendency for protein factor-pairs to interact in a specific orientation with respect to the turn of the DNA molecule, and offers a convenient method by which to determine motif-pairs binding interacting transcription factors de novo.
While little is known about the mechanisms by which individual cis-regulatory elements ultimately control gene expression, even less is known about how such elements evolve over time. A single transcription factor can potentially target hundreds of genes across the genome, and thus modifications in the binding affinities of such proteins must induce conversions at a multitude of functional sites in order to preserve the set of target genes that the trans-factor regulates. It is therefore commonly assumed that such changes occur rarely and at a slow rate over the course of evolution. Despite this widespread assumption, we find that a surprisingly large number of cis-regulatory elements have been subject to significant changes in consensus sequence in a lineage-specific manner. Here, we demonstrate that the genomic landscape is highly adaptable, rapidly adjusting to global changes in preferred regulatory consensus sequences. Focusing upon regulatory elements exhibiting location-specific overrepresentation, we find that a substantial fraction of regulatory elements have been subject to evolutionary modifications, even between closely related eutherians. These findings have broad implications regarding evolving phenotypes observed across species.
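The inter-motif distance analysis described in this abstract can be illustrated with a small sketch. This is not the dissertation's model, which the abstract does not specify; it is a minimal Python example that assumes motif occurrences are already available as lists of integer positions on one sequence. The function names `distance_histogram` and `phase_score`, the `max_dist` cutoff, and the 10 bp period used in the example are all hypothetical choices made for illustration.

```python
from collections import Counter


def distance_histogram(positions_a, positions_b, max_dist=50):
    """Count absolute distances between every pair of sites from two motifs.

    positions_a and positions_b are assumed to be integer coordinates on the
    same sequence; pairs farther apart than max_dist are ignored.
    """
    counts = Counter()
    for a in positions_a:
        for b in positions_b:
            d = abs(a - b)
            if 0 < d <= max_dist:
                counts[d] += 1
    return counts


def phase_score(counts, period):
    """Fraction of counted pairs whose distance is a multiple of `period`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    in_phase = sum(n for d, n in counts.items() if d % period == 0)
    return in_phase / total


# Toy positions (made up): two pairs are separated by 10 and 20 bp, one by 15 bp.
toy_counts = distance_histogram([100, 200], [110, 120, 215])
toy_score = phase_score(toy_counts, period=10)
```

A score well above what shuffled positions produce would hint at a periodic distance preference; a real analysis would also need a background model and a significance test, which this sketch leaves out.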
Item Open Access Targeted Gene Repression Technologies for Regenerative Medicine, Genomics, and Gene Therapy (2016) Thakore, Pratiksha Ishwarsinh
Gene regulation is a complex and tightly controlled process that defines cell function in physiological and abnormal states. Programmable gene repression technologies enable loss-of-function studies for dissecting gene regulation mechanisms and represent an exciting avenue for gene therapy. Established and recently developed methods now exist to modulate gene sequence, epigenetic marks, transcriptional activity, and post-transcriptional processes, providing unprecedented genetic control over cell phenotype. Our objective was to apply and develop targeted repression technologies for regenerative medicine, genomics, and gene therapy applications. We used RNA interference to control cell cycle regulation in myogenic differentiation and enhance the proliferative capacity of tissue engineered cartilage constructs. These studies demonstrate how modulation of a single gene can be used to guide cell differentiation for regenerative medicine strategies. RNA-guided gene regulation with the CRISPR/Cas9 system has rapidly expanded the targeted repression repertoire from silencing single protein-coding genes to modulation of genes, promoters, and other distal regulatory elements. In order to facilitate its adaptation for basic research and translational applications, we demonstrated the high degree of specificity for gene targeting, gene silencing, and chromatin modification possible with Cas9 repressors. The specificity and effectiveness of RNA-guided transcriptional repressors for silencing endogenous genes are promising characteristics for mechanistic studies of gene regulation and cell phenotype. Furthermore, our results support the use of Cas9-based repressors as a platform for novel gene therapy strategies. We developed an in vivo AAV-based gene repression system for silencing endogenous genes in a mouse model.
Together, these studies demonstrate the utility of gene repression tools for guiding cell phenotype and the potential of the RNA-guided CRISPR/Cas9 platform for applications such as causal studies of gene regulatory mechanisms and gene therapy.
Item Open Access Transcription Factor-Centric Approaches to Identify Regulatory Driver Mutations in Cancer (2020) Zhao, Jingkang
Most previous efforts to identify cancer driver mutations have focused on protein-coding genes. In recent years, the decreasing costs of DNA sequencing have enabled whole-genome sequencing (WGS) studies of thousands of tumor samples, making it possible to systematically survey non-coding regions for potential driver events. From these studies, millions of somatic mutations in cancer have been identified, the majority of which are non-coding. However, driver identification remains a far greater challenge in non-coding regions than in coding genes, primarily due to the incomplete annotation of the non-coding genome and the unknown functional impact of non-coding mutations.
In this work, we present new approaches to identify putative regulatory driver mutations in cancer, based on new methodology for predicting the quantitative effects of single nucleotide variants on transcription factor (TF) binding. Unlike most of the previous work on driver identification, our method does not require the driver mutations to be highly recurrent; instead, we assess the mutations’ significance by testing if they cause larger TF binding changes than expected in the case of completely random mutations. Since gene regulation relies on the cooperation of multiple regulatory elements, we have devised a way to combine the effects of all regulatory mutations of a gene in order to identify genes whose regulation is likely to be significantly perturbed by the mutations observed in their regulatory elements, through changes in TF binding.
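The significance test described above can be illustrated with a minimal Python sketch. It is not the author's implementation: the position weight matrix `PWM`, the helper names `best_score`, `binding_change`, and `empirical_p_value`, and the choice to combine a gene's mutations by summing their effects are all assumptions made for illustration. The abstract does not specify the binding model or the combination rule.

```python
import random

# Hypothetical log-odds position weight matrix for a 4-bp motif.
# Values are illustrative only and do not describe any real TF.
PWM = [
    {"A": 1.2, "C": -0.8, "G": -0.5, "T": -1.0},
    {"A": -1.0, "C": 1.5, "G": -0.7, "T": -0.9},
    {"A": -0.6, "C": -1.1, "G": 1.4, "T": -0.8},
    {"A": -0.9, "C": -0.7, "G": -1.0, "T": 1.3},
]


def best_score(seq):
    """Highest PWM score over all windows: a crude proxy for TF binding."""
    k = len(PWM)
    return max(
        sum(PWM[i][seq[j + i]] for i in range(k))
        for j in range(len(seq) - k + 1)
    )


def binding_change(seq, pos, alt):
    """Absolute change in the proxy binding score from one substitution."""
    mutated = seq[:pos] + alt + seq[pos + 1:]
    return abs(best_score(mutated) - best_score(seq))


def empirical_p_value(seq, observed, n_random=1000, seed=0):
    """Compare the summed effect of the observed (position, alt) mutations
    with the summed effect of equally many random substitutions.
    Summation is an assumed combination rule, not taken from the source."""
    rng = random.Random(seed)
    obs_total = sum(binding_change(seq, p, a) for p, a in observed)
    hits = 0
    for _ in range(n_random):
        total = 0.0
        for _ in observed:
            pos = rng.randrange(len(seq))
            alt = rng.choice([b for b in "ACGT" if b != seq[pos]])
            total += binding_change(seq, pos, alt)
        if total >= obs_total:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_random + 1)
```

For example, `empirical_p_value("TTTTACGTTTTT", [(5, "A")])` tests a substitution that disrupts the toy motif against random substitutions in the same sequence. A small value suggests the observed change is larger than expected by chance under this simplified model.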
We have applied our TF-centric approaches to analyze single nucleotide variants identified in a liver cancer data set from the International Cancer Genome Consortium (ICGC), and identified potentially dysregulated genes whose regulatory mutations could trigger significant TF binding changes. Notably, the genes we identified differ from those prioritized by recurrence-based approaches. Nevertheless, most of the potentially dysregulated genes we identified show large changes in gene expression and/or are cancer prognostic genes. Our results suggest that regulatory mutations should be investigated further, not just by their recurrence, but also by their functional effects such as TF binding changes, to uncover dysregulated genes that may drive tumorigenesis.