Browsing by Author "Crawford, Gregory E"
Item Open Access A complex intronic enhancer regulates expression of the CFTR gene by direct interaction with the promoter.(J Cell Mol Med, 2009-04) Ott, Christopher J; Suszko, Magdalena; Blackledge, Neil P; Wright, Jane E; Crawford, Gregory E; Harris, Ann
Genes can maintain spatiotemporal expression patterns by long-range interactions between cis-acting elements. The cystic fibrosis transmembrane conductance regulator gene (CFTR) is expressed primarily in epithelial cells. An element located within a DNase I-hypersensitive site (DHS) 10 kb into the first intron was previously shown to augment CFTR promoter activity in a tissue-specific manner. Here, we reveal the mechanism by which this element influences CFTR transcription. We employed a high-resolution method of mapping DHS using tiled microarrays to accurately locate the intron 1 DHS. Transfection of promoter-reporter constructs demonstrated that the element displays classical tissue-specific enhancer properties and can independently recruit factors necessary for transcription initiation. In vitro DNase I footprinting analysis identified a protected region that corresponds to a conserved, predicted binding site for hepatocyte nuclear factor 1 (HNF1). We demonstrate by electromobility shift assays (EMSA) and chromatin immunoprecipitation (ChIP) that HNF1 binds to this element both in vitro and in vivo. Moreover, using chromosome conformation capture (3C) analysis, we show that this element interacts with the CFTR promoter in CFTR-expressing cells.
These data provide the first insight into the three-dimensional (3D) structure of the CFTR locus and confirm the contribution of intronic cis-acting elements to the regulation of CFTR gene expression.
Item Open Access CHD7 targets active gene enhancer elements to modulate ES cell-specific gene expression.(PLoS genetics, 2010-07-01) Schnetz, Michael P; Handoko, Lusy; Akhtar-Zaidi, Batool; Bartels, Cynthia F; Pereira, C Filipe; Fisher, Amanda G; Adams, David J; Flicek, Paul; Crawford, Gregory E; Laframboise, Thomas; Tesar, Paul; Wei, Chia-Lin; Scacheri, Peter C
CHD7 is one of nine members of the chromodomain helicase DNA-binding domain family of ATP-dependent chromatin remodeling enzymes found in mammalian cells. De novo mutation of CHD7 is a major cause of CHARGE syndrome, a genetic condition characterized by multiple congenital anomalies. To gain insight into the function of CHD7, we used the technique of chromatin immunoprecipitation followed by massively parallel DNA sequencing (ChIP-Seq) to map CHD7 sites in mouse ES cells. We identified 10,483 sites on chromatin bound by CHD7 at high confidence. Most of the CHD7 sites show features of gene enhancer elements. Specifically, CHD7 sites are predominantly located distal to transcription start sites, contain high levels of H3K4 mono-methylation, are found within open chromatin that is hypersensitive to DNase I digestion, and correlate with ES cell-specific gene expression. Moreover, CHD7 co-localizes with P300, a known enhancer-binding protein and strong predictor of enhancer activity. Correlations with 18 other factors mapped by ChIP-seq in mouse ES cells indicate that CHD7 also co-localizes with ES cell master regulators OCT4, SOX2, and NANOG. Correlations between CHD7 sites and global gene expression profiles obtained from Chd7(+/+), Chd7(+/-), and Chd7(-/-) ES cells indicate that CHD7 functions at enhancers as a transcriptional rheostat to modulate, or fine-tune, the expression levels of ES-specific genes.
CHD7 can modulate genes in either the positive or negative direction, although negative regulation appears to be the more direct effect of CHD7 binding. These data indicate that enhancer-binding proteins can limit gene expression and are not necessarily co-activators. Although ES cells are not likely to be affected in CHARGE syndrome, we propose that enhancer-mediated gene dysregulation contributes to disease pathogenesis and that the critical CHD7 target genes may be subject to positive or negative regulation.
Item Open Access Chromatin Accessibility Dynamics Underlying Development and Disease(2015) Frank, Christopher L.
Despite a largely static DNA sequence, our genomes are incredibly malleable. Comparative studies of chromatin features between different cell types, tissues, and species have revealed tremendous differences in how the genome is accessed, transcribed, and replicated. However, how the dynamics of chromatin accessibility contribute to development, environmental response, and disease status has only begun to be appreciated. In this work we identified chromatin accessibility changes by DNase-seq in three diverse processes: in granule neurons of the developing cerebellum, with intestinal epithelial cells in the absence of a normal microbiota, and with myelogenous leukemia cells in response to histone deacetylase inhibitor treatments. In all cases, we coupled these analyses with RNA-seq assays to identify concurrent transcriptional changes. By mapping the changes to these genome-wide signals we defined the contribution of local chromatin structure to the transcriptional programs underlying these processes, and improved our understanding of their relation to other chromatin changes like histone modifications. Furthermore we demonstrated use of the strongest accessibility changes to identify transcription factors critical for these processes by finding enrichment of their binding motifs.
For a few of these key factors, depletion or overexpression of the protein was sufficient to regulate the expression of predicted target genes or exert limited chromatin accessibility changes, demonstrating the functional significance of these proteins in these processes. Together these studies have informed our understanding of the role chromatin accessibility changes play in development and environmental responses while also proving their utility for key regulator identification.
Item Open Access Chromatin accessibility mapping identifies mediators of basal transcription and retinoid-induced repression of OTX2 in medulloblastoma.(PLoS One, 2014) Wortham, Matthew; Guo, Changcun; Zhang, Monica; Song, Lingyun; Lee, Bum-Kyu; Iyer, Vishwanath R; Furey, Terrence S; Crawford, Gregory E; Yan, Hai; He, Yiping
Despite an emerging understanding of the genetic alterations giving rise to various tumors, the mechanisms whereby most oncogenes are overexpressed remain unclear. Here we have utilized an integrated approach of genomewide regulatory element mapping via DNase-seq followed by conventional reporter assays and transcription factor binding site discovery to characterize the transcriptional regulation of the medulloblastoma oncogene Orthodenticle Homeobox 2 (OTX2). Through these studies we have revealed that OTX2 is differentially regulated in medulloblastoma at the level of chromatin accessibility, which is in part mediated by DNA methylation. In cell lines exhibiting chromatin accessibility of OTX2 regulatory regions, we found that autoregulation maintains OTX2 expression. Comparison of medulloblastoma regulatory elements with those of the developing brain reveals that these tumors engage a developmental regulatory program to drive OTX2 transcription. Finally, we have identified a transcriptional regulatory element mediating retinoid-induced OTX2 repression in these tumors. This work characterizes for the first time the mechanisms of OTX2 overexpression in medulloblastoma.
Furthermore, this study establishes proof of principle for applying ENCODE datasets towards the characterization of upstream trans-acting factors mediating expression of individual genes.
Item Open Access Comparative Serum Challenges Show Divergent Patterns of Gene Expression and Open Chromatin in Human and Chimpanzee.(Genome biology and evolution, 2018-03) Pizzollo, Jason; Nielsen, William J; Shibata, Yoichiro; Safi, Alexias; Crawford, Gregory E; Wray, Gregory A; Babbitt, Courtney C
Humans experience higher rates of age-associated diseases than our closest living evolutionary relatives, chimpanzees. Environmental factors can explain many of these increases in disease risk, but species-specific genetic changes can also play a role. Alleles that confer increased disease susceptibility later in life can persist in a population in the absence of selective pressure if those changes confer positive adaptation early in life. One age-associated disease that disproportionately affects humans compared with chimpanzees is epithelial cancer. Here, we explored genetic differences between humans and chimpanzees in a well-defined experimental assay that mimics gene expression changes that happen during cancer progression: a fibroblast serum challenge. We used this assay with fibroblasts isolated from humans and chimpanzees to explore species-specific differences in gene expression and chromatin state with RNA-Seq and DNase-Seq. Our data reveal that human fibroblasts increase expression of genes associated with wound healing and cancer pathways; in contrast, chimpanzee gene expression changes are not concentrated around particular functional categories. Chromatin accessibility dramatically increases in human fibroblasts, yet decreases in chimpanzee cells during the serum response.
Many regions of opening and closing chromatin are in close proximity to genes encoding transcription factors or genes involved in wound healing processes, further supporting the link between changes in activity of regulatory elements and changes in gene expression. Together, these expression and open chromatin data show that humans and chimpanzees have dramatically different responses to the same physiological stressor, and how a core physiological process can evolve quickly over relatively short evolutionary time scales.
Item Open Access Epigenetic basis of oncogenic-Kras-mediated epithelial-cellular proliferation and plasticity.(Developmental cell, 2022-02) Kadur Lakshminarasimha Murthy, Preetish; Xi, Rui; Arguijo, Diana; Everitt, Jeffrey I; Kocak, Dewran D; Kobayashi, Yoshihiko; Bozec, Aline; Vicent, Silvestre; Ding, Shengli; Crawford, Gregory E; Hsu, David; Tata, Purushothama Rao; Reddy, Timothy; Shen, Xiling
Oncogenic Kras induces a hyper-proliferative state that permits cells to progress to neoplasms in diverse epithelial tissues. Depending on the cell of origin, this also involves lineage transformation. Although a multitude of downstream factors have been implicated in these processes, the precise chronology of molecular events controlling them remains elusive. Using mouse models, primary human tissues, and cell lines, we show that, in Kras-mutant alveolar type II cells (AEC2), FOSL1-based AP-1 factor guides the mSWI/SNF complex to increase chromatin accessibility at genomic loci controlling the expression of genes necessary for neoplastic transformation. We identified two orthogonal processes in Kras-mutant distal airway club cells. The first promoted their transdifferentiation into an AEC2-like state through NKX2.1, and the second controlled oncogenic transformation through the AP-1 complex. Our results suggest that neoplasms retain an epigenetic memory of their cell of origin through cell-type-specific transcription factors.
Our analysis showed that a cross-tissue-conserved AP-1-dependent chromatin remodeling program regulates carcinogenesis.
Item Open Access Explicit DNase sequence bias modeling enables high-resolution transcription factor footprint detection.(Nucleic Acids Res, 2014-10-29) Yardımcı, Galip Gürkan; Frank, Christopher L; Crawford, Gregory E; Ohler, Uwe
DNaseI footprinting is an established assay for identifying transcription factor (TF)-DNA interactions with single base pair resolution. High-throughput DNase-seq assays have recently been used to detect in vivo DNase footprints across the genome. Multiple computational approaches have been developed to identify DNase-seq footprints as predictors of TF binding. However, recent studies have pointed to a substantial cleavage bias of DNase and its negative impact on predictive performance of footprinting. To assess the potential for using DNase-seq to identify individual binding sites, we performed DNase-seq on deproteinized genomic DNA and determined sequence cleavage bias. This allowed us to build bias-corrected and TF-specific footprint models. The predictive performance of these models demonstrated that predicted footprints corresponded to high-confidence TF-DNA interactions. DNase-seq footprints were absent under a fraction of ChIP-seq peaks, which we show to be indicative of weaker binding, indirect TF-DNA interactions or possible ChIP artifacts. The modeling approach was also able to detect variation in the consensus motifs that TFs bind to.
Finally, cell-type-specific footprints were detected within DNase hypersensitive sites that are present in multiple cell types, further supporting that footprints can identify changes in TF binding that are not detectable using other strategies.
Item Open Access Genome-wide Analysis of Chromatin Structure across Diverse Human Cell Types(2013) Winter, Deborah R.
Chromatin structure plays an important role in gene regulation, especially in differentiating the diverse cell types in humans. In this dissertation, we analyze the nucleosome positioning and open chromatin profiles genome-wide and investigate the relationship with transcription initiation, the activity of regulatory elements, and expression levels. We mainly focus on the results of DNase-seq experiments, but also employ annotations from MNase-seq, FAIRE-seq, ChIP-seq, CAGE, and RNA microarrays. Our methods are based on computational approaches including managing large data sets, statistical analysis, and machine learning. We find that different transcription initiation patterns lead to distinct chromatin structures, suggesting diverse regulatory strategies. Moreover, we present a tool for comparing genome-wide annotation tracks and evaluate DNase-seq against a unique assay for detecting open chromatin. We also demonstrate how DNase-seq can be used to successfully predict rotationally stable nucleosomes that are conserved across cell types. We conclude that DNase-seq can be used to study genome-wide chromatin structure in an effort to better understand how it regulates gene expression.
Item Open Access Genome-wide Cross-species Analysis Linking Open Chromatin, Differential Expression and Positive Selection(2012) Shibata, Yoichiro
Deciphering the molecular mechanisms driving the phenotypic differences between humans and primates remains a daunting challenge. Mutations found in protein coding DNA alone have not been able to explain these phenotypic differences. The hypothesis that mutations in non-coding regulatory DNA are responsible for altered gene expression leading to these phenotypic changes has now been widely supported by differential gene expression experiments. Yet, comprehensive identification of all regulatory DNA elements across different species has not been performed. To identify the genetic source of regulatory change, genome-wide DNaseI hypersensitivity assays, marking all types of active gene regulatory element sites, were performed in human, chimpanzee, macaque, orangutan, and mouse. Many DNaseI hypersensitive (DHS) sites were conserved among all 5 species, but we also identified hundreds of novel human- and chimpanzee-specific DHS gains and losses that showed signatures of positive selection. Species-specific DHS gains were enriched in distal non-coding regions, associated with active histone modifications, and positively correlated with increased expression, indicating that these are likely to be functioning as enhancers. Comparison to mouse DHS data indicates that human or chimpanzee DHS gains are likely to have been a result of single events that occurred primarily on the human- or chimpanzee-specific branch, respectively. In contrast, DHS losses are associated with events that occurred on multiple branches. At least one mechanism contributing to DHS gains and losses is species-specific variants that lead to sequence changes at transcription factor binding motifs, affecting the binding of TFs such as AP1. These variants were functionally verified by DNase footprinting and ChIP-qPCR analyses.
Item Open Access HDAC inhibitors cause site-specific chromatin remodeling at PU.1-bound enhancers in K562 cells.(Epigenetics Chromatin, 2016) Frank, Christopher L; Manandhar, Dinesh; Gordân, Raluca; Crawford, Gregory E
BACKGROUND: Small molecule inhibitors of histone deacetylases (HDACi) hold promise as anticancer agents for particular malignancies. However, clinical use is often confounded by toxicity, perhaps due to indiscriminate hyperacetylation of cellular proteins. Therefore, elucidating the mechanisms by which HDACi trigger differentiation, cell cycle arrest, or apoptosis of cancer cells could inform development of more targeted therapies. We used the myelogenous leukemia line K562 as a model of HDACi-induced differentiation to investigate chromatin accessibility (DNase-seq) and expression (RNA-seq) changes associated with this process. RESULTS: We identified several thousand specific regulatory elements [~10% of total DNase I-hypersensitive (DHS) sites] that become significantly more or less accessible with sodium butyrate or suberanilohydroxamic acid treatment. Most of the differential DHS sites display hallmarks of enhancers, including being enriched for non-promoter regions, associating with nearby gene expression changes, and increasing luciferase reporter expression in K562 cells. Differential DHS sites were enriched for key hematopoietic lineage transcription factor motifs, including SPI1 (PU.1), a known pioneer factor. We found PU.1 increases binding at opened DHS sites with HDACi treatment by ChIP-seq, but PU.1 knockdown by shRNA fails to block the chromatin accessibility and expression changes. A machine-learning approach indicates H3K27me3 initially marks PU.1-bound sites that open with HDACi treatment, suggesting these sites are epigenetically poised. CONCLUSIONS: We find HDACi treatment of K562 cells results in site-specific chromatin remodeling at epigenetically poised regulatory elements.
PU.1 shows evidence of a pioneer role in this process by marking poised enhancers but is not required for transcriptional activation.
Item Open Access Integrated Chromatin Analyses Offer Insights Into Trans-factor Function In Cancer Cell Lines(2012) Tewari, Alok
Understanding the mechanisms whereby the sequence of the human genome is interpreted into diverse cellular phenotypes is a critical endeavor in modern biology. A major determinant of cellular phenotype is the spatial and temporal pattern of gene expression, which is regulated in part by epigenomic properties such as histone post-translational modifications, DNA methylation, chromatin accessibility and the 3-dimensional architecture of the genome within the nucleus. These properties regulate the dynamic assembly of transcription factors and their co-regulatory proteins upon chromatin. To properly understand the interplay between the epigenomic framework of a cell and transcription factors, integrated analysis of transcription factor-DNA binding, chromatin status, and transcription is required. This work integrates information about chromatin accessibility, as measured by DNaseI hypersensitivity, transcription factor binding, as measured by chromatin immunoprecipitation, and transcription, as measured by microarray or transcriptome sequencing, to further understand the functional role of two important transcription factors, the androgen receptor (AR) and CTCF, in cancer cell line models. Data gathered from a prostate cancer cell line model demonstrate that the AR does not exclusively bind accessible chromatin upon ligand-activation, and induces significant changes in chromatin accessibility upon binding. Regions of quantitative change in chromatin accessibility contain motifs corresponding to potential collaborators for AR function, and are also significantly associated with AR-regulated transcriptional changes.
Furthermore, base pair resolution of the DNaseI cleavage profile revealed three distinct patterns of AR-DNA interaction, suggesting multiple modes of AR interacting with the genome. A novel role for the nuclear receptor REV-ERBα in AR-mediated transcription was explored within the same model system. Though preliminary, results thus far indicate that REV-ERBα is required for AR-induced increases in target gene transcription in a manner that is likely dependent on HDAC3. Genetic knockdown of REV-ERBα resulted in notable changes in chromatin accessibility around AR-target genes both before and after AR activation. The function of CTCF was interrogated using stable knockdown in a breast cancer cell line model. CTCF knockdown led to widespread changes in chromatin accessibility that were dependent on DNA sequence. Further analysis suggested that AP-1 and FOXA1 are involved in CTCF function. Together, the work presented in this dissertation offers novel insight into the behavior of two critical transcription factors in cancer cell lines, and describes a framework of analysis that can be extended and applied to any transcription factor within any desired cellular context.
Item Open Access Interactions of chromatin context, binding site sequence content, and sequence evolution in stress-induced p53 occupancy and transactivation.(PLoS Genet, 2015-01) Su, Dan; Wang, Xuting; Campbell, Michelle R; Song, Lingyun; Safi, Alexias; Crawford, Gregory E; Bell, Douglas A
Cellular stresses activate the tumor suppressor p53 protein leading to selective binding to DNA response elements (REs) and gene transactivation from a large pool of potential p53 REs (p53REs). To elucidate how p53RE sequences and local chromatin context interact to affect p53 binding and gene transactivation, we mapped genome-wide binding localizations of p53 and H3K4me3 in untreated and doxorubicin (DXR)-treated human lymphoblastoid cells. We examined the relationships among p53 occupancy, gene expression, H3K4me3, chromatin accessibility (DNase I hypersensitivity, DHS), ENCODE chromatin states, p53RE sequence, and evolutionary conservation. We observed that the inducible expression of p53-regulated genes was associated with the steady-state chromatin status of the cell. Most highly inducible p53-regulated genes were suppressed at baseline and marked by repressive histone modifications or displayed CTCF binding. Comparison of p53RE sequences residing in different chromatin contexts demonstrated that weaker p53REs resided in open promoters, while stronger p53REs were located within enhancers and repressed chromatin. p53 occupancy was strongly correlated with similarity of the target DNA sequences to the p53RE consensus, but surprisingly, inversely correlated with pre-existing nucleosome accessibility (DHS) and evolutionary conservation at the p53RE. Occupancy by p53 of REs that overlapped transposable element (TE) repeats was significantly higher (p < 10^-7) and correlated with stronger p53RE sequences (p < 10^-110) relative to nonTE-associated p53REs, particularly for MLT1H, LTR10B, and Mer61 TEs.
However, binding at these elements was generally not associated with transactivation of adjacent genes. Occupied p53REs located in L2-like TEs were unique in displaying highly negative PhyloP scores (predicted fast-evolving) and being associated with altered H3K4me3 and DHS levels. These results underscore the systematic interaction between chromatin status and p53RE context in the induced transactivation response. This p53 regulated response appears to have been tuned via evolutionary processes that may have led to repression and/or utilization of p53REs originating from primate-specific transposon elements.
Item Open Access Novel Methods to Identify Chromatin Accessibility Differences Across Primates(2019) Edsall, Lee Elizabeth
One of the aims of evolutionary biology is to identify gene regulatory regions (and the resulting level of expression) that evolved between species. The conventional method of analysis for this is to perform pairwise comparisons on data generated for each species. Software programs for this approach are mature and work well when there are only two species of interest. These same programs can be used when there are three species of interest. However, the analysis becomes more cumbersome and the statistical significance (p-value) difficult to calculate. Performing pairwise comparisons when there are more than three species has significant limitations. One is the exponential increase in the number of tests performed, greatly reducing the sensitivity after false discovery rate correction. For n species, (n-1) tests are performed on each region. Another limitation is the lack of a principled way to identify and classify genes (or regulatory regions) containing changes in multiple species.
To address these limitations, we developed a novel method of jointly modelling the data from all of the species using a negative binomial generalized linear model. In addition to providing a principled way of identifying and classifying sites with multiple changes, our method is more sensitive largely due to a substantial decrease in the number of tests performed. Our method jointly models all of the data in a single test, regardless of the number of species. As a result, the correction for number of independent tests performed is (n-1) times larger for the multiple pairwise method than for the joint modelling approach.
We applied this joint modelling approach to DNase-seq data generated from skin fibroblast cells from five primate species: human, chimpanzee, gorilla, orangutan, and rhesus macaque. We identified 89,744 DNase I Hypersensitive sites (DHS sites) that were comparable across all species, of which 41% (36,666) were classified as differential in one or more species. 30% of the differential sites (11,095) are likely due to a single change in chromatin accessibility in one species. Changes that likely occurred on the internal human-chimpanzee branch or human-chimpanzee-gorilla branch account for 15% (5,385) of the differential sites. 16% (6,034) of the differential sites contain changes that happened on either the human-chimpanzee-gorilla-orangutan internal branch or the rhesus macaque species branch. 32% (11,698) of the differential sites are due to multiple changes in chromatin accessibility (e.g., independent changes on the human and orangutan species branches).
The accuracy of this new approach was demonstrated by a high degree of concordance with an earlier study from our laboratory that analyzed data from human, chimpanzee, and rhesus macaque. Additionally, we performed a conventional pairwise analysis of the DHS sites from the five species and classified only 33% as differential, indicating decreased sensitivity compared to the joint modelling approach. Together, these results indicate that this novel joint modelling approach provides an improved method for comparative analysis of DNase-seq data.
Although we developed this method for DNase-seq data, we expect that it can be applied to other count-based data types such as ChIP-seq, ATAC-seq, and RNA-seq. We also expect that it can be applied to other experimental designs such as time-series, multi-tissue comparisons, and multiple developmental stage comparisons. The R script for performing the joint modelling analysis and instructions for modifying the script for use by other investigators are available in a GitHub repository (http://github.com/ledsall/2019primate).
Item Open Access Quantitative genetics of CTCF binding reveal local sequence effects and different modes of X-chromosome association.(PLoS Genet, 2014-11) Ding, Zhihao; Ni, Yunyun; Timmer, Sander W; Lee, Bum-Kyu; Battenhouse, Anna; Louzada, Sandra; Yang, Fengtang; Dunham, Ian; Crawford, Gregory E; Lieb, Jason D; Durbin, Richard; Iyer, Vishwanath R; Birney, Ewan
Associating genetic variation with quantitative measures of gene regulation offers a way to bridge the gap between genotype and complex phenotypes. In order to identify quantitative trait loci (QTLs) that influence the binding of a transcription factor in humans, we measured binding of the multifunctional transcription and chromatin factor CTCF in 51 HapMap cell lines. We identified thousands of QTLs in which genotype differences were associated with differences in CTCF binding strength, hundreds of them confirmed by directly observable allele-specific binding bias. The majority of QTLs were either within 1 kb of the CTCF binding motif, or in linkage disequilibrium with a variant within 1 kb of the motif. On the X chromosome we observed three classes of binding sites: a minority class bound only to the active copy of the X chromosome, the majority class bound to both the active and inactive X, and a small set of female-specific CTCF sites associated with two non-coding RNA genes. In sum, our data reveal extensive genetic effects on CTCF binding, both direct and indirect, and identify a diversity of patterns of CTCF binding on the X chromosome.
Item Open Access Regulatory Elements and Gene Expression in Primates and Diverse Human Cell-types(2013) Sheffield, Nathan
After finishing a human genome reference sequence in 2002, the genomics community has turned to the task of interpreting it. A primary focus is to identify and characterize not only protein-coding genes, but all functional elements in the genome. The effort has identified millions of regulatory elements across species and in hundreds of human cell-types. Nearly all identified regulatory elements are found in non-coding DNA, hypothesizing a function for previously unannotated sequence. The ability to identify regulatory DNA genome-wide provides a new opportunity to understand gene regulation and to ask fundamental questions in diverse areas of biology.
One such area is the aim to understand the molecular basis for phenotypic differences between humans and other primates. These phenotypic differences are partially driven by mutations in non-coding regulatory DNA that alter gene expression. This hypothesis has been supported by differential gene expression analyses in general, but we have not yet identified specific regulatory variants responsible for differences in transcription and phenotype. I have worked to identify regulatory differences in the same cell-type isolated from human, chimpanzee, and macaque. Most regulatory elements were conserved among all three species, as expected based on their central role in regulating transcription. However, several hundred regulatory elements were gained or lost on the lineages leading to modern human and chimpanzee. Species-specific regulatory elements are enriched near differentially expressed genes, are positively correlated with increased transcription, show evidence of branch-specific positive selection, and overlap with active chromatin marks. Species-specific sequence differences in transcription factor motifs found within this regulatory DNA are linked with species-specific changes in chromatin accessibility. Together, these indicate that species-specific regulatory elements contribute to transcriptional and phenotypic differences among primate species.
Another fundamental function of regulatory elements is to define different cell-types in multicellular organisms. Regulatory elements recruit transcription factors that modulate gene expression distinctly across cell-types. In a study of 112 human cell-types, I classified regulatory elements into clusters based on regulatory signal tissue specificity. I then used these to uncover distinct associations between regulatory elements and promoters, CpG-islands, conserved elements, and transcription factor motif enrichment. Motif analysis identified known and novel transcription factor binding motifs in cell-type-specific and ubiquitous regulatory elements. I also developed a classifier that accurately predicts cell-type lineage based on only 43 regulatory elements and evaluated the tissue of origin for cancer cell-types. By correlating regulatory signal and gene expression, I predicted target genes for more than 500k regulatory elements. Finally, I introduced a web resource to enable researchers to explore these regulatory patterns and better understand how expression is modulated within and across human cell-types.
Regulation of gene expression is fundamental to life. This dissertation uses identified regulatory DNA to better understand regulatory systems. In the context of either evolutionary or developmental biology, understanding how differences in regulatory DNA contribute to phenotype will be central to completely understanding human biology.
Item Open Access The PsychENCODE project.(Nat Neurosci, 2015-12) PsychENCODE Consortium; Akbarian, Schahram; Liu, Chunyu; Knowles, James A; Vaccarino, Flora M; Farnham, Peggy J; Crawford, Gregory E; Jaffe, Andrew E; Pinto, Dalila; Dracheva, Stella; Geschwind, Daniel H; Mill, Jonathan; Nairn, Angus C; Abyzov, Alexej; Pochareddy, Sirisha; Prabhakar, Shyam; Weissman, Sherman; Sullivan, Patrick F; State, Matthew W; Weng, Zhiping; Peters, Mette A; White, Kevin P; Gerstein, Mark B; Amiri, Anahita; Armoskus, Chris; Ashley-Koch, Allison E; Bae, Taejeong; Beckel-Mitchener, Andrea; Berman, Benjamin P; Coetzee, Gerhard A; Coppola, Gianfilippo; Francoeur, Nancy; Fromer, Menachem; Gao, Robert; Grennan, Kay; Herstein, Jennifer; Kavanagh, David H; Ivanov, Nikolay A; Jiang, Yan; Kitchen, Robert R; Kozlenkov, Alexey; Kundakovic, Marija; Li, Mingfeng; Li, Zhen; Liu, Shuang; Mangravite, Lara M; Mattei, Eugenio; Markenscoff-Papadimitriou, Eirene; Navarro, Fábio CP; North, Nicole; Omberg, Larsson; Panchision, David; Parikshak, Neelroop; Poschmann, Jeremie; Price, Amanda J; Purcaro, Michael; Reddy, Timothy E; Roussos, Panos; Schreiner, Shannon; Scuderi, Soraya; Sebra, Robert; Shibata, Mikihito; Shieh, Annie W; Skarica, Mario; Sun, Wenjie; Swarup, Vivek; Thomas, Amber; Tsuji, Junko; van Bakel, Harm; Wang, Daifeng; Wang, Yongjun; Wang, Kai; Werling, Donna M; Willsey, A Jeremy; Witt, Heather; Won, Hyejung; Wong, Chloe CY; Wray, Gregory A; Wu, Emily Y; Xu, Xuming; Yao, Lijing; Senthil, Geetha; Lehner, Thomas; Sklar, Pamela; Sestan, Nenad
Item Open Access Tracking Transcription Factors on the Genome by their DNase-seq Footprints(2014) Yardimci, Galip Gurkan
Transcription factors control numerous vital processes in the cell through their ability to control gene expression. Dysfunctional regulation by transcription factors leads to disorders and disease. Transcription factors regulate gene expression by binding to DNA sequences (motifs) on the genome and altering chromatin. DNase-seq footprinting is a well-established assay for identification of DNA sequences that bind to transcription factors. We developed computational techniques to analyze footprints and predict transcription factor binding. These transcription factor specific predictive models are able to correct for DNase sequence bias and characterize variation in DNA binding sequence. We found that DNase-seq footprints are able to identify cell-type or condition specific transcription factor activity and may offer information about the type of the interaction between DNA and transcription factor. Our DNase-seq footprint model is able to accurately discover high confidence transcription factor binding sites and discover alternative interactions between transcription factors and DNA. DNase-seq footprints can be used with ChIP-seq data to discover true binding sites and better understand transcription regulation.