Browsing by Subject "Computational biology"
Item Open Access A Framework for Dissecting and Applying Bacterial Antibiotic Responses (2017) Meredith, Hannah Ruth Brittany
An essential property of microbial communities is the ability to survive a disturbance. This is readily observed in bacteria, which have developed the ability to survive every antibiotic treatment at an alarming rate, considering the timescale at which new antibiotics are developed. Thus, there is a critical need to use antibiotics more effectively, extend the shelf life of existing antibiotics, and minimize their side effects. This requires understanding the mechanisms underlying bacterial drug responses. Past studies have focused on how individual cells, such as genetic mutants, survive in the presence of antibiotics. Also important, however, is the fact that a population of bacterial cells can collectively survive antibiotic treatments lethal to individual cells. This tolerance can arise by diverse mechanisms, including resistance-conferring enzyme production, titration-mediated bistable growth inhibition, swarming, and interpopulation interactions. These strategies can enable rapid population recovery after antibiotic treatment and provide a time window during which otherwise susceptible bacteria can acquire inheritable genetic resistance.
To further explore bacterial antibiotic responses, I focused on bacteria producing β-lactamase, an enzyme that has drastically limited the use of our most commonly prescribed antibiotics: β-lactams. Through the characterization of clinical isolates and a computational model, my Ph.D. thesis has three implications:
First, survival can be achieved through resistance, the ability to absorb the effects of a disturbance without a significant change, or through resilience, the ability to recover after being perturbed by a disturbance. Current practices for determining the antibiotic sensitivity of bacteria do not characterize a population as resistant and/or resilient; they only report whether the bacteria can survive the antibiotic exposure. As resistance and resilience often depend on different attributes, distinguishing between these two modes of survival could inform treatment strategies. These concepts have long been applied to the analysis of ecological systems, though their interpretations are often subject to debate. This framework readily lends itself to the dissection of the bacterial response to antibiotic treatment, where both terms can be unambiguously defined.
Second, the ability to tolerate the antibiotic treatment in the short term corresponds to resistance, which primarily depends on traits associated with individual cells. In contrast, the ability to recover after being perturbed by an antibiotic corresponds to resilience, which primarily depends on traits associated with the population.
And finally, understanding the temporal dynamics of an antibiotic response could guide the design of a dosing protocol to optimize treatment efficiency for any antibiotic-pathogen combination. Ultimately, optimized dosing protocols could allow reintroduction of a repertoire of first-line antibiotics with improved treatment outcomes and preserve last-resort antibiotics.
Item Open Access Augmentations vs. Restoration: A computational study of the effects of bacterial sodium channels on cardiac conduction. (2022) Needs, Daniel Allen
Cardiac arrhythmias, including ventricular tachycardia, ventricular fibrillation, and atrial fibrillation, are associated with ectopic triggers such as those resulting from afterdepolarizations and structural changes within the cardiac tissue. While ectopic triggers can be dealt with via radio frequency ablation, structural causes of arrhythmia, such as microscale source-load mismatches, do not have available treatments. Augmentation of cardiomyocytes with exogenous sodium channels such as Nav1.4 or prokaryotic voltage-gated sodium channels, or BacNavs, has shown promise for potentially alleviating these arrhythmias. However, due to size constraints, only the BacNavs are available for the highest efficiency viral vectors for stable transduction. Limitations in the ability to test these channels in adult mammalian cardiac tissue, particularly tissue with source-load mismatches, have led to a lack of understanding about BacNav’s therapeutic value. This dissertation aims to build models of engineered BacNavs and compare their impact in simulated diseased and healthy cardiac tissue with increases of the endogenous Nav1.5 current to probe mechanisms for therapy.
Patch clamp data were analyzed to derive steady-state values and kinetics for the activation and inactivation gating of the BacNavs using techniques dating back to Hodgkin and Huxley’s squid axon model. Models using a cubic activation function and only a single slow inactivation channel were best able to replicate the data, including action potential traces and restitution curves for both action potential duration and conduction velocity. The single slow inactivation channel matches what has been observed in crystallography studies of other BacNav channels.
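The gating structure described in this abstract (a cubic activation term and a single inactivation gate, in the Hodgkin-Huxley style) can be illustrated with a minimal sketch of a voltage-clamp step. Every numerical value below (half-activation voltages, slopes, time constants, conductance, reversal potential) is a placeholder assumption chosen for illustration; none of them are the fitted BacNav parameters from the dissertation.

```python
import math


def boltzmann(v_mv, v_half_mv, slope_mv):
    """Steady-state gate value; a negative slope gives a decreasing (inactivation) curve."""
    return 1.0 / (1.0 + math.exp((v_half_mv - v_mv) / slope_mv))


def relax(x, x_inf, tau_ms, dt_ms):
    """Exact update of dx/dt = (x_inf - x) / tau while the voltage is held constant."""
    return x_inf + (x - x_inf) * math.exp(-dt_ms / tau_ms)


def voltage_clamp(v_hold_mv=-90.0, v_step_mv=0.0, t_end_ms=500.0, dt_ms=0.05,
                  g_max=1.0, e_na_mv=60.0):
    """Current trace I = g_max * m^3 * h * (V - E_Na) after a step from v_hold to v_step."""
    # Hypothetical gate parameters (assumptions, not fitted values).
    m_half, m_slope, tau_m = -30.0, 8.0, 1.0     # fast activation gate m
    h_half, h_slope, tau_h = -70.0, -6.0, 50.0   # single slow inactivation gate h

    # Gates start at their steady state for the holding potential.
    m = boltzmann(v_hold_mv, m_half, m_slope)
    h = boltzmann(v_hold_mv, h_half, h_slope)
    m_inf = boltzmann(v_step_mv, m_half, m_slope)
    h_inf = boltzmann(v_step_mv, h_half, h_slope)

    currents = []
    for _ in range(int(round(t_end_ms / dt_ms))):
        m = relax(m, m_inf, tau_m, dt_ms)
        h = relax(h, h_inf, tau_h, dt_ms)
        currents.append(g_max * m ** 3 * h * (v_step_mv - e_na_mv))
    return currents
```

Calling `voltage_clamp()` returns a list of current samples; with these assumed values the current is inward (negative) because the step voltage lies below the assumed reversal potential, and it decays as the inactivation gate closes.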
Including the derived BacNav model into membrane models for guinea pig and human ventricular myocytes revealed general trends of action potential duration reduction, action potential amplitude increase, and increases in conduction velocity and upstroke velocity. The action potential duration and amplitude trends were more significant for BacNav than Nav1.5, but the endogenous channel was superior for conduction velocity increase. These effects existed despite different responses in relative and absolute current densities between the two membrane models. Despite evidence that late sodium current can lead to afterdepolarizations, BacNav did not increase susceptibility to them in vulnerable midmyocardial cells except at extremely high current densities. Finally, reductions in action potential duration removed alternans present in the restitution curves for single cells. To study how BacNav affected arrhythmias, it was incorporated into one-dimensional cables and two-dimensional tissues with source-load mismatches, premature stimuli that could induce unidirectional block, or channelopathies such as mutations leading to Brugada syndrome. BacNavs outperformed the endogenous channel in source-load mismatches due to increased action potential amplitude and slower inactivation kinetics. These conclusions were stable to spatial heterogeneity in the treatment. BacNav was also able to rescue Brugada syndrome in a dose-dependent manner and narrow the vulnerable window to unidirectional block for one-dimensional cables. In two dimensions, Nav1.5 had a smaller window to spiral wave induction but experienced wave breaks and multiple wavelets, whereas rotors with BacNav-treated cells were stable. These findings help generate hypotheses to be tested experimentally and further refine the model. Further studies may uncover engineering principles for designing optimal sodium channels for specific pathologies.
Item Open Access Computational Molecular Engineering Nucleic Acid Binding Proteins and Enzymes (2010) Reza, Faisal
Interactions between nucleic acid substrates and the proteins and enzymes that bind and catalyze them are ubiquitous and essential for reading, writing, replicating, repairing, and regulating the genomic code by the proteomic machinery. In this dissertation, computational molecular engineering furthered the elucidation of spatial-temporal interactions of natural nucleic acid binding proteins and enzymes and the creation of synthetic counterparts with structure-function interactions at predictive proficiency. We examined spatial-temporal interactions to study how natural proteins can process signals and substrates. The signals, propagated by spatial interactions between genes and proteins, can encode and decode information in the temporal domain. Natural proteins evolved through facilitating signaling, limiting crosstalk, and overcoming noise locally and globally. Findings indicate that fidelity and speed of frequency signal transmission in cellular noise was coordinated by a critical frequency, beyond which interactions may degrade or fail. The substrates, bound to their corresponding proteins, present structural information that is precisely recognized and acted upon in the spatial domain. Natural proteins evolved by coordinating substrate features with their own. Findings highlight the importance of accurate structural modeling. We explored structure-function interactions to study how synthetic proteins can complex with substrates. These complexes, composed of nucleic acid containing substrates and amino acid containing enzymes, can recognize and catalyze information in the spatial and temporal domains. Natural proteins evolved by balancing stability, solubility, substrate affinity, specificity, and catalytic activity.
Accurate computational modeling of mutants with desirable properties for nucleic acids while maintaining such balances extended molecular redesign approaches. Findings demonstrate that binding and catalyzing proteins redesigned by single-conformation and multiple-conformation approaches maintained this balance to function, often as well as or better than those found in nature. We enabled access to computational molecular engineering of these interactions through open-source practices. We examined the applications and issues of engineering nucleic acid binding proteins and enzymes in nanotechnology and therapeutics, and along ethical, legal, and social dimensions. Findings suggest that such access and applications can make engineering biology more widely adopted, easier, more effective, and safer.
Item Open Access Constructing Mathematical Models of Gene Regulatory Networks for the Yeast Cell Cycle and Other Periodic Processes (2014) Deckard, Anastasia
We work on constructing mathematical models of gene regulatory networks for periodic processes, such as the cell cycle in budding yeast, using biological data sets and applying or developing analysis methods in the areas of mathematics, statistics, and computer science. We identify genes with periodic expression and then the interactions between periodic genes, which together define the structure of the network. This network is then translated into a mathematical model, using Ordinary Differential Equations (ODEs), to describe these entities and their interactions. The models currently describe gene regulatory interactions, but we are expanding to capture other events, such as phosphorylation and ubiquitination. To model the behavior, we must then find appropriate parameters for the mathematical model that allow its dynamics to approximate the biological data.
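As an illustration of how a regulatory network can be written as ODEs, the following is a minimal sketch of a three-gene repression ring integrated with a fourth-order Runge-Kutta step. The network topology, the Hill-type repression term, and the parameter values (`alpha`, `n`, unit decay rate) are assumptions made for this example; they are not the yeast cell-cycle model or the parameters from the dissertation.

```python
def hill_repression(x, alpha, n):
    """Production rate of a gene repressed by a regulator at level x."""
    return alpha / (1.0 + x ** n)


def derivatives(state, alpha=10.0, n=4):
    """Ring of genes: gene i is repressed by gene i-1 and decays at unit rate."""
    return [hill_repression(state[i - 1], alpha, n) - state[i]
            for i in range(len(state))]


def rk4_step(state, dt, f):
    """One classical Runge-Kutta step for d(state)/dt = f(state)."""
    k1 = f(state)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = f([s + dt * k for s, k in zip(state, k3)])
    return [s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def simulate(initial, t_end=100.0, dt=0.01):
    """Return the list of states visited from t = 0 to t_end."""
    state = list(initial)
    trajectory = [state]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, derivatives)
        trajectory.append(state)
    return trajectory
```

A call such as `simulate([1.0, 1.5, 2.0])` gives a trajectory that can be compared against time-series expression data; the parameter-fitting step mentioned in the abstract would then adjust `alpha` and `n` (or their analogues in a richer model) to reduce the mismatch.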
This pipeline for model construction is not focused on a specific algorithm or data set for each step, but instead on leveraging several sources of data and analysis from several algorithms. For example, we are incorporating data from multiple time series experiments, genome-wide binding experiments, computationally predicted binding, and regulation inference to identify potential regulatory interactions.
These approaches are designed to be applicable to various periodic processes in different species. While we have worked most extensively on models for the cell cycle in Saccharomyces cerevisiae, we have also begun working with data sets for the metabolic cycle in S. cerevisiae, and the circadian rhythm in Mus musculus.
Item Open Access Divergence, Mutation, Function, Selection: The Evolution of the Human Genome (2023) Mangan, Riley Joseph
Searches for the genetic underpinnings of uniquely human traits have focused on human-specific divergence in conserved genomic regions, which reflects adaptive modifications of existing functional elements. However, the study of conserved regions excludes novel functional elements that descended from previously neutral regions. In this work, I integrate comparative genomic analyses with human population variation data to reveal that rapid divergence rate is associated with positive selection in human evolutionary history. Encouraged by this finding, I identified 1581 Human Ancestor Quickly Evolved Regions (HAQERs), which represent the fastest-evolved regions of the human genome. HAQERs rapidly diverged in an episodic burst of directional positive selection prior to the human-Neanderthal split before transitioning to constraint within hominins. HAQERs are enriched for bivalent chromatin states, particularly in gastrointestinal and neurodevelopmental tissues, and genetic variants linked to neurodevelopmental disease. I led a collaborative effort to develop scSTARR-seq as a multiplex single-cell in vivo enhancer assay to discover that rapid sequence divergence in HAQERs generated hominin-unique enhancers in the developing cerebral cortex. I propose that a lack of pleiotropic constraints and elevated mutation rates poised HAQERs for rapid adaptation and subsequent susceptibility to disease.
Item Open Access Exploring the structurial diversity and engineering potential of thermophilic periplasmic binding proteins (2007-05-02) Cuneo, Matthew Joseph
The periplasmic binding protein (PBP) superfamily is found throughout the genosphere of both prokaryotic and eukaryotic organisms. PBPs function as receptors in bacterial solute transport and chemotaxis systems; however, the same fold is also used in transcriptional regulators, enzymes, and eukaryotic neurotransmitter receptors. This versatility has been exploited for structure-based computational protein design experiments where PBPs have been engineered to bind novel ligands and serve as biosensors for the detection of small-molecule ligands relevant to biomedical or defense-related interests. In order to further understand functional adaptation from a structural biology perspective, and to provide a set of robust starting points for engineering novel biosensors by structure-based design, I have characterized the ligand-binding properties and solved the structure of nine PBPs from various thermophilic bacteria. Analysis of these structures reveals a variety of mechanisms by which diverse function can be encoded in a common fold. It is observed that re-modeling of secondary structure elements (such as insertions, deletions, and loop movements) and re-decoration of amino acid side-chains are common diversification mechanisms in PBPs. Furthermore, the relationship between hinge-bending motion and ligand binding is critical to understanding the function of natural or engineered adaptations in PBPs. Three of these proteins were solved in both the presence and absence of ligand, which allowed, for the first time, the observation and analysis of ligand-induced structural rearrangements in thermophilic PBPs. This work revealed that the magnitude and transduction of local and global ligand-induced motions are diverse throughout the PBP superfamily.
Through the analysis of the open-to-closed transition, and the identification of natural structural adaptations in thermophilic members of the PBP superfamily, I reveal strategies that can be applied to computational protein design to significantly improve current approaches.
Item Open Access Genome-wide Analysis of Chromatin Structure across Diverse Human Cell Types (2013) Winter, Deborah R.
Chromatin structure plays an important role in gene regulation, especially in differentiating the diverse cell types in humans. In this dissertation, we analyze the nucleosome positioning and open chromatin profiles genome-wide and investigate the relationship with transcription initiation, the activity of regulatory elements, and expression levels. We mainly focus on the results of DNase-seq experiments, but also employ annotations from MNase-seq, FAIRE-seq, ChIP-seq, CAGE, and RNA microarrays. Our methods are based on computational approaches including managing large data sets, statistical analysis, and machine learning. We find that different transcription initiation patterns lead to distinct chromatin structures, suggesting diverse regulatory strategies. Moreover, we present a tool for comparing genome-wide annotation tracks and evaluate DNase-seq against a unique assay for detecting open chromatin. We also demonstrate how DNase-seq can be used to successfully predict rotationally stable nucleosomes that are conserved across cell types. We conclude that DNase-seq can be used to study genome-wide chromatin structure in an effort to better understand how it regulates gene expression.
Item Open Access Modeling Biological Systems from Heterogeneous Data (2008-04-24) Bernard, Allister P.
The past decades have seen rapid development of numerous high-throughput technologies to observe biomolecular phenomena. High-throughput biological data are inherently heterogeneous, providing information at the various levels at which organisms integrate inputs to arrive at an observable phenotype. Approaches are needed to not only analyze heterogeneous biological data, but also model the complex experimental observation procedures. We first present an algorithm for learning dynamic cell cycle transcriptional regulatory networks from gene expression and transcription factor binding data. We learn regulatory networks using dynamic Bayesian network inference algorithms that combine evidence from gene expression data through the likelihood and evidence from binding data through an informative structure prior. We next demonstrate how analysis of cell cycle measurements such as gene expression data is obstructed by synchrony loss in synchronized cell populations. Due to synchrony loss, population-level cell cycle measurements are convolutions of the true measurements that would have been observed when monitoring individual cells. We introduce a fully parametric, probabilistic model, CLOCCS, capable of characterizing multiple sources of asynchrony in synchronized cell populations. Using CLOCCS, we formulate a constrained convex optimization deconvolution algorithm that recovers single cell estimates from observed population-level measurements. Our algorithm offers a solution for monitoring individual cells rather than a population of cells that lose synchrony over time. Using our deconvolution algorithm, we provide a global high resolution view of cell cycle gene expression in budding yeast, from an initial cell progressing through its cell cycle to the newly created mother and daughter cells.
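The idea of recovering a single-cell profile from a blurred population-level measurement by constrained convex optimization can be sketched with a toy problem. The example below is an assumption-laden stand-in: it uses a fixed, symmetric three-bin mixing kernel (in the dissertation the mixing would come from the CLOCCS model), a circular phase axis, a non-negativity constraint as the only constraint, and plain projected gradient descent as the solver.

```python
def circular_blur(signal, kernel):
    """Population-level signal: each bin mixes neighbouring cell-cycle phases (circularly)."""
    n = len(signal)
    half = len(kernel) // 2
    return [sum(kernel[j] * signal[(i + j - half) % n] for j in range(len(kernel)))
            for i in range(n)]


def deconvolve_nonnegative(observed, kernel, iterations=2000, step=1.0):
    """Projected gradient descent on 0.5 * ||K x - y||^2 subject to x >= 0.

    Assumes a symmetric kernel whose weights sum to 1, so that K is its own
    transpose and a unit step size is safe.
    """
    x = [0.0] * len(observed)
    for _ in range(iterations):
        residual = [p - y for p, y in zip(circular_blur(x, kernel), observed)]
        # For a symmetric kernel, K^T r is the same circular blur applied to r.
        gradient = circular_blur(residual, kernel)
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, xi - step * g) for xi, g in zip(x, gradient)]
    return x
```

For example, blurring a hypothetical single-cell profile with `circular_blur(profile, [0.2, 0.6, 0.2])` and passing the result to `deconvolve_nonnegative` with the same kernel should return an estimate close to the original profile, because this particular kernel is invertible on an eight-bin circle.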
Proteins, and not gene expression, are responsible for all cellular functions, and we need to understand how proteins and protein complexes operate. We introduce PROCTOR, a statistical approach capable of learning the hidden interaction topology of protein complexes from direct protein-protein interaction data and indirect co-complexed protein interaction data. We provide a global view of the budding yeast interactome depicting how proteins interact with each other via their interfaces to form macromolecular complexes. We conclude by demonstrating how our algorithms, utilizing information from heterogeneous biological data, can provide a dynamic view of regulatory control in the budding yeast cell cycle.
Item Open Access Modeling Multi-factor Binding of the Genome (2010) Wasson, Todd Steven
Hundreds of different factors adorn the eukaryotic genome, binding to it in large numbers. These DNA binding factors (DBFs) include nucleosomes, transcription factors (TFs), and other proteins and protein complexes, such as the origin recognition complex (ORC). DBFs compete with one another for binding along the genome, yet many current models of genome binding do not consider different types of DBFs together simultaneously. Additionally, binding is a stochastic process that results in a continuum of binding probabilities at any position along the genome, but many current models tend to consider positions as being either binding sites or not.
Here, we present a model that allows a multitude of DBFs, each at different concentrations, to compete with one another for binding sites along the genome. The result is an 'occupancy profile', a probabilistic description of the DNA occupancy of each factor at each position. We implement our model efficiently as the software package COMPETE. We demonstrate genome-wide and at specific loci how modeling nucleosome binding alters TF binding, and vice versa, and illustrate how factor concentration influences binding occupancy. Binding cooperativity between nearby TFs arises implicitly via mutual competition with nucleosomes. Our method applies not only to TFs, but also recapitulates known occupancy profiles of a well-studied replication origin with and without ORC binding.
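The effect of concentration on competitive occupancy can be illustrated with a deliberately reduced case: a single isolated site that at most one factor can occupy at a time, at thermodynamic equilibrium. This sketch is not the COMPETE algorithm, which per the abstract handles many factors along the whole genome; the factor names, concentrations, and affinities below are hypothetical values in arbitrary units.

```python
def site_occupancy(concentrations, affinities):
    """Equilibrium occupancy of one site by mutually exclusive factors.

    Each factor contributes a statistical weight (concentration * affinity);
    the empty site has weight 1. Occupancy is weight / sum of all weights.
    """
    weights = {name: concentrations[name] * affinities[name]
               for name in concentrations}
    partition = 1.0 + sum(weights.values())
    occupancy = {name: w / partition for name, w in weights.items()}
    occupancy["unbound"] = 1.0 / partition
    return occupancy


# Hypothetical factors and values, for illustration only.
affinities = {"TF": 2.0, "nucleosome": 1.0}
low = site_occupancy({"TF": 1.0, "nucleosome": 1.0}, affinities)
high = site_occupancy({"TF": 1.0, "nucleosome": 5.0}, affinities)
```

Comparing `low["TF"]` with `high["TF"]` shows the TF's occupancy dropping as the competing nucleosome concentration rises, even though the TF's own concentration and affinity are unchanged.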
We then develop a statistical framework for tuning our model concentrations to further improve its predictions. Importantly, this tuning optimizes with respect to actual biological data. We take steps to ensure that our tuned parameters are biologically plausible.
Finally, we discuss novel extensions and applications of our model, suggesting next steps in its development and deployment.
Item Open Access Pattern Formation in Engineered Bacteria: from Understanding to Applications (2017) Cao, Yangxiaolu
Patterns are ubiquitous in living organisms. However, the mechanisms driving self-organized pattern formation are not well understood. Due to the complexity of natural systems, many confounding factors complicate quantitative experiments and data interpretation, often making it difficult to draw definitive conclusions. Therefore, only a limited number of experimental systems enable precise perturbation and quantification of pattern formation. In comparison, synthetic systems serve as well-defined model systems to elucidate "design principles" of biological networks. Over the past sixteen years, engineering pattern formation has been a major endeavor in synthetic biology. However, there are only two studies about the generation of programmed self-organized pattern formation in growing cells based on coordinated dynamics in a population.
Intrigued by the challenge, my colleagues and I programmed E. coli with a synthetic gene circuit to generate self-organized pattern formation. Two implications of this engineered pattern-forming system were illustrated in my Ph.D. thesis.
First, the synthetic system provides a well-defined context to probe principles underlying the scaling property of self-organized pattern formation. Our mechanism underscores the importance of temporal control in generating scale-invariant patterns. The fundamental premise of this approach is that the principles defined in such engineered systems can be generally applicable to natural examples.
Second, the synthetic system serves as a foundation to generate structured materials with well-defined physical properties. Diverse natural biological systems can form structured materials with well-defined physical and chemical properties spontaneously. However, these natural processes are not readily programmable. By taking the synthetic biology approach, we demonstrate here the programmable, three-dimensional (3D) material fabrication using pattern-forming bacteria growing on top of permeable membranes as the structural scaffold. We equip the bacteria with an engineered protein that enables the assembly of gold nanoparticles into a hybrid organic-inorganic dome structure. The resulting hybrid structure functions as a pressure sensor that responds to touch. We show that the response dynamics are determined by the geometry of the structure, which is programmable by the membrane properties and the extent of circuit activation. Taking advantage of this property, we demonstrate signal sensing and processing using one or multiple bacterially assembled structures.
Item Open Access Protein and Drug Design Algorithms Using Improved Biophysical Modeling (2016) Hallen, Mark Andrew
This thesis focuses on the development of algorithms that will allow protein design calculations to incorporate more realistic modeling assumptions. Protein design algorithms search large sequence spaces for protein sequences that are biologically and medically useful. Better modeling could improve the chance of success in designs and expand the range of problems to which these algorithms are applied. I have developed algorithms to improve modeling of backbone flexibility (DEEPer) and of more extensive continuous flexibility in general (EPIC and LUTE). I’ve also developed algorithms to perform multistate designs, which account for effects like specificity, with provable guarantees of accuracy (COMETS), and to accommodate a wider range of energy functions in design (EPIC and LUTE).
Item Open Access Regulation of Global Transcription Dynamics During Cell Division and Root Development (2009) Orlando, David Anthony
The successful completion of many critical biological processes depends on the proper execution of complex spatial and temporal gene expression programs. With the advent of high-throughput microarray technology, it is now possible to measure the dynamics of these expression programs on a genome-wide level. In this thesis we present work focused on utilizing this technology, in combination with novel computational techniques, to examine the role of transcriptional regulatory mechanisms in controlling the complex gene expression programs underlying two fundamental biological processes: the cell cycle and the development and differentiation of an organ.
We generate a dataset describing the genomic expression program that occurs during the cell division cycle of Saccharomyces cerevisiae. By concurrently measuring the dynamics in both wild-type and mutant cells that do not express either S-phase or mitotic cyclins, we quantify the relative contributions of cyclin-CDK complexes and transcriptional regulatory networks to the regulation of the cell cycle expression program. We show that CDKs are not the sole regulators of periodic transcription, contrary to previously accepted models, and we hypothesize an oscillating transcriptional regulatory network that could work independently of, or in tandem with, the CDK oscillator to control the cell cycle expression program.
To understand the acquisition of cellular identity, we generate a nearly complete gene expression map of the Arabidopsis thaliana root at the resolution of individual cell types and developmental stages. An analysis of these data reveals a representative set of dominant expression patterns, which are used to begin defining the spatiotemporal transcriptional programs that control development within the root.
Additionally, we develop computational tools that improve the interpretability and power of these data. We present CLOCCS, a model for the dynamics of population synchrony loss in time-series experiments. We demonstrate the utility of CLOCCS in integrating disparate datasets and present a CLOCCS based deconvolution of the cell-cycle expression data. A deconvolution method is also developed for the Arabidopsis dataset, increasing its resolution to cell-type/section subregion specificity. Finally, a method for identifying biological processes occurring on multiple timescales is presented and applied to both datasets.
It is through the combination of these new genome-wide expression studies and computational tools that we begin to elucidate the transcriptional regulatory mechanisms controlling fundamental biological processes.
Item Open Access The Effect of Structural Microheterogeneity on the Initiation and Propagation of Ectopic Activity in Cardiac Tissue (2010) Hubbard, Marjorie Letitia
Cardiac arrhythmias triggered by both reentrant and focal sources are closely correlated with regions of tissue characterized by significant structural heterogeneity. Experimental and modeling studies of electrical activity in the heart have shown that local microscopic heterogeneities, which average out at the macroscale in healthy tissue, play a much more important role in diseased and aging cardiac tissue, which has low levels of coupling and abnormal or reduced membrane excitability. However, it is still largely unknown how various combinations of microheterogeneity in the intracellular and interstitial spaces affect wavefront propagation in these critical regimes.
This thesis uses biophysically realistic 1-D and 2-D computer models to investigate how heterogeneity in the interstitial and intracellular spaces influences both the initiation of ectopic beats and the escape of multiple ectopic beats from a poorly coupled region of tissue into surrounding well-coupled tissue. An approximate discrete monodomain model that incorporates local heterogeneity in both the interstitial and intracellular spaces was developed to represent the tissue domain.
The results showed that increasing the effective interstitial resistivity in poorly coupled fibers alters the distribution of electrical load at the microscale and causes propagation to become more like that observed in continuous fibers. In poorly coupled domains, this nearly continuous state is modulated by cell length and is characterized by decreased gap junction delay, sustained conduction velocity, increased sodium current, reduced maximum upstroke velocity, and increased safety factor. In inhomogeneous fibers with adjacent well-coupled and poorly coupled regions, locally increasing the effective interstitial resistivity in the poorly coupled region reduces the size of the focal source needed to generate an ectopic beat, reduces dispersion of repolarization, and delays the onset of conduction block that is caused by source-load mismatch at the boundary between well-coupled and poorly-coupled regions. In 2-D tissue models, local increases in effective interstitial resistivity as well as microstructural variations in cell arrangement at the boundary between poorly coupled and well-coupled regions of tissue modulate the distribution of maximum sodium current which facilitates the unidirectional escape of focal beats. Variations in the distribution of sodium current as a function of cell length and width lead to directional differences in the response to increased effective interstitial resistivity. Propagation in critical regimes such as the ectopic substrate is very sensitive to source-load interactions and local increases in maximum sodium current caused by microheterogeneity in both intracellular and interstitial structure.
Item Open Access Uncovering the Transcription Factor Network Underlying Mammalian Sex Determination (2014) Natarajan, Anirudh
Understanding transcriptional regulation in development and disease is one of the central questions in modern biology. The current working model is that Transcription Factors (TFs) combinatorially bind to specific regions of the genome and drive the expression of groups of genes in a cell-type specific fashion. In organisms with large genomes, particularly mammals, TFs bind to enhancer regions that are often several kilobases away from the genes they regulate, which makes identifying the regulators of gene expression difficult. In order to overcome these obstacles and uncover transcriptional regulatory networks, we used an approach combining expression profiling and genome-wide identification of enhancers followed by motif analysis. Further, we applied these approaches to uncover the TFs important in mammalian sex determination.
Using expression data from a panel of 19 human cell lines we identified genes showing patterns of cell-type specific up-regulation, down-regulation and constitutive expression. We then utilized matched DNase-seq data to assign DNase Hypersensitivity Sites (DHSs) to each gene based on proximity. These DHSs were scanned for matches to motifs and compiled to generate scores reflecting the presence of TF binding sites (TFBSs) in each gene's putative regulatory regions. We used a sparse logistic regression classifier to classify differentially regulated groups of genes. Comparing our approach to proximal promoter regions, we discovered that using sequence features in regions of open chromatin provided significant performance improvement. Crucially, we discovered both known and novel regulators of gene expression in different cell types. For some of these TFs, we found cell-type specific footprints indicating direct binding to their cognate motifs.
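The motif-scanning step mentioned above can be sketched as a log-odds position weight matrix (PWM) slid along a sequence. The count matrix, pseudocount, uniform background, and example sequence below are all hypothetical; the abstract does not specify which motif collection or scoring scheme was used, so this shows only the general technique.

```python
import math

# Hypothetical base counts for a 4-column motif with consensus TGAC.
EXAMPLE_COUNTS = [
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 17, "G": 1, "T": 1},
]


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({base: math.log2((column[base] + pseudocount) / total / background)
                    for base in "ACGT"})
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; entry i is the score of the window starting at i."""
    width = len(pwm)
    return [sum(pwm[j][sequence[i + j]] for j in range(width))
            for i in range(len(sequence) - width + 1)]


scores = scan("AATGACGG", log_odds_pwm(EXAMPLE_COUNTS))
best_start = scores.index(max(scores))  # start of the best-matching window
```

Per-gene features could then be built by aggregating such scores over all open-chromatin sites assigned to the gene, which is the kind of input a sparse classifier can work with.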
The mammalian gonad is an excellent system to study cell fate determination processes and the dynamic regulation orchestrated by TFs in development. At embryonic day (E) 10.5, the bipotential gonad initiates either testis development in XY embryos, or ovarian development in XX embryos. Genetic studies over the last 3 decades have revealed about 30 genes important in this process, but there are still significant gaps in our understanding. Specifically, we do not know the network of TFs and their specific combinations that cause the rapid changes in gene expression observed during gonadal fate commitment. Further, more than half the cases of human sex reversal are as yet unexplained.
To apply the methods we developed to identify regulators of gene expression to the gonad, we took two approaches. First, we carried out a careful dissection of the transcriptional dynamics during gonad differentiation in the critical window between E11.0 and E12.0. We profiled the transcriptome at 6 equally spaced time points and developed a Hidden Markov Model to reveal the cascades of transcription that drive the differentiation of the gonad. We discovered that while the ovary maintains its transcriptional state at this early stage, concurrent up- and down-regulation of hundreds of genes is orchestrated by the testis pathway. We also compared two different strains of mice with differential susceptibility to XY male-to-female sex reversal. This analysis revealed that in the C57BL/6J strain, the male pathway is delayed by ~5 hours, likely explaining the increased susceptibility to sex reversal in this strain. Finally, we validated the function of Lmo4, a transcriptional co-factor up-regulated in XY gonads at E11.6 in both strains. RNAi-mediated knockdown of Lmo4 in primary gonadal cells led to the down-regulation of male pathway genes including key regulators such as Sox9 and Fgf9.
To find the enhancers in the XY gonad, we conducted DNase-seq in E13.5 XY supporting cells. In addition, we conducted ChIP-seq for H3K27ac, a mark correlated with active enhancers. Further, we conducted motif analysis to reveal novel regulators of sex determination. Our work is an important step towards combining expression and chromatin profiling data to assemble transcriptional networks and is applicable to several systems.