Browsing by Subject "Statistical inference"
Item Open Access Computational Inference of Genome-Wide Protein-DNA Interactions Using High-Throughput Genomic Data (2015) Zhong, Jianling
Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.
We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
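The scoring idea in this paragraph can be sketched as a comparison between two hypotheses for the cut counts in a nucleosome-sized window. The sketch below is only an illustration under stated assumptions, not the dissertation's method: the Poisson count model, the function names, the 147 bp window, the 10 bp period, and the shape of the expected-rate profile (fewest cuts near the dyad) are all hypothetical choices. Because both hypotheses use fixed rates, the Bayes factor reduces to a likelihood ratio here; a fuller treatment would integrate over priors on the rates.

```python
import math

def poisson_logpmf(k, rate):
    # log P(K = k) for a Poisson distribution with the given rate
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def log_bayes_factor(counts, nuc_rates, bg_rate):
    """Log ratio of P(counts | nucleosome) to P(counts | background).

    All positions in the window are scored jointly. Rates are fixed
    (an assumption), so this is a log likelihood ratio.
    """
    if len(counts) != len(nuc_rates):
        raise ValueError("counts and nuc_rates must have the same length")
    log_nuc = sum(poisson_logpmf(k, r) for k, r in zip(counts, nuc_rates))
    log_bg = sum(poisson_logpmf(k, bg_rate) for k in counts)
    return log_nuc - log_bg

def nucleosome_profile(width=147, base=1.0, period=10.0):
    """Hypothetical expected cut rates across one nucleosome-sized window:
    a quadratic envelope multiplied by a periodic oscillation."""
    center = (width - 1) / 2
    rates = []
    for i in range(width):
        x = (i - center) / center        # -1 at the left edge, +1 at the right
        envelope = 0.2 + 0.8 * x * x     # quadratic term, lowest at the center
        wave = 1.0 + 0.5 * math.cos(2 * math.pi * (i - center) / period)
        rates.append(base * envelope * wave)
    return rates
```

Sliding such a window along the genome and keeping local maxima of the score would give candidate nucleosome positions; positive scores favor the nucleosome hypothesis and negative scores favor the background.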
Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.
This dissertation provides a foundation for future research by taking a step toward the genome-wide inference of protein-DNA interaction landscape through data integration.
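To make the claim about complementary data types concrete, here is a minimal sketch that uses a discrete hidden Markov model as a stand-in for the state-space model; the abstract does not give the model's actual form, so the two-state setup, the conditional independence of the two data types given the hidden state, and every name below are assumptions. The sketch covers only posterior inference by forward-backward with fixed parameters, not the expectation-maximization step.

```python
def posterior_states(obs_a, obs_b, trans, init, emit_a, emit_b):
    """Posterior state probabilities at each position, given two data tracks.

    obs_a and obs_b hold discrete symbols; None marks a missing observation,
    which lets the same function run with one data type or both.
    """
    n_states = len(init)
    length = len(obs_a)

    def emit(state, t):
        # Tracks are assumed independent given the hidden state.
        pa = emit_a[state][obs_a[t]] if obs_a[t] is not None else 1.0
        pb = emit_b[state][obs_b[t]] if obs_b[t] is not None else 1.0
        return pa * pb

    def normalize(values):
        total = sum(values)
        return [v / total for v in values]

    # Forward pass, rescaled at each step to avoid underflow.
    fwd = [normalize([init[s] * emit(s, 0) for s in range(n_states)])]
    for t in range(1, length):
        prev = fwd[-1]
        fwd.append(normalize([
            emit(s, t) * sum(prev[r] * trans[r][s] for r in range(n_states))
            for s in range(n_states)
        ]))

    # Backward pass, also rescaled.
    bwd = [[1.0] * n_states for _ in range(length)]
    for t in range(length - 2, -1, -1):
        bwd[t] = normalize([
            sum(trans[s][r] * emit(r, t + 1) * bwd[t + 1][r]
                for r in range(n_states))
            for s in range(n_states)
        ])

    return [normalize([fwd[t][s] * bwd[t][s] for s in range(n_states)])
            for t in range(length)]
```

Calling the function once with `obs_b` set to all `None` and once with both tracks filled in shows how a second, agreeing data type sharpens the posterior over the hidden binding state.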
Item Open Access Drivers of Dengue Within-Host Dynamics and Virulence Evolution (2016) BenShachar, Rotem
Dengue is an important vector-borne virus that infects on the order of 400 million individuals per year. Infection with one of the virus's four serotypes (denoted DENV-1 to 4) may be silent, result in symptomatic dengue 'breakbone' fever, or develop into the more severe dengue hemorrhagic fever/dengue shock syndrome (DHF/DSS). Extensive research has therefore focused on identifying factors that influence dengue infection outcomes. It has been well-documented through epidemiological studies that DHF is most likely to result from a secondary heterologous infection, and that individuals experiencing a DENV-2 or DENV-3 infection typically are more likely to present with more severe dengue disease than those individuals experiencing a DENV-1 or DENV-4 infection. However, a mechanistic understanding of how these risk factors affect disease outcomes, and further, how the virus's ability to evolve these mechanisms will affect disease severity patterns over time, is lacking. In the second chapter of my dissertation, I formulate mechanistic mathematical models of primary and secondary dengue infections that describe how the dengue virus interacts with the immune response and the results of this interaction on the risk of developing severe dengue disease. I show that only the innate immune response is needed to reproduce characteristic features of a primary infection whereas the adaptive immune response is needed to reproduce characteristic features of a secondary dengue infection. I then add to these models a quantitative measure of disease severity that assumes immunopathology, and analyze the effectiveness of virological indicators of disease severity. In the third chapter of my dissertation, I then statistically fit these mathematical models to viral load data of dengue patients to understand the mechanisms that drive variation in viral load.
I specifically consider the roles that immune status, clinical disease manifestation, and serotype may play in explaining viral load variation observed across the patients. With this analysis, I show that there is statistical support for the theory of antibody dependent enhancement in the development of severe disease in secondary dengue infections and that there is statistical support for serotype-specific differences in viral infectivity rates, with infectivity rates of DENV-2 and DENV-3 exceeding those of DENV-1. In the fourth chapter of my dissertation, I integrate these within-host models with a vector-borne epidemiological model to understand the potential for virulence evolution in dengue. Critically, I show that dengue is expected to evolve towards intermediate virulence, and that the optimal virulence of the virus depends strongly on the number of serotypes that co-circulate. Together, these dissertation chapters show that dengue viral load dynamics provide insight into the within-host mechanisms driving differences in dengue disease patterns and that these mechanisms have important implications for dengue virulence evolution.
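As a rough illustration of how an infectivity rate shapes viral load dynamics, the sketch below integrates a generic target-cell-limited model, a common textbook form for within-host viral dynamics. It is not the dissertation's model, which also includes immune responses; the equations, the forward-Euler integrator, the function name, and all parameter values are assumptions chosen only to make the example run.

```python
def simulate_viral_load(beta, delta=1.0, p=100.0, c=5.0,
                        T0=1e6, V0=1.0, days=10.0, dt=0.001):
    """Forward-Euler integration of a target-cell-limited model:

        dT/dt = -beta * T * V            (uninfected target cells)
        dI/dt =  beta * T * V - delta*I  (infected cells)
        dV/dt =  p * I - c * V           (free virus)

    beta is the infectivity rate; all default values are hypothetical.
    Returns (times, viral_loads) as two lists of equal length.
    """
    T, I, V = T0, 0.0, V0
    times, loads = [0.0], [V]
    steps = int(round(days / dt))
    for n in range(1, steps + 1):
        new_infections = beta * T * V
        dT = -new_infections
        dI = new_infections - delta * I
        dV = p * I - c * V
        T += dt * dT
        I += dt * dI
        V += dt * dV
        times.append(n * dt)
        loads.append(V)
    return times, loads
```

Running the simulation with two values of `beta` and comparing the time at which the viral load peaks gives a feel for how a serotype-specific difference in infectivity could show up in patient viral load curves.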
Item Open Access Phylodynamic Methods for Infectious Disease Epidemiology (2014) Rasmussen, David Alan
In this dissertation, I present a general statistical framework for phylodynamic inference that can be used to estimate epidemiological parameters and reconstruct disease dynamics from pathogen genealogies. This framework can be used to fit a broad class of epidemiological models, including nonlinear stochastic models, to genealogies by relating the population dynamics of a pathogen to its genealogy using coalescent theory. By combining Markov chain Monte Carlo and particle filtering methods, efficient Bayesian inference of all parameters and unobserved latent variables is possible even when analytical likelihood expressions are not available under the epidemiological model. Through extensive simulations, I show that this method can be used to reliably estimate epidemiological parameters of interest as well as reconstruct past disease dynamics from genealogies, or jointly from genealogies and other common sources of epidemiological data like time series. I then extend this basic framework to include different types of host population structure, including models with spatial structure, multiple hosts or vectors, and different stages of infection. The latter is demonstrated by using a multistage model of HIV infection to estimate stage-specific transmission rates and incidence from HIV sequence data collected in Detroit, Michigan. Finally, to demonstrate how the approach can be used more generally, I consider the case of dengue virus in southern Vietnam. I show how earlier phylodynamic inference methods fail to reliably reconstruct the dynamics of dengue observed in hospitalization data, but by deriving coalescent models that take into consideration ecological complexities like seasonality, vector dynamics and spatial structure, accurate dynamics can be reconstructed from genealogies.
In sum, by extending phylodynamics to include more ecologically realistic and mechanistic models, this framework can provide more accurate estimates and give deeper insight into the processes driving infectious disease dynamics.
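The link between population dynamics and a genealogy can be shown in its simplest form with Kingman's coalescent at constant population size, where the waiting time while k lineages remain is exponential with rate k(k-1)/2 divided by the population size. The sketch below is far simpler than the framework described in this abstract (no epidemic model, no particle filtering, all samples taken at one time), and the function names and time scaling are assumptions made for illustration.

```python
import math

def coalescent_loglik(intervals, pop_size):
    """Log-likelihood of coalescent waiting times under a constant
    effective population size.

    intervals[i] is the time spent with n - i lineages, where
    n = len(intervals) + 1 is the number of sampled sequences.
    Time is measured in the same units as pop_size.
    """
    n = len(intervals) + 1
    loglik = 0.0
    for i, t in enumerate(intervals):
        k = n - i                           # lineages during this interval
        rate = k * (k - 1) / 2 / pop_size   # pairwise coalescence rate
        loglik += math.log(rate) - rate * t
    return loglik

def mle_pop_size(intervals):
    """Closed-form maximum-likelihood population size for the model above."""
    n = len(intervals) + 1
    weighted = sum((n - i) * (n - i - 1) / 2 * t
                   for i, t in enumerate(intervals))
    return weighted / len(intervals)
```

Replacing the constant `pop_size` with a trajectory produced by an epidemiological model is the step that turns this likelihood into a phylodynamic one, at which point a closed form is generally no longer available.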