Deciphering genome-wide chromatin occupancy, dynamics, and their connections to gene regulation
Genomic DNA is bound by a myriad of proteins to form chromatin inside the cell nucleus. These proteins can bind the genome in different combinations, leading to a combinatorial explosion in the number of possible chromatin configurations. Differences in chromatin configuration over the same genome sequence give rise to distinct cell types. Likewise, cells of the same type undergo changes in chromatin configuration under different environmental conditions. Key changes in the chromatin occupancy profile may dictate changes in gene regulation, or vice versa. It is therefore important to decipher the chromatin occupancy profiles of the genome and to understand how these configurations relate to the transcription of genes.
In this dissertation, we analyze chromatin using chromatin accessibility data sets, particularly MNase-seq and ATAC-seq, which distinguish the protein-bound and unbound regions of the genome. We first describe a state-space model that uses chromatin accessibility data to jointly infer the occupancy profiles of hundreds of proteins that bind the genome. We apply the model to the yeast genome to study the occupancy profiles of transcription factors and nucleosomes. We then extend the model to study chromatin dynamics in yeast cells subjected to cadmium stress. In doing so, we identify genomic regions where the occupancy profiles of transcription factors and nucleosomes change. Comparing these regions with available gene expression data, we find that key changes in chromatin configuration occur around the bodies of genes that are differentially regulated during cadmium stress. Our analyses highlight how specific changes in the occupancy profiles relate to gene expression.
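As a rough illustration of the state-space idea, the sketch below decodes a toy two-state hidden Markov model with the Viterbi algorithm. It is not the dissertation's model, which jointly handles hundreds of proteins. The state names (`bound`, `free`), the discretized coverage symbols (`high`, `low`), and every probability are hypothetical values chosen only for illustration.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return the most probable hidden-state path for a sequence of observations."""
    # score[s] = best log-probability of any path ending in state s
    score = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict per later position: state -> best predecessor state
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            score[s] = prev[best] + log_trans[best][s] + log_emit[s][o]
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

def logs(table):
    """Convert a nested dict of probabilities to log-probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}

# Hypothetical parameters: protein-bound DNA is assumed to be protected from
# digestion and therefore to yield high fragment coverage most of the time.
states = ["bound", "free"]
log_start = {s: math.log(0.5) for s in states}
log_trans = logs({"bound": {"bound": 0.9, "free": 0.1},
                  "free": {"bound": 0.1, "free": 0.9}})
log_emit = logs({"bound": {"high": 0.9, "low": 0.1},
                 "free": {"high": 0.2, "low": 0.8}})

# Made-up coverage calls for ten consecutive genomic bins.
coverage = ["low", "low", "high", "high", "high", "low", "high", "high", "low", "low"]
path = viterbi(coverage, states, log_start, log_trans, log_emit)
print(list(zip(coverage, path)))
```

Under these assumed parameters, the isolated `low` bin inside the run of `high` bins is still decoded as `bound`, because leaving and re-entering the state costs more than one unlikely emission. Real accessibility data would call for continuous emission distributions and parameters learned from data.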
Building upon the interrelatedness of chromatin and transcription, we describe a regression-based approach that predicts transcription from chromatin accessibility data sets. We find that chromatin accessibility in specific parts of the genome is highly correlated with gene expression. These genomic regions are potential regulatory regions that can lie far from the gene body and interact with genes through looping of the DNA. Our model identifies these regulatory regions in a gene-specific manner, which helps us further understand the connections between chromatin and transcription.
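The gene-specific search for regulatory regions can be sketched, under strong simplifying assumptions, by ranking candidate regions on how well their accessibility tracks one gene's expression across samples. The sketch below substitutes a plain per-region Pearson correlation for the dissertation's regression model. The region names and all numbers are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def rank_regions(accessibility, expression):
    """Sort regions by absolute correlation with expression, strongest first."""
    scores = {name: pearson(values, expression)
              for name, values in accessibility.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Made-up expression of one gene across five samples (e.g. time points).
expression = [1.0, 2.0, 3.0, 4.0, 5.0]

# Made-up accessibility of three candidate regions in the same five samples.
accessibility = {
    "promoter": [0.9, 2.1, 2.9, 4.2, 5.1],       # rises with expression
    "distal_window": [5.0, 4.1, 2.9, 2.2, 0.8],  # falls as expression rises
    "unrelated": [3.0, 1.0, 3.2, 0.9, 3.1],      # no clear trend
}

ranked = rank_regions(accessibility, expression)
for name, r in ranked:
    print(f"{name}: r = {r:.3f}")
```

Ranking on the absolute value keeps regions whose accessibility moves opposite to expression, since those may also be regulatory. A regression over many regions at once would additionally need regularization and held-out evaluation, which this sketch omits.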
Hidden Markov model
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations
Works are deposited here by their authors, and represent their research and opinions, not that of Duke University. Some materials and descriptions may include offensive content.