Modeling Nuclease Digestion Data to Predict the Dynamics of Genome-wide Transcription Factor Occupancy
Identifying and deciphering the complex regulatory information embedded in the genome is critical to our understanding of biology and the etiology of complex diseases. The regulation of gene expression is governed largely by the occupancy of transcription factors (TFs) at their cognate binding sites. Characterizing TF binding is particularly challenging because TF occupancy is not only complex but also dynamic. Current genome-wide surveys of TF binding sites typically use chromatin immunoprecipitation (ChIP), which measures only one TF at a time and therefore scales poorly when profiling the dynamics of TF occupancy across cell types or conditions. This dissertation develops novel computational frameworks to model sequencing data from DNase and/or MNase nuclease digestion assays, allowing multiple TFs to be surveyed in a single experiment in both human and yeast. We predicted occupancy landscapes and constructed a cell-type specificity map for many TFs across human cell types, revealed novel relationships between TF occupancy and TF expression, and monitored the occupancy dynamics of various TFs in response to androgen and estrogen hormone stimulation. The TF/cell type occupancy matrix generated from our model expands the total output of the ENCODE ChIP-seq efforts nearly 200-fold. These computational frameworks serve as an innovative and cost-effective strategy that enables efficient, high-throughput profiling of TF occupancy landscapes across different cell types or dynamic conditions.
Bayesian hierarchical model
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations