Novel Methods to Identify Chromatin Accessibility Differences Across Primates

One aim of evolutionary biology is to identify gene regulatory regions (and the resulting levels of expression) that have changed between species. The conventional approach is to perform pairwise comparisons on data generated for each species. Software for this approach is mature and works well when only two species are of interest. The same programs can be used for three species, but the analysis becomes more cumbersome and the statistical significance (p-value) more difficult to calculate. With more than three species, pairwise comparison has significant limitations. One is the growth in the number of tests performed, which greatly reduces sensitivity after false discovery rate correction: for n species, (n-1) tests are performed on each region. Another is the lack of a principled way to identify and classify genes (or regulatory regions) containing changes in multiple species.
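As a rough illustration of the testing burden described above, the sketch below counts the tests implied by (n-1) comparisons per region and shows how a per-test significance cutoff shrinks as species are added. It uses a Bonferroni-style cutoff as a simple stand-in for false discovery rate correction, which behaves differently in detail; the region count and the function names are hypothetical and are not taken from the dissertation.

```python
def total_pairwise_tests(n_regions: int, n_species: int) -> int:
    """Total tests when each region gets (n_species - 1) pairwise comparisons."""
    if n_species < 2:
        raise ValueError("need at least two species")
    return n_regions * (n_species - 1)


def per_test_threshold(alpha: float, n_regions: int, n_species: int) -> float:
    """Bonferroni-style per-test cutoff (an assumed stand-in for FDR correction)."""
    return alpha / total_pairwise_tests(n_regions, n_species)


if __name__ == "__main__":
    n_regions = 100_000  # hypothetical number of regions
    for n_species in (2, 3, 5):
        tests = total_pairwise_tests(n_regions, n_species)
        cutoff = per_test_threshold(0.05, n_regions, n_species)
        print(f"{n_species} species: {tests} tests, per-test cutoff {cutoff:.2e}")
```

Under this simplification, going from two to five species divides the per-test cutoff by four, so a region must show stronger evidence to be called differential.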

To address these limitations, we developed a novel method that jointly models the data from all of the species using a negative binomial generalized linear model. In addition to providing a principled way of identifying and classifying sites with multiple changes, our method is more sensitive, largely because far fewer tests are performed. It models all of the data for a region in a single test, regardless of the number of species. As a result, the correction for the number of independent tests performed is (n-1) times larger for the multiple pairwise method than for the joint modelling approach.
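The idea of a single joint test per region can be sketched as a likelihood ratio test between a null model with one mean shared by all species and an alternative with one mean per species. This is a minimal illustration, not the dissertation's R script: it assumes a known dispersion, equal sequencing depth across samples, and a species-only design, under which the maximum-likelihood means are simple averages. All function names are hypothetical, and the chi-square tail function covers only even degrees of freedom (five species give four).

```python
import math


def nb_loglik(y: int, mu: float, alpha: float) -> float:
    """Log-probability of count y under a negative binomial with mean mu
    and dispersion alpha (variance = mu + alpha * mu**2)."""
    if mu == 0.0:
        return 0.0 if y == 0 else float("-inf")
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def joint_lrt(counts_by_species: dict[str, list[int]], alpha: float) -> tuple[float, int]:
    """Return (likelihood ratio statistic, degrees of freedom) for one region."""
    all_counts = [y for ys in counts_by_species.values() for y in ys]
    mu_null = sum(all_counts) / len(all_counts)  # one shared mean
    ll_null = sum(nb_loglik(y, mu_null, alpha) for y in all_counts)
    ll_alt = 0.0
    for ys in counts_by_species.values():
        mu = sum(ys) / len(ys)  # one mean per species
        ll_alt += sum(nb_loglik(y, mu, alpha) for y in ys)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    return stat, len(counts_by_species) - 1


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability, closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch supports positive even df only")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Made-up replicate counts for one region; not data from the study.
    region = {
        "human": [200, 210, 190],
        "chimpanzee": [50, 55, 45],
        "gorilla": [50, 55, 45],
        "orangutan": [50, 55, 45],
        "rhesus macaque": [50, 55, 45],
    }
    stat, df = joint_lrt(region, alpha=0.05)
    print(stat, df, chi2_sf_even_df(stat, df))
```

A real analysis would estimate dispersion from the data and account for library size, which is what established negative binomial GLM software does; the point here is only that one test per region replaces (n-1) tests.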

We applied this joint modelling approach to DNase-seq data generated from skin fibroblast cells from five primate species: human, chimpanzee, gorilla, orangutan, and rhesus macaque. We identified 89,744 DNase I hypersensitive sites (DHS sites) that were comparable across all species, of which 41% (36,666) were classified as differential in one or more species. Of the differential sites, 30% (11,095) are likely due to a single change in chromatin accessibility in one species. Changes that likely occurred on the internal human-chimpanzee branch or human-chimpanzee-gorilla branch account for 15% (5,385) of the differential sites. A further 16% (6,034) contain changes that happened on either the human-chimpanzee-gorilla-orangutan internal branch or the rhesus macaque species branch. The remaining 32% (11,698) are due to multiple changes in chromatin accessibility (e.g., independent changes on the human and orangutan species branches).
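The categories above can be read as a mapping from a two-group accessibility pattern to branches of the primate tree. The sketch below shows one plausible way to make that mapping, using parsimony-style reasoning on the standard topology ((((human, chimpanzee), gorilla), orangutan), rhesus macaque). It is an assumption made for illustration, not the classification procedure used in the dissertation; the function name and label strings are hypothetical.

```python
SPECIES = frozenset(
    {"human", "chimpanzee", "gorilla", "orangutan", "rhesus macaque"}
)
OUTGROUP = "rhesus macaque"

# Groups (outgroup excluded) that a single change on one branch can explain.
SINGLE_CHANGE_CLADES = {
    frozenset({"human", "chimpanzee"}):
        "internal branch: human-chimpanzee",
    frozenset({"human", "chimpanzee", "gorilla"}):
        "internal branch: human-chimpanzee-gorilla",
    # Without a further outgroup these two branches cannot be told apart.
    frozenset({"human", "chimpanzee", "gorilla", "orangutan"}):
        "human-chimpanzee-gorilla-orangutan branch or rhesus macaque branch",
}


def classify(group) -> str:
    """Classify a site from the set of species sharing one accessibility
    state, with all remaining species assumed to share the other state."""
    group = frozenset(group)
    unknown = group - SPECIES
    if unknown:
        raise ValueError(f"unknown species: {sorted(unknown)}")
    # Describe the split by the side that does not contain the outgroup.
    if OUTGROUP in group:
        group = SPECIES - group
    if not group:
        return "not differential"
    if len(group) == 1:
        return f"species branch: {next(iter(group))}"
    return SINGLE_CHANGE_CLADES.get(group, "multiple changes")


if __name__ == "__main__":
    print(classify({"human"}))
    print(classify({"human", "chimpanzee"}))
    print(classify({"rhesus macaque"}))
    print(classify({"human", "orangutan"}))
```

A pattern such as human and orangutan sharing a state does not correspond to any single branch, so it falls into the multiple-changes category, matching the example given in the text.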

The accuracy of this new approach was demonstrated by a high degree of concordance with an earlier study from our laboratory that analyzed data from human, chimpanzee, and rhesus macaque. Additionally, we performed a conventional pairwise analysis of the DHS sites from the five species and classified only 33% as differential, indicating decreased sensitivity compared to the joint modelling approach. Together, these results indicate that this novel joint modelling approach provides an improved method for comparative analysis of DNase-seq data.

Although we developed this method for DNase-seq data, we expect that it can be applied to other count-based data types such as ChIP-seq, ATAC-seq, and RNA-seq. We also expect that it can be applied to other experimental designs such as time-series, multi-tissue comparisons, and multiple developmental stage comparisons. The R script for performing the joint modelling analysis and instructions for modifying the script for use by other investigators are available in a GitHub repository (





Edsall, Lee Elizabeth (2019). Novel Methods to Identify Chromatin Accessibility Differences Across Primates. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.