Novel Methods to Identify Chromatin Accessibility Differences Across Primates

dc.contributor.advisor

Crawford, Gregory E

dc.contributor.advisor

Wray, Gregory A

dc.contributor.author

Edsall, Lee Elizabeth

dc.date.accessioned

2020-02-10T17:27:43Z

dc.date.available

2020-02-10T17:27:43Z

dc.date.issued

2019

dc.department

Genetics and Genomics

dc.description.abstract

One of the aims of evolutionary biology is to identify gene regulatory regions (and the resulting levels of expression) that evolved between species. The conventional method of analysis is to perform pairwise comparisons on data generated for each species. Software programs for this approach are mature and work well when there are only two species of interest. The same programs can be used when there are three species of interest, but the analysis becomes more cumbersome and the statistical significance (p-value) more difficult to calculate. Performing pairwise comparisons when there are more than three species has significant limitations. One is the increase in the number of tests performed, which greatly reduces sensitivity after false discovery rate correction: for n species, (n-1) tests are performed on each region. Another limitation is the lack of a principled way to identify and classify genes (or regulatory regions) containing changes in multiple species.
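The effect of extra tests on false discovery rate correction can be illustrated with a minimal sketch of the Benjamini-Hochberg procedure. The function name and all p-values below are hypothetical toy values, not data from the dissertation; the example only assumes that a pairwise design with five species runs four times as many tests as a single-test design and that the added tests are mostly null.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which p-values are rejected at FDR alpha."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value is at or below (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Toy p-values: one test per site (hypothetical numbers).
single_test_pvalues = [0.001, 0.008, 0.02, 0.5, 0.9]

# Same signals, but four times as many tests in total; the 15 added
# tests are assumed to be null (hypothetical numbers).
extra_null_pvalues = [0.1 + 0.05 * i for i in range(15)]
multi_test_pvalues = single_test_pvalues + extra_null_pvalues

n_single = sum(benjamini_hochberg(single_test_pvalues))
n_multi = sum(benjamini_hochberg(multi_test_pvalues))
print("rejections with 5 tests: ", n_single)
print("rejections with 20 tests:", n_multi)
```

In this toy case the same small p-values survive correction less often once the total number of tests grows, which is the loss of sensitivity described above.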

To address these limitations, we developed a novel method of jointly modelling the data from all of the species using a negative binomial generalized linear model. In addition to providing a principled way of identifying and classifying sites with multiple changes, our method is more sensitive, largely because of a substantial decrease in the number of tests performed. It models all of the data in a single test, regardless of the number of species. As a result, the correction for the number of independent tests performed is (n-1) times larger for the multiple pairwise method than for the joint modelling approach.
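As a rough illustration of how a joint model can test all species at once, the sketch below fits a negative binomial model with species as the only factor and compares it against a single-mean null model with a likelihood ratio test. It is not the R script from the repository. The function names, the fixed `size` (inverse dispersion) parameter, the toy counts, and the omission of library-size normalization are all assumptions made for illustration. With a fixed dispersion and a single categorical factor, the maximum-likelihood mean of each group is its sample mean, which keeps the example free of external libraries.

```python
import math


def nb_loglik(counts, mu, size):
    """Negative binomial log-likelihood with mean mu and inverse dispersion size."""
    if mu == 0:
        # A zero mean puts all probability mass on zero counts.
        return 0.0 if all(y == 0 for y in counts) else float("-inf")
    ll = 0.0
    for y in counts:
        ll += (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
               + size * math.log(size / (size + mu))
               + y * math.log(mu / (size + mu)))
    return ll


def chi2_sf_even_df(x, df):
    """Chi-square upper tail probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only supports positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def joint_lrt(counts_by_species, size=10.0):
    """One likelihood ratio test per site: species-specific means vs. one shared mean."""
    all_counts = [y for ys in counts_by_species.values() for y in ys]
    grand_mean = sum(all_counts) / len(all_counts)
    ll_null = nb_loglik(all_counts, grand_mean, size)
    ll_alt = sum(nb_loglik(ys, sum(ys) / len(ys), size)
                 for ys in counts_by_species.values())
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    df = len(counts_by_species) - 1  # 4 for five species
    return stat, chi2_sf_even_df(stat, df)


# Hypothetical read counts for one site, three replicates per species.
conserved_site = {s: [50, 52, 48] for s in
                  ["human", "chimpanzee", "gorilla", "orangutan", "macaque"]}
human_gain_site = dict(conserved_site, human=[200, 210, 190])

stat_cons, p_cons = joint_lrt(conserved_site)
stat_gain, p_gain = joint_lrt(human_gain_site)
print("conserved site p-value:", p_cons)
print("human-gain site p-value:", p_gain)
```

Each site yields one p-value regardless of the number of species, whereas a pairwise design would produce several p-values per site. Classifying which branch changed would need further steps that this sketch does not attempt.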

We applied this joint modelling approach to DNase-seq data generated from skin fibroblast cells from five primate species: human, chimpanzee, gorilla, orangutan, and rhesus macaque. We identified 89,744 DNase I hypersensitive sites (DHS sites) that were comparable across all species, of which 41% (36,666) were classified as differential in one or more species. Of the differential sites, 30% (11,095) are likely due to a single change in chromatin accessibility in one species. Changes that likely occurred on the internal human-chimpanzee branch or human-chimpanzee-gorilla branch account for 15% (5,385) of the differential sites. A further 16% (6,034) of the differential sites contain changes that happened on either the human-chimpanzee-gorilla-orangutan internal branch or the rhesus macaque species branch. Finally, 32% (11,698) of the differential sites are due to multiple changes in chromatin accessibility (e.g., independent changes on the human and orangutan species branches).
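The reported percentages can be recomputed from the site counts given above. The short check below uses only those counts; the category labels are shorthand introduced here for readability, not terms from the dissertation.

```python
total_sites = 89_744
differential_sites = 36_666

# Counts per category of differential site, as reported in the abstract.
categories = {
    "single-species change": 11_095,
    "human-chimpanzee or human-chimpanzee-gorilla branch": 5_385,
    "great-ape internal branch or rhesus macaque branch": 6_034,
    "multiple changes": 11_698,
}


def percent(part, whole):
    """Percentage of whole, rounded to the nearest integer."""
    return round(100 * part / whole)


pct_differential = percent(differential_sites, total_sites)
pct_by_category = {name: percent(count, differential_sites)
                   for name, count in categories.items()}
# Differential sites not covered by the four categories listed above.
unlisted = differential_sites - sum(categories.values())

print("differential:", pct_differential, "%")
for name, pct in pct_by_category.items():
    print(name + ":", pct, "%")
print("differential sites not itemized above:", unlisted)
```

The four itemized categories do not add up to the full set of differential sites; the abstract does not say how the remaining sites are classified.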

The accuracy of this new approach was demonstrated by a high degree of concordance with an earlier study from our laboratory that analyzed data from human, chimpanzee, and rhesus macaque. Additionally, we performed a conventional pairwise analysis of the DHS sites from the five species and classified only 33% as differential, indicating decreased sensitivity compared to the joint modelling approach. Together, these results indicate that this novel joint modelling approach provides an improved method for comparative analysis of DNase-seq data.

Although we developed this method for DNase-seq data, we expect that it can be applied to other count-based data types such as ChIP-seq, ATAC-seq, and RNA-seq. We also expect that it can be applied to other experimental designs such as time-series, multi-tissue comparisons, and multiple developmental stage comparisons. The R script for performing the joint modelling analysis and instructions for modifying the script for use by other investigators are available in a GitHub repository (http://github.com/ledsall/2019primate).

dc.identifier.uri

https://hdl.handle.net/10161/20097

dc.subject

Bioinformatics

dc.subject

Evolution & development

dc.subject

Genetics

dc.subject

chromatin accessibility

dc.subject

cis-regulatory evolution

dc.subject

comparative functional genomics

dc.subject

generalized linear model

dc.subject

Transcriptional regulation

dc.title

Novel Methods to Identify Chromatin Accessibility Differences Across Primates

dc.type

Dissertation

Files

Original bundle

Name:
Edsall_duke_0066D_15396.pdf
Size:
16.16 MB
Format:
Adobe Portable Document Format
