Predicting Genome-wide DNA Methylation in Humans
DNA methylation is one of the most studied and most important epigenetic modifications in cells, playing a role in DNA transcription, splicing, and imprinting. Recently, advanced genome-wide DNA methylation profiling technologies have been developed, making it possible to conduct methylome-wide association studies. One problem with large-scale DNA methylation studies is that current technologies either target only a limited number of CpG sites in the genome or rely on whole-genome sequencing, which is expensive and time-consuming for most laboratories. Computational prediction of CpG site-specific methylation levels is a cost-saving and time-saving alternative.
In this work, we found striking patterns of DNA methylation across the genome. We show that correlation among CpG sites decays rapidly within several hundred base pairs, in contrast to the linkage disequilibrium (LD) structure of genotypes, which holds for up to several kb. Using genomic features including neighboring CpG site methylation and genomic distance, genomic context such as CpG island regions, and genomic regulatory elements, we built random forest classifiers to predict CpG site methylation levels. Our approach achieves 92% prediction accuracy at single CpG sites in different genome-wide methylation datasets. We achieve the highest accuracy, 98%, for prediction within CpG island regions. Moreover, our method identifies genomic features that interact with DNA methylation, which improves our understanding of the mechanisms involved in DNA methylation modification and regulation.
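To make the feature set described above more concrete, the following is a minimal sketch of how per-site features might be assembled before being passed to a random forest implementation. It is an illustration only and not the thesis's actual pipeline: the names `CpGSite`, `in_any_interval`, and `build_features`, the toy coordinates and methylation values, and the 0.5 threshold used to binarize methylation levels are all assumptions introduced here. Regulatory-element features are omitted for brevity.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class CpGSite:
    position: int  # genomic coordinate in bp on a single chromosome
    beta: float    # methylation level in [0, 1]


def in_any_interval(position, intervals):
    """True if position falls in one of the sorted, non-overlapping [start, end) intervals."""
    starts = [start for start, _ in intervals]
    i = bisect_right(starts, position) - 1
    return i >= 0 and position < intervals[i][1]


def build_features(sites, cpg_islands, threshold=0.5):
    """Build (features, label) rows for every site that has both neighbors.

    `sites` must be sorted by position. The label is a binarized
    methylation status; the 0.5 threshold is an assumption of this sketch.
    """
    rows = []
    for i in range(1, len(sites) - 1):
        up, site, down = sites[i - 1], sites[i], sites[i + 1]
        features = {
            "up_beta": up.beta,                         # upstream neighbor methylation
            "up_dist": site.position - up.position,     # distance to upstream neighbor (bp)
            "down_beta": down.beta,                     # downstream neighbor methylation
            "down_dist": down.position - site.position, # distance to downstream neighbor (bp)
            "in_cpg_island": int(in_any_interval(site.position, cpg_islands)),
        }
        rows.append((features, int(site.beta >= threshold)))
    return rows


# Toy data, made up purely for illustration.
sites = [CpGSite(100, 0.9), CpGSite(150, 0.8), CpGSite(400, 0.1), CpGSite(1200, 0.2)]
islands = [(350, 500)]
rows = build_features(sites, islands)
for features, label in rows:
    print(features, label)
```

Each row pairs a feature dictionary with a binary label, which is the shape most random forest libraries expect once the dictionaries are converted to a numeric matrix. Only sites with a neighbor on both sides are kept, a simplification that a real analysis would likely handle differently at chromosome ends.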

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Masters Theses
Works are deposited here by their authors, and represent their research and opinions, not that of Duke University. Some materials and descriptions may include offensive content.