Predicting Genome-wide DNA Methylation in Humans
Date
2014
Abstract
DNA methylation is one of the most studied and important epigenetic modifications in cells, playing a role in DNA transcription, splicing, and imprinting. Recently developed genome-wide DNA methylation profiling technologies have made it possible to conduct methylome-wide association studies. A problem for large-scale DNA methylation studies is that current technologies either target only a limited number of CpG sites in the genome or, in the case of whole-genome sequencing, are too expensive and time-consuming for most laboratories. Computational prediction of CpG site-specific methylation levels is a cost-saving and time-saving alternative.
In this work, we found striking patterns of DNA methylation across the genome. We show that correlation among CpG sites decays rapidly within several hundred base pairs, in contrast to the linkage disequilibrium (LD) structure of genotypes, which holds for up to several kb. Using genomic features including neighboring CpG site methylation and genomic distance, genomic context such as CpG island regions, and genomic regulatory elements, we built random forest classifiers to predict CpG site methylation levels. Our approach achieves 92% prediction accuracy at single CpG sites in different genome-wide methylation datasets. The highest accuracy, 98%, is achieved for prediction within CpG island regions. In addition, our method identifies genomic features that interact with DNA methylation, which improves our understanding of the mechanisms involved in DNA methylation modification and regulation.
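As an illustration only, the kind of per-site feature row described above (neighboring CpG methylation, genomic distance, and CpG island context) might be assembled as in the following minimal sketch. It is not the thesis code: the names `CpGSite`, `build_features`, and `in_any_interval`, the toy coordinates, and the 0.5 threshold used to turn a methylation level into a binary label are all assumptions made for this example, and regulatory-element features are omitted.

```python
from bisect import bisect_right
from typing import NamedTuple


class CpGSite(NamedTuple):
    position: int  # genomic coordinate on a single chromosome, in bp
    beta: float    # methylation level in [0, 1]


def in_any_interval(position, starts, ends):
    """True if position falls in one of the sorted, non-overlapping [start, end) intervals."""
    i = bisect_right(starts, position) - 1
    return i >= 0 and position < ends[i]


def build_features(sites, islands, threshold=0.5):
    """Return one (features, label) pair per interior CpG site.

    sites must be sorted by position; islands is a sorted list of (start, end).
    threshold is a hypothetical cutoff for calling a site methylated.
    """
    starts = [start for start, _ in islands]
    ends = [end for _, end in islands]
    rows = []
    # Slide a window of three consecutive sites: upstream, target, downstream.
    for prev, site, nxt in zip(sites, sites[1:], sites[2:]):
        features = {
            "upstream_beta": prev.beta,
            "upstream_dist": site.position - prev.position,
            "downstream_beta": nxt.beta,
            "downstream_dist": nxt.position - site.position,
            "in_cpg_island": int(in_any_interval(site.position, starts, ends)),
        }
        label = int(site.beta >= threshold)
        rows.append((features, label))
    return rows


# Toy data (invented): five CpG sites and one CpG island spanning [50, 500).
sites = [
    CpGSite(100, 0.05),
    CpGSite(150, 0.10),
    CpGSite(400, 0.20),
    CpGSite(5000, 0.90),
    CpGSite(5200, 0.85),
]
islands = [(50, 500)]
rows = build_features(sites, islands)
for features, label in rows:
    print(label, features)
```

Rows of this shape could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`, after converting the dictionaries to a numeric matrix. The first and last sites are skipped here because they lack a neighbor on one side; a real pipeline would need an explicit rule for such edge cases.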
Citation
Zhang, Weiwei (2014). Predicting Genome-wide DNA Methylation in Humans. Master's thesis, Duke University. Retrieved from https://hdl.handle.net/10161/9123.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.