Duke University Libraries

    Population Genetic Annotation of the Human Genome: Identifying Pathogenic Mutations

    Date
    2016
    Author
    Gussow, Ayal Baruch
    Advisor
    Goldstein, David B
    Abstract

    In the past decade, there has been a series of breakthroughs in human genetics. The advent of next-generation sequencing (NGS) has made it possible, for the first time, to sequence an entire human genome inexpensively and efficiently. The affordability and ease of NGS have led to an explosion of data. Now, the largest hurdle in human genetics has shifted from technology-based limitations of sequencing to developing a framework for interpreting the large amount of data that has been generated. Specifically, in medical genetics, the key challenge is recognizing which of a given patient's many mutations may be contributing to disease.

    The most successful methodologies for this problem rely on conservation. However, conservation cannot capture human-specific intolerance to variation. Other methodologies rely on biochemical annotations, which can indicate a genomic region's functional role, but do not directly assess its intolerance to variation in the context of disease. Therefore, despite these available methodologies, detecting causal variation remains extremely challenging.

    In my thesis, I describe three methodologies, based on population genetics and standing human variation, which can help identify the regions of the genome that are most likely to cause disease when mutated. The first, subRVIS, focuses on sub-regions within genes. The second, ncRVIS, focuses on the regulatory regions of genes. The third, Orion, tackles the daunting problem of interpreting and prioritizing variants across the entire genome.

    In Chapter 1, we review some of the history that has brought us to this point and some of the methodologies currently in use for detecting disease-causing variants.

    In Chapter 2, we describe subRVIS, a methodology that divides the gene into sub-regions based on sequence homology to known protein domains, and then ranks those sub-regions based on their tolerance to functional variation. We show that this ranking is associated with the sub-region's likelihood of carrying a previously known pathogenic mutation. Further, we demonstrate that the biological division into domains adds significant information in comparison to dividing the gene into random regions matched in size. This methodology is useful in localizing where pathogenic mutations are most likely to fall within genes.
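
    The idea of ranking sub-regions by their tolerance to functional variation can be illustrated with a minimal sketch. The abstract does not specify how subRVIS computes its score, so the scoring rule below (observed functional variants minus the number expected from a pooled rate) is an assumption made purely for illustration, and the function name `rank_subregions` and the example region names are hypothetical.

```python
from typing import Dict, List, Tuple


def rank_subregions(counts: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float]]:
    """Rank sub-regions from least to most tolerant of functional variation.

    counts maps a sub-region name to (total_variants, functional_variants).
    ASSUMPTION: the score is observed functional variants minus the number
    expected if every sub-region shared the pooled functional rate. This is
    an illustrative stand-in, not the statistic used by subRVIS.
    """
    total_all = sum(total for total, _ in counts.values())
    functional_all = sum(functional for _, functional in counts.values())
    if total_all == 0:
        raise ValueError("at least one variant is required to estimate a rate")

    pooled_rate = functional_all / total_all

    scored = [
        (name, functional - pooled_rate * total)
        for name, (total, functional) in counts.items()
    ]
    # Lower (more negative) scores mean fewer functional variants than
    # expected, i.e. a less tolerant sub-region, so those come first.
    return sorted(scored, key=lambda item: item[1])


# Hypothetical counts for three sub-regions of one gene.
example_counts = {
    "domain_A": (20, 2),
    "domain_B": (20, 10),
    "linker": (10, 8),
}
ranking = rank_subregions(example_counts)
```

    In this toy input, `domain_A` carries far fewer functional variants than the pooled rate predicts, so it is ranked as the least tolerant sub-region.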

    Chapter 3 describes a methodology to rank genes based on the likelihood that mutations falling in their regulatory regions are pathogenic. We demonstrate that this ranking is associated with whether or not a gene is sensitive to changes in its dosage. This methodology is useful in assessing the pathogenicity of mutations occurring in known regulatory regions that have been associated with genes.

    In Chapter 4 we tackle one of the most intimidating and challenging problems in the field of medical genetics: detecting intolerance to variation across the entire human genome. Using a sliding window, we generate a score per base to highlight the regions of the genome that are intolerant to variation, with higher scores corresponding to more intolerant sequence. We term this approach Orion. We demonstrate that exons and DNase hypersensitive sites are enriched for higher Orion scores. This methodology will transform the way whole-genome sequence data are interpreted, by giving researchers the ability to assess the pathogenicity of variants in regions of the genome that are not yet fully understood.
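
    The sliding-window idea described above can be sketched as follows. The abstract does not state the window size or the statistic Orion computes in each window, so the score used here (expected minus observed variant count in a window centered on each base) and the names `sliding_window_scores`, `variant_flags`, and `expected_rate` are assumptions for illustration only. The one property taken from the abstract is the direction of the score: higher values correspond to more intolerant sequence.

```python
from typing import List, Sequence


def sliding_window_scores(
    variant_flags: Sequence[int], window: int, expected_rate: float
) -> List[float]:
    """Return one score per base using a window centered on that base.

    variant_flags[i] is 1 if a variant is observed at position i, else 0.
    ASSUMPTION: the score is (expected - observed) variant count in the
    window, so depleted regions score higher. This is not Orion's statistic.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")

    n = len(variant_flags)
    half = window // 2

    # Prefix sums let each window's variant count be read in O(1).
    prefix = [0]
    for flag in variant_flags:
        prefix.append(prefix[-1] + flag)

    scores = []
    for i in range(n):
        lo = max(0, i - half)          # windows are truncated at the ends
        hi = min(n, i + half + 1)
        observed = prefix[hi] - prefix[lo]
        expected = expected_rate * (hi - lo)
        scores.append(expected - observed)
    return scores


# Toy sequence: variants cluster at both ends, the middle is variant-free.
flags = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
scores = sliding_window_scores(flags, window=3, expected_rate=0.5)
```

    In this toy sequence the variant-free middle positions receive the highest scores, while positions inside the variant clusters score lowest.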

    We have developed methodologies to tackle the key problem of detecting disease-causing variation in patients' sequence data. In an era overwhelmed by NGS data, these methodologies bring us closer to understanding the genetics of disease.

    Type
    Dissertation
    Department
    Computational Biology and Bioinformatics
    Subject
    Genetics
    Permalink
    https://hdl.handle.net/10161/13356
    Citation
    Gussow, Ayal Baruch (2016). Population Genetic Annotation of the Human Genome: Identifying Pathogenic Mutations. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/13356.
    Collections
    • Duke Dissertations
    Creative Commons License
    This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.

    Rights for Collection: Duke Dissertations

     

     
