Integrative Modeling of Genetic and Transcriptomic Data for the Identification of Allele-Specific Expression
dc.contributor.advisor | Allen, Andrew S | |
dc.contributor.advisor | Majoros, William H | |
dc.contributor.author | Zou, Xue | |
dc.date.accessioned | 2024-06-06T13:44:57Z | |
dc.date.issued | 2024 | |
dc.department | Computational Biology and Bioinformatics | |
dc.description.abstract | The challenge of diagnosing rare genetic diseases persists despite advances in high-throughput sequencing. The limitation stems from an exome-centric diagnostic focus that often overlooks the influence of non-coding variants on gene expression. This research addresses this shortfall by leveraging allele-specific expression (ASE) analysis to detect cis-regulatory disruptions in gene expression, which could be pivotal for the diagnosis of non-exomic rare diseases. A novel computational framework, Bayesian Estimation of Allele Specific Transcript Integration across Exons (BEASTIE), was developed to refine ASE estimation. BEASTIE incorporates multiple heterozygous loci within a gene and corrects for the phasing errors inherent in ASE detection. Comparative analyses show that BEASTIE is more accurate than traditional methods, particularly in scenarios characterized by elevated heterozygosity and phasing errors. An advanced iteration, iBEASTIE, further incorporates error rates informed by genetic and genomic features, optimizing ASE estimates. In a collaborative effort, quickBEAST, a C++ implementation of the BEASTIE model, was engineered; it employs a subgrid algorithm to expedite the computation of ASE effect sizes. This tool proves essential for genome-wide analyses, as evidenced by its application to 1000 Genomes Project data, which aimed to map the ASE landscape and uncover novel imprinted genes. The practicality of these methods was tested in a case study of Glycogen Storage Disease (GSD) involving six probands. The integrated diagnostic pipeline, encompassing ASE, isoform, and differential expression analyses, identified a regulatory variant implicated in the disease phenotype. This finding was substantiated through CRISPR assays, verifying the computational predictions. | |
dc.identifier.uri | ||
dc.rights.uri | ||
dc.subject | Bioinformatics | |
dc.subject | Genetics | |
dc.subject | Biostatistics | |
dc.title | Integrative Modeling of Genetic and Transcriptomic Data for the Identification of Allele-Specific Expression | |
dc.type | Dissertation | |
duke.embargo.months | 12 | |
duke.embargo.release | 2025-06-06T13:44:57Z |