Browsing by Author "Allen, Andrew S"
Item Open Access A genome-wide association study of variants associated with acquisition of Staphylococcus aureus bacteremia in a healthcare setting. (BMC Infect Dis, 2014-02-13) Nelson, Charlotte L; Pelak, Kimberly; Podgoreanu, Mihai V; Ahn, Sun Hee; Scott, William K; Allen, Andrew S; Cowell, Lindsay G; Rude, Thomas H; Zhang, Yurong; Tong, Amy; Ruffin, Felicia; Sharma-Kuinkel, Batu K; Fowler, Vance G
BACKGROUND: Humans vary in their susceptibility to acquiring Staphylococcus aureus infection, and research suggests that there is a genetic basis for this variability. Several recent genome-wide association studies (GWAS) have identified variants that may affect susceptibility to infectious diseases, demonstrating the potential value of GWAS in this arena. METHODS: We conducted a GWAS to identify common variants associated with acquisition of S. aureus bacteremia (SAB) resulting from healthcare contact. We performed a logistic regression analysis to compare patients with healthcare contact who developed SAB (361 cases) to patients with healthcare contact in the same hospital who did not develop SAB (699 controls), testing 542,410 SNPs and adjusting for age (by decade), sex, and 6 significant principal components from our EIGENSTRAT analysis. Additionally, we evaluated the joint effect of the host and pathogen genomes in association with severity of SAB infection via logistic regression, including an interaction of host SNP with bacterial genotype, and adjusting for age (by decade), sex, the 6 significant principal components, and dialysis status. Bonferroni corrections were applied in both analyses to control for multiple comparisons. RESULTS: Ours is the first study that has attempted to evaluate the entire human genome for variants potentially involved in the acquisition or severity of SAB.
Although this study identified no common variant of large effect size to have genome-wide significance for association with either the risk of acquiring SAB or severity of SAB, the variant (rs2043436) most significantly associated with severity of infection is located in a biologically plausible candidate gene (CDON, a member of the immunoglobulin family) and may warrant further study. CONCLUSIONS: The genetic architecture underlying SAB is likely to be complex. Future investigations using larger samples, narrowed phenotypes, and advances in both genotyping and analytical methodologies will be important tools for identifying causative variants for this common and serious cause of healthcare-associated infection.
Item Open Access Efficient analysis of complex, multimodal genomic data (2016) Acharya, Chaitanya Ramanuj
Our primary goal is to better understand complex diseases using statistically disciplined approaches. As multi-modal data streams out of consortium projects such as the Genotype-Tissue Expression (GTEx) project, which aims to collect samples from various tissue sites in order to understand tissue-specific gene regulation, new approaches are needed that can efficiently model groups of data with minimal loss of power. For example, the GTEx project delivers RNA-Seq, microarray gene expression, and genotype data (SNP arrays) from a vast number of tissues in a given individual subject. To analyze this type of multi-level (hierarchical) multi-modal data, we proposed a series of efficient score-based tests (score tests) and leveraged groups of tissues or gene isoforms in order to map genomic biomarkers. We model group-specific variability as a random effect within a mixed effects model framework. In one instance, we proposed a score test-based approach to map expression quantitative trait loci (eQTL) across multiple tissues.
To do that, we jointly model all the tissues and make use of all the information available to maximize the power of eQTL mapping and investigate an overall shift in gene expression combined with tissue-specific effects due to genetic variants. In the second instance, we showed the flexibility of our model framework by expanding it to include tissue-specific epigenetic data (DNA methylation) and map eQTL by leveraging both tissues and methylation. Finally, we also showed that our methods are applicable to different data types, such as whole transcriptome expression data, which is designed to analyze genomic events such as alternative gene splicing. To accomplish this, we proposed two different models that exploit gene expression data of all available gene isoforms within a gene to map biomarkers of interest (either genes or gene sets) in paired early-stage breast tumor samples before and after treatment with external beam radiation. Our efficient score-based approaches have very distinct advantages. They have a computational edge over existing methods because they do not need parameter estimation under the alternative hypothesis. As a result, model parameters only have to be estimated once per genome, significantly decreasing computation time. Also, the efficient score is the locally most powerful test and is guaranteed theoretical optimality over all other approaches in a neighborhood of the null hypothesis. This theoretical performance is borne out in extensive simulation studies, which show that our approaches consistently outperform existing methods in both statistical power and computational speed. We applied our methods to publicly available datasets. It is important to note that all of our methods also accommodate the analysis of next-generation sequencing data.
Item Open Access Gene set-based Signal-Detection Analyses with Goodness-of-Fit Statistics and Their Application in Complex Diseases (2019) Zhang, Mengqi
Rare diseases are difficult to diagnose and uncertain to treat. The identification of specific genes associated with particular rare diseases and phenotypes can provide insight into the mechanism of certain rare disease subtypes and suggest therapeutic targets to improve patient outcomes. However, single gene-based methods for detecting rare disease-associated variants are often underpowered and can be hard to interpret. Therefore, this dissertation explores alternative approaches based on gene set-based methods. These analyses can be solved with a goodness-of-fit test that assesses whether the distribution of observed statistics of a given set of genes/variants significantly differs from the expected distribution.
This dissertation explores a flexible gene set-based signal-detection framework based on goodness-of-fit tests. A user-friendly and efficient R program was developed for this research. In addition, this dissertation proposes a new gene-set analysis method that leverages prior information to detect whether any of the genes within a biologically informed gene set is associated with disease phenotypes, based on a special goodness-of-fit test called higher criticism. Further, this dissertation investigates the asymptotic distribution of our higher criticism statistic based on the theoretically weighted p-values. Collectively, these methods are innovative because they are based on gene sets and incorporate prior information, which enhances the power to detect associations between rare variants and complex diseases. These results improve the ability to identify and optimally treat genetic disease subtypes.
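For readers unfamiliar with higher criticism, the following minimal Python sketch computes the standard unweighted statistic, which compares each sorted p-value with its expected rank under the null. It is an illustration only: the dissertation works with a weighted variant and its own R program, neither of which is reproduced here, and the function name `higher_criticism` and the example p-values are assumptions made for this sketch.

```python
import math


def higher_criticism(p_values):
    """Unweighted higher criticism statistic (illustrative, not the weighted variant).

    For sorted p-values p_(1) <= ... <= p_(n), returns the maximum over i of
    sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))).
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")
    best = -math.inf
    for i, p in enumerate(sorted(p_values), start=1):
        if not 0.0 < p < 1.0:
            raise ValueError("p-values must lie strictly between 0 and 1")
        # Standardized gap between the empirical and the null (uniform) CDF at p_(i).
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


# Hypothetical gene-level p-values for one gene set.
null_like = [0.1, 0.3, 0.5, 0.7, 0.9]
with_signal = [0.0001, 0.3, 0.5, 0.7, 0.9]
print(higher_criticism(null_like), higher_criticism(with_signal))
```

A single very small p-value in the set inflates the statistic sharply, which is why the test suits sparse signals within a gene set.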
Item Embargo Integrative Modeling of Genetic and Transciptomic Data for the Identification of Allele-Specific Expression (2024) Zou, Xue
The challenge of diagnosing rare genetic diseases persists despite advances in high-throughput sequencing. The limitation stems from an exome-centric diagnostic focus that often overlooks the influence of non-coding variants on gene expression. This research addresses this shortfall by leveraging allele-specific expression (ASE) analysis to detect cis-regulatory disruptions in gene expression, which could be pivotal for the diagnosis of non-exomic rare diseases. A novel computational framework, Bayesian Estimation of Allele Specific Transcript Integration across Exons (BEASTIE), was developed to refine ASE estimation. BEASTIE incorporates multiple heterozygous loci within a gene and rectifies phasing errors inherent in ASE detection. Comparative analyses reveal BEASTIE's enhanced accuracy over traditional methods, particularly in scenarios characterized by elevated heterozygosity and phasing errors. An advanced iteration, iBEASTIE, further incorporates error rates informed by genetic and genomic features, optimizing ASE estimations. In collaboration, quickBEAST, a C++ implementation of the BEASTIE model, was engineered, employing a subgrid algorithm to expedite the computation of ASE effect sizes. This tool proves essential for genome-wide analyses, as evidenced by its application to 1000 Genomes Project data, which aimed to map the ASE landscape and unearth novel imprinted genes. The practicality of these methods was tested in a case study of Glycogen Storage Disease (GSD) involving six probands. The integrated diagnostic pipeline, encompassing ASE, isoform, and differential expression analyses, identified a regulatory variant implicated in the disease phenotype. This finding was substantiated through CRISPR assays, verifying the computational predictions.
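As a point of comparison for the "traditional methods" mentioned above, the simplest ASE check is an exact binomial test at a single heterozygous site, asking whether reference and alternate read counts depart from a 50:50 split. The Python sketch below shows that baseline only; it is not BEASTIE, it ignores phasing and multiple sites, and the function name `allelic_imbalance_p` and the read counts are assumptions for illustration.

```python
import math


def allelic_imbalance_p(ref_reads, alt_reads):
    """Exact two-sided binomial p-value for a 50:50 allelic ratio at one site.

    Illustrative baseline only: a single heterozygous site, no phasing,
    no pooling across sites.
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("need at least one read")
    m = min(ref_reads, alt_reads)
    # Lower-tail probability P(X <= m) for X ~ Binomial(n, 0.5).
    tail = sum(math.comb(n, k) for k in range(m + 1)) / 2 ** n
    # The null distribution is symmetric, so double the tail and cap at 1.
    return min(1.0, 2.0 * tail)


# Hypothetical read counts at one heterozygous site.
print(allelic_imbalance_p(5, 5))
print(allelic_imbalance_p(10, 0))
```

A per-site test like this loses power when reads are spread over several heterozygous sites in a gene, which is the situation the abstract says BEASTIE is designed to handle.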
Item Open Access Somatic uniparental disomy of Chromosome 16p in hemimegalencephaly. (Cold Spring Harbor molecular case studies, 2017-09) Griffin, Nicole G; Cronin, Kenneth D; Walley, Nicole M; Hulette, Christine M; Grant, Gerald A; Mikati, Mohamad A; LaBreche, Heather G; Rehder, Catherine W; Allen, Andrew S; Crino, Peter B; Heinzen, Erin L
Hemimegalencephaly (HME) is a heterogeneous cortical malformation characterized by enlargement of one cerebral hemisphere. Somatic variants in mammalian target of rapamycin (mTOR) regulatory genes have been implicated in some HME cases; however, ∼70% have no identified genetic etiology. Here, we screened two HME patients to identify disease-causing somatic variants. DNA from leukocytes, buccal swabs, and surgically resected brain tissue from two HME patients was screened for somatic variants using genome-wide genotyping arrays or sequencing of the protein-coding regions of the genome. Functional studies were performed to evaluate the molecular consequences of candidate disease-causing variants. Both HME patients evaluated were found to have likely disease-causing variants in DNA extracted from brain tissue but not in buccal swab or leukocyte DNA, consistent with a somatic mutational mechanism. In the first case, a previously identified disease-causing somatic single-nucleotide variant in MTOR was identified. In the second case, we detected an overrepresentation of the alleles inherited from the mother on Chromosome 16 in brain tissue DNA only, indicative of somatic uniparental disomy (UPD) of the p-arm of Chromosome 16. Using methylation analyses, we identified an imprinted locus on 16p spanning ZNF597, which results in increased expression of ZNF597 mRNA and protein in the brain tissue of the second case. Enhanced mTOR signaling was observed in tissue specimens from both patients.
We speculate that overexpression of maternally expressed ZNF597 led to aberrant hemispheric development in the patient with somatic UPD of Chromosome 16p, possibly through modulation of mTOR signaling.
Item Open Access Statistical Methods of Disease-Gene Mapping in Trio-based Next Generation Sequencing (2015) Jiang, Yu
Disease-gene mapping plays an important role in advancing medical science. With the development of next-generation sequencing technologies, mapping disease genes through rare genetic variants has become economical and reliable. De novo mutations, the most extreme form of rare variants, play an important role in the occurrence of complex diseases. To detect de novo mutations, case-parent trios are used in sequencing studies. This case-parent design makes it possible to detect disease-causal genes from both de novo mutations and inherited mutations. We proposed three novel methods to map disease genes according to de novo mutation load (fitDNM), allele transmission rate (rvTDT), and compound heterozygous and recessive genes (coreTDT), respectively, to maximize the statistical power of analysis in case-parent trios. These three methods are then applied to analyze neurodevelopmental/neuropsychiatric disorders. The analysis with fitDNM provides strong statistical evidence supporting two potentially causal genes: SUV420H1 in autism spectrum disorder and TRIO in a combined analysis of the four neurodevelopmental/neuropsychiatric disorders investigated. The application of rvTDT to epileptic encephalopathy (EE) trios finds that dominant (or additive) inherited rare variants are unlikely to play a substantial role within EE genes previously identified through de novo mutation studies.
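The allele transmission idea behind trio-based tests can be illustrated with the classical transmission disequilibrium test (TDT), which compares how often heterozygous parents transmit a variant allele to an affected child versus how often they do not. The Python sketch below implements only that classical single-variant statistic; rvTDT, fitDNM, and coreTDT are not reproduced, and the function name `tdt` and the counts are assumptions for illustration.

```python
import math


def tdt(transmitted, untransmitted):
    """Classical TDT (McNemar-type) statistic and p-value.

    transmitted / untransmitted: counts of heterozygous parents who did / did
    not pass the variant allele to the affected child. Under the null of no
    linkage or association, the statistic is chi-square with 1 degree of freedom.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("need at least one informative transmission")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts from heterozygous parents across trios.
chi2, p_value = tdt(30, 10)
print(chi2, p_value)
```

Because each parent serves as their own control, the comparison is robust to population stratification, which is one reason trio designs are attractive for rare-variant mapping.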
Item Open Access Testing for A Loss of Homozygosity and Compound Heterozygosity Using Human Standing Variation (2017) Du, Guangjian
Homozygosity is the state of possessing two identical alleles of a particular gene, one inherited from each parent. Homozygous genes are those in which both copies carry identical alleles at a specific site; this can result from identical mutations at one site or at several sites on both copies. In contrast, heterozygous genes are those with different alleles at a given site. A compound heterozygous genotype occurs when both copies of the gene carry mutations, but at different sites.
Homozygosity plays a key role in the risk of recessive Mendelian diseases. For recessive diseases, variants cause disease only when dysfunctional mutations are present on both copies of a gene in an individual's genome; examples include cystic fibrosis and phenylketonuria.
Because each gene has two copies, one inherited from each parent, a gene can carry two recessive alleles that differ from each other, with each copy harboring a dysfunctional mutation at a different location; such a genotype is called compound heterozygous. Both homozygosity and compound heterozygosity can completely knock out the function of a gene. Therefore, homozygosity and compound heterozygosity are important risk factors for recessive genetic disorders. It is essential to understand which genes have recessive effects on phenotypes.
Some work has been done on ranking human genes based on their tolerance to functional genetic variants. These studies give a sense of how unusual a functional mutation is in the context of a particular gene. Other work, such as genetic constraint analysis, tests for a depletion of rare singleton qualifying variation relative to expectation based on mutation rate, using large population databases. All of this work has been useful for identifying genes with strong dominant effects. However, there is currently no method for identifying recessive intolerant genes.
Here, we propose a method for identifying recessive intolerant genes by looking for a deficit of homozygosity and compound heterozygosity using human standing variation. First, we develop a novel, computationally efficient, and robust statistical model to evaluate the viability of individuals according to the number of copies of a selected gene harboring rare dysfunctional variants, using human standing variation data. Second, we build a general framework to assess whether there is evidence supporting a shift towards a deficit of homozygosity or compound heterozygosity relative to the distribution of expected genotypes. Third, we apply score tests to evaluate the deficit for a given gene. Finally, we use a simulation model to further confirm the accuracy of our framework.
Item Open Access Topics and Applications of Weighting Methods in Case-Control and Observational Studies (2019) Li, Fan
Weighting methods have been widely used in statistics and related applications. For example, inverse probability weighting is a standard approach to correct for survey non-response. The case-control design, frequently seen in epidemiologic or genetic studies, can be regarded as a special type of survey design; analogous inverse probability weighting approaches have been explored when the interest is the association between exposures and the disease (primary analysis) as well as when the interest is the association among exposures (secondary analysis). Meanwhile, in observational comparative effectiveness research, inverse probability weighting has been suggested as a valid approach to correct for confounding bias. This dissertation develops and extends weighting methods for case-control and observational studies.
The first part of this dissertation extends the inverse probability weighting approach for secondary analysis of case-control data. We revisit an inverse probability weighting estimator to offer new insights and extensions. Specifically, we construct its more general form by generalized least squares (GLS). Such a construction allows us to connect the GLS estimator with the generalized method of moments and motivates a new specification test designed to assess the adequacy of the inverse probability weights. The specification test statistic measures the weighted discrepancy between the case and control subsample estimators, and asymptotically follows a chi-squared distribution under correct model specification. We illustrate the GLS estimator and specification test using a case-control sample of peripheral arterial disease, and use simulations to shed light on the operating characteristics of the specification test. The second part develops a robust difference-in-differences (DID) estimator for estimating causal effects with observational before-after data. Within the DID framework, two common estimation strategies are outcome regression and propensity score weighting. Motivated by a real application in traffic safety research, we propose a new double-robust DID estimator that hybridizes outcome regression and propensity score weighting. We show that the proposed estimator possesses the desirable large-sample robustness property: consistency requires only that either the outcome model or the propensity score model be correctly specified. We use the new estimator to study the causal effect of rumble strips in reducing vehicle crashes, and conduct a simulation study to examine its finite-sample performance. The third part discusses a unified framework, the balancing weights, for estimating causal effects in observational studies with multiple treatments.
These weights incorporate the generalized propensity scores to balance the weighted covariate distribution of each treatment group, all weighted toward a common pre-specified target population. Within this framework, we further develop the generalized overlap weights, constructed as the product of the inverse probability weights and the harmonic mean of the generalized propensity scores. The generalized overlap weights correspond to the target population with the most overlap in covariates between treatments, similar to the population in equipoise in clinical trials. We show that the generalized overlap weights minimize the total asymptotic variance of the nonparametric estimators for the pairwise contrasts within the class of balancing weights. We apply the new weighting method to study racial disparities in medical expenditure and further examine its operating characteristics by simulations.
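The construction of the generalized overlap weights described above can be sketched in a few lines of Python. This sketch assumes the generalized propensity scores for one unit are already estimated and sum to 1, and it takes the harmonic-mean term as the reciprocal of the summed reciprocals, which differs from the textbook harmonic mean only by a constant factor that cancels once weights are normalized. The function name `generalized_overlap_weight` and the example scores are hypothetical.

```python
def generalized_overlap_weight(propensities, treatment):
    """Generalized overlap weight for one unit (illustrative sketch).

    propensities: generalized propensity scores e_1, ..., e_J for this unit,
                  assumed to be estimated elsewhere and to sum to 1.
    treatment:    zero-based index of the treatment the unit actually received.
    """
    if any(not 0.0 < e < 1.0 for e in propensities):
        raise ValueError("propensity scores must lie strictly between 0 and 1")
    if abs(sum(propensities) - 1.0) > 1e-8:
        raise ValueError("propensity scores must sum to 1")
    # Harmonic-mean-type tilting term: (sum_k 1 / e_k)^(-1).
    tilt = 1.0 / sum(1.0 / e for e in propensities)
    # Inverse probability weight for the received treatment, times the tilt.
    return tilt / propensities[treatment]


# Hypothetical unit with two treatments: reduces to the binary overlap weights.
binary_scores = [0.3, 0.7]
w0 = generalized_overlap_weight(binary_scores, 0)
w1 = generalized_overlap_weight(binary_scores, 1)
print(w0, w1)
```

With two treatments the weights reduce to the probability of receiving the opposite treatment, so units with scores near 0 or 1 are down-weighted rather than given the extreme weights that plain inverse probability weighting would assign.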