Browsing by Author "Goldstein, David B"
Item Open Access Common genetic variation and the control of HIV-1 in humans. (PLoS Genet, 2009-12) Fellay, Jacques; Ge, Dongliang; Shianna, Kevin V; Colombo, Sara; Ledergerber, Bruno; Cirulli, Elizabeth T; Urban, Thomas J; Zhang, Kunlin; Gumbs, Curtis E; Smith, Jason P; Castagna, Antonella; Cozzi-Lepri, Alessandro; De Luca, Andrea; Easterbrook, Philippa; Günthard, Huldrych F; Mallal, Simon; Mussini, Cristina; Dalmau, Judith; Martinez-Picado, Javier; Miro, José M; Obel, Niels; Wolinsky, Steven M; Martinson, Jeremy J; Detels, Roger; Margolick, Joseph B; Jacobson, Lisa P; Descombes, Patrick; Antonarakis, Stylianos E; Beckmann, Jacques S; O'Brien, Stephen J; Letvin, Norman L; McMichael, Andrew J; Haynes, Barton F; Carrington, Mary; Feng, Sheng; Telenti, Amalio; Goldstein, David B; NIAID Center for HIV/AIDS Vaccine Immunology (CHAVI)
To extend the understanding of host genetic determinants of HIV-1 control, we performed a genome-wide association study in a cohort of 2,554 infected Caucasian subjects. The study was powered to detect common genetic variants explaining down to 1.3% of the variability in viral load at set point. We provide overwhelming confirmation of three associations previously reported in a genome-wide study and show further independent effects of both common and rare variants in the Major Histocompatibility Complex region (MHC). We also examined the polymorphisms reported in previous candidate gene studies and fail to support a role for any variant outside of the MHC or the chemokine receptor cluster on chromosome 3. In addition, we evaluated functional variants, copy-number polymorphisms, epistatic interactions, and biological pathways.
This study thus represents a comprehensive assessment of common human genetic variation in HIV-1 control in Caucasians.
Item Open Access Contributions of Mamu-A*01 status and TRIM5 allele expression, but not CCL3L copy number variation, to the control of SIVmac251 replication in Indian-origin rhesus monkeys. (PLoS genetics, 2010) Lim, So-Yon; Chan, Tiffany; Gelman, Rebecca S; Whitney, James B; O'Brien, Kara L; Barouch, Dan H; Goldstein, David B; Haynes, Barton F; Letvin, Norman L
CCL3 is a ligand for the HIV-1 co-receptor CCR5. There have recently been conflicting reports in the literature concerning whether CCL3-like gene (CCL3L) copy number variation (CNV) is associated with resistance to HIV-1 acquisition and with both viral load and disease progression following infection with HIV-1. An association has also been reported between CCL3L CNV and clinical sequelae of the simian immunodeficiency virus (SIV) infection in vivo in rhesus monkeys. The present study was initiated to explore the possibility of an association of CCL3L CNV with the control of virus replication and AIDS progression in a carefully defined cohort of SIVmac251-infected, Indian-origin rhesus monkeys. Although we demonstrated extensive variation in copy number of CCL3L in this cohort of monkeys, CCL3L CNV was not significantly associated with peak or set-point plasma SIV RNA levels when the MHC class I allele Mamu-A*01 was included in the models, nor with progression to AIDS. With 66 monkeys in the study, there was adequate power for these tests if the correlation of CCL3L and either peak or set-point plasma SIV RNA levels was 0.34 or 0.36, respectively.
These findings call into question the premise that CCL3L CNV is important in HIV/SIV pathogenesis.
Item Open Access Determinants of protection among HIV‐exposed seronegative persons: an overview. (J Infect Dis, 2010-11-01) Lederman, Michael M; Alter, Galit; Daskalakis, Demetre C; Rodriguez, Benigno; Sieg, Scott F; Hardy, Gareth; Cho, Michael; Anthony, Donald; Harding, Clifford; Weinberg, Aaron; Silverman, Robert H; Douek, Daniel C; Margolis, Leonid; Goldstein, David B; Carrington, Mary; Goedert, James J
Both clinical experience and a growing medical literature indicate that some persons who have been exposed to human immunodeficiency virus (HIV) infection remain uninfected. Although in some instances this may represent good fortune, cohorts of uninfected persons have been reported who are considered at high risk for infection. In these cohorts a variety of characteristics have been proposed as mediating protection, but to date only the 32–base pair deletion in the chemokine (C‐C motif) receptor 5 gene, which results in complete failure of cell surface expression of this coreceptor, has been associated with high‐level protection from HIV infection. With this in mind, there are probably many other factors that may individually or in combination provide some level of protection from acquisition of HIV infection. Because some of these factors are probably incompletely protective or inconsistently active, identifying them with confidence will be difficult.
Nonetheless, clarifying the determinants of protection against HIV infection is a high priority that will require careful selection of high‐risk uninfected cohorts, who should undergo targeted studies of plausible mediators and broad screening for unexpected determinants of protection.
Item Open Access Dominant Splice Site Mutations in PIK3R1 Cause Hyper IgM Syndrome, Lymphadenopathy and Short Stature. (J Clin Immunol, 2016-07) Petrovski, Slavé; Parrott, Roberta E; Roberts, Joseph L; Huang, Hongxiang; Yang, Jialong; Gorentla, Balachandra; Mousallem, Talal; Wang, Endi; Armstrong, Martin; McHale, Duncan; MacIver, Nancie J; Goldstein, David B; Zhong, Xiao-Ping; Buckley, Rebecca H
The purpose of this research was to use next generation sequencing to identify mutations in patients with primary immunodeficiency diseases whose pathogenic gene mutations had not been identified. Remarkably, four unrelated patients were found by next generation sequencing to have the same heterozygous mutation in an essential donor splice site of PIK3R1 (NM_181523.2:c.1425+1G>A) found in three prior reports. All four had the Hyper IgM syndrome, lymphadenopathy and short stature, and one also had SHORT syndrome. They were investigated with in vitro immune studies, RT-PCR, and immunoblotting studies of the mutation's effect on mTOR pathway signaling. All patients had very low percentages of memory B cells and class-switched memory B cells and reduced numbers of naïve CD4+ and CD8+ T cells. RT-PCR confirmed the presence of both an abnormal 273 base-pair (bp) size and a normal 399 bp size band in the patient and only the normal band was present in the parents.
Following anti-CD40 stimulation, patient's EBV-B cells displayed higher levels of S6 phosphorylation (mTOR complex 1 dependent event), Akt phosphorylation at serine 473 (mTOR complex 2 dependent event), and Akt phosphorylation at threonine 308 (PI3K/PDK1 dependent event) than controls, suggesting elevated mTOR signaling downstream of CD40. These observations suggest that amino acids 435-474 in PIK3R1 are important for its stability and also its ability to restrain PI3K activity. Deletion of Exon 11 leads to constitutive activation of PI3K signaling. This is the first report of this mutation and immunologic abnormalities in SHORT syndrome.
Item Open Access Functional Evaluation of Causal Mutations Identified in Human Genetic Studies (2016) Lu, Yi-Fan
Human genetics has been experiencing a wave of genetic discoveries thanks to the development of several technologies, such as genome-wide association studies (GWAS), whole-exome sequencing, and whole genome sequencing. Despite the massive number of newly discovered variants associated with human diseases, several key challenges emerge following genetic discovery. GWAS is known to be good at identifying the locus associated with the patient phenotype. However, the actual causal variants responsible for the phenotype are often elusive. Another challenge in human genetics is that even when the causal mutations are already known, the underlying biological effect might remain largely ambiguous. Functional evaluation plays a key role in addressing these challenges, both by identifying the causal variants responsible for the phenotype and by developing biological insights from disease-causing mutations.
We adopted various methods to characterize the effects of variants identified in human genetic studies, including patient genetic and phenotypic data, RNA chemistry, molecular biology, virology, and multi-electrode array and primary neuronal culture systems. Chapter 1 is a broad introduction to the motivation for and challenges of functional evaluation in human genetic studies, and to the background of several genetic discoveries, such as hepatitis C treatment response, for which we performed functional characterization.
Chapter 2 focuses on the characterization of causal variants following the GWAS of hepatitis C treatment response. We characterized a non-coding SNP (rs4803217) of IL28B (IFNL3) in high linkage disequilibrium (LD) with the discovery SNP identified in the GWAS. In this chapter, we used interdisciplinary approaches to characterize the effects of rs4803217 on RNA structure, disease association, and protein translation.
Chapter 3 describes another avenue of functional characterization following GWAS, focusing on the novel transcripts and proteins identified near the IL28B (IFNL3) locus. It has recently been speculated that this novel protein, which was named IFNL4, may affect HCV treatment response and clearance. In this chapter, we used molecular biology, virology, and patient genetic and phenotypic data to further characterize and understand the biology of IFNL4. The efforts in Chapters 2 and 3 provided new insights into the candidate causal variant(s) underlying the GWAS signal for HCV treatment response; however, more evidence is still required to establish the exact causal roles of these variants in the GWAS association.
Chapter 4 aims to characterize a mutation already known to cause a disease (seizure) in a mouse model. We demonstrate the potential use of the multi-electrode array (MEA) system for functional characterization and drug testing of mutations found in neurological diseases, such as seizure. Functional characterization in neurological diseases is relatively challenging, and the available systematic tools are limited. This chapter presents exploratory research as an example of establishing a system for broader use in functional characterization and translational work on mutations found in neurological diseases.
Overall, this dissertation spans a range of challenges of functional evaluations in human genetics. It is expected that the functional characterization to understand human mutations will become more central in human genetics, because there are still many biological questions remaining to be answered after the explosion of human genetic discoveries. The recent advance in several technologies, including genome editing and pluripotent stem cells, is also expected to make new tools available for functional studies in human diseases.
Item Open Access Genetic and Environmental Contributions to Baseline Cognitive Ability and Cognitive Response to Topiramate (2010) Cirulli, Elizabeth Trilby
Although much research has focused on cognitive ability and the genetic and environmental factors that might influence it, this aspect of human nature is still far from being well understood. It has been well-established that certain factors such as age and education have significant impacts on performance on most cognitive tests, but the effects of variables such as cognitive pastimes and strategies used during testing have generally not been assessed. Additionally, no genetic variant has yet been unequivocally shown to influence the normal variation in cognitive ability of healthy individuals. Candidate gene studies of cognition have produced conflicting results that have not been replicable, and genome-wide association studies have not found common variants with large influences on this trait.
Here, we have recruited a large cohort of healthy volunteers (n=1,887) and administered a brief cognitive battery utilizing diverse, common, and well-known tests. In addition to providing standard demographic information, the subjects also filled out a questionnaire that was designed to assess novel factors such as whether they had seen the test before, in what cognitive pastimes they participated, and what strategies they had used during testing. Linear regression models were built to assess the effects of these variables on the test scores. I found that the addition of novel covariates to standard ones increased the percent of the variation in test score that was explained for all tests; for some tests, the increase was as high as 70%.
Next, I examined the effects of genetic variants on test scores. I first performed a genome-wide association study using the Illumina HumanHap 550 and 610 chips. These chips are designed to directly genotype or tag the vast majority of the common variants in the genome. Despite having 80% power to detect a common variant explaining at least 3-6% (depending on the test) of the variation in the trait, I did not find any genetic variants that were significantly associated after correction for multiple testing. This is in line with the general findings from GWA studies that single common variants have a limited impact on complex traits.
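The power statement above can be made concrete with a rough calculation. The following is a minimal sketch, not the method used in the thesis: it assumes a 1-degree-of-freedom association test on a quantitative trait, a normal approximation to the test statistic, and a genome-wide significance threshold of 5 × 10^-8, none of which is specified in the abstract. The function name `approx_gwas_power` is hypothetical.

```python
from statistics import NormalDist


def approx_gwas_power(n, variance_explained, alpha=5e-8):
    """Approximate power of a 1-df association test for a quantitative trait.

    n                  -- number of individuals tested
    variance_explained -- fraction of trait variance due to the variant (0..1)
    alpha              -- two-sided significance threshold (assumed, not from the source)
    """
    if not 0.0 < variance_explained < 1.0:
        raise ValueError("variance_explained must be strictly between 0 and 1")
    norm = NormalDist()
    # Non-centrality parameter of the 1-df chi-square test statistic.
    ncp = n * variance_explained / (1.0 - variance_explained)
    # Critical value on the z scale for a two-sided test at level alpha.
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    # Normal approximation; the negligible opposite tail is ignored.
    return norm.cdf(ncp ** 0.5 - z_crit)
```

Under these assumptions, power climbs steeply as the fraction of trait variance explained by a variant grows, which is why a cohort of this size can exclude variants of moderate effect but says little about variants of small effect. The sketch is not expected to reproduce the thesis's exact 3-6% figures, which depend on per-test details not given in the abstract.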
Because of the recent technological advances in next-generation sequencing and the apparently limited role of very common variants, many human geneticists are making a transition from genome-wide association study to whole-genome and whole-exome sequencing, which allow for the identification of rarer variants. Because these methods are currently costly, it is important to utilize study designs that have the best chance of finding causal variants in a small sample size. One such method is the extreme-trait design, where individuals from one or both ends of a trait distribution are sequenced and variants that are enriched in the group(s) are identified. Here, I have sequenced the exomes of 20 young individuals of European ethnicity: 10 that performed at the top of the distribution for the cognitive battery and 10 that performed at the bottom. I identified rare genetic variants that were enriched in one extreme group as compared to the other and performed follow-up genotyping of the best candidate variant that emerged from this analysis. Unfortunately, this variant was not found to be associated in a larger sample of individuals. This pilot study indicates that a larger sample size will be needed to identify variants enriched in cognition extremes.
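The sample-size limitation of a 10-versus-10 extreme-trait comparison can be illustrated with a short calculation. This is a sketch under stated assumptions: the abstract does not say which enrichment statistic was used, so a one-sided Fisher exact test on carrier counts is substituted here for illustration, and the function name `enrichment_p` is hypothetical.

```python
from math import comb


def enrichment_p(carriers_a, n_a, carriers_b, n_b):
    """One-sided Fisher exact p-value for carrier enrichment in group A.

    Returns the probability, with the total carrier count fixed, of seeing
    at least `carriers_a` carriers among the n_a individuals in group A.
    """
    total_carriers = carriers_a + carriers_b
    total_people = n_a + n_b
    denominator = comb(total_people, total_carriers)
    upper = min(total_carriers, n_a)
    # Sum the hypergeometric probabilities of outcomes at least this extreme.
    numerator = sum(
        comb(n_a, k) * comb(n_b, total_carriers - k)
        for k in range(carriers_a, upper + 1)
    )
    return numerator / denominator


# A variant carried by 5 of 10 individuals in one extreme group and none in the other.
p_partial = enrichment_p(5, 10, 0, 10)
# The most extreme possible result: carried by all 10 in one group, none in the other.
p_complete = enrichment_p(10, 10, 0, 10)
```

Even complete separation of the two groups yields a p-value of only 1/184,756 (about 5 × 10^-6), because that is the most extreme outcome 20 individuals allow. With tens of thousands of variants tested across an exome, such a result is hard to distinguish from chance, consistent with the pilot study's conclusion that larger samples are needed.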
Finally, I assessed the effect of topiramate, an antiepileptic drug that causes marked side effects in certain cognitive areas in certain individuals, on some of the healthy volunteers (n=158) by giving them a 100 mg dose and then administering the cognitive test two hours later. I compared their scores at this testing session to those at the previous session and calculated the overall level to which they were affected by topiramate. I found that the topiramate blood levels, which were highly dependent on weight and the time from dosing to testing, varied widely between individuals after this acute dose, and that this variation explained 35% of the variability in topiramate response. A genome-wide association study of the remaining variability in topiramate response did not identify a genome-wide significant association.
In sum, I studied the contributions of both environmental and genetic variables to cognitive ability and cognitive response to topiramate. I found that I could identify environmental variables explaining large proportions of the variation in these traits, but that I could not identify genetic variants that influenced the traits. My analysis of genetic variants was for the most part restricted to the very common ones found on genotyping chips, and this and other studies have generally found that single common genetic variants do not have large effects on complex traits. As we move forward into studies that involve the sequencing of whole exomes and genomes, genetic variants with large effects on these complex traits may finally be found.
Item Open Access Genome-Wide Analyses of HIV-1 Host Genetics (2012) Pelak, Kimberly
HIV has presented some of the greatest biomedical challenges in recent decades, and an understanding of how the virus behaves when it is in the human body is critical to addressing many of these challenges. One avenue through which to do this is the study of host genetics, which investigates the human genetic variants that modify the interactions between the HIV-1 virus and the human body. In my graduate work, I performed several different investigations that have furthered our understanding of the human genetic variants that either modulate the response to HIV-1 infection or play a role in the acquisition of an HIV-1 infection. This work took place at a time of transition in human genetics, and spanned both the era of genome-wide association studies as well as the beginning of the sequencing and rare variant eras.
The earliest HIV-1 host genetics findings were made through candidate gene studies, which reflected the state of human genetics research in the 1990s and early 2000s. The draft sequence of the human genome was released in 2001, and HIV host genetics, as well as human genetics in general, has changed considerably since then. Chapter 1 describes the basics of HIV-1 biology and the HIV-1 epidemic, as well as some crucial findings in HIV-1 host genetics. This chapter also gives a brief recent history of human genetics and describes some of the current challenges in the field.
Chapters 2 and 3 describe the identification of human genetic variants that associate with viral load set point. Chapter 2 describes a copy number variable region (CNV) in the KIR region of the genome that associates with a change in set point, and Chapter 3 describes an allele of HLA-B (HLA-B*5703) that is the largest determinant of viral control in an African American population. Both chapters use data from genotyping chips as a starting point.
In the past several years, the cost to sequence a genome has plummeted, and it is now possible for a single group to sequence and align an entire human genome in just a few weeks. This "next-generation" sequencing has dramatically changed the field of human genetics, and Chapter 4 will discuss this new technology and provide an early analysis of the patterns of variation that are observed across multiple human genomes. Notably, this new technology allows for an unprecedented amount of variant discovery, including the possibility of identifying low frequency and rare variants.
Chapter 5 describes two different projects that make use of next-generation sequencing technology to investigate variants that influence HIV-1 disease acquisition and progression. Both projects are extreme phenotype whole-genome sequencing projects. For the first project, we have sequenced individuals who have hemophilia and were highly exposed to contaminated blood products but who remained uninfected. For the second project, we have sequenced African American individuals whose disease progressed very quickly or very slowly. I compare the variants in these individuals to the variants in control populations and describe follow-up genotyping results. I have not identified a causative variant in either of these studies, although a list of candidate variants is still being pursued. These analyses have shown that there is substantial heterogeneity in the genetic basis for both phenotypes.
Overall, my work has identified two common variants that play a role in modulating HIV-1 infection and has provided the first assessment of the patterns of variation across a set of unrelated human genomes. This thesis also describes some of the early attempts to apply next-generation sequencing to HIV-1 host genetics. In the Conclusion, I discuss the future of HIV-1 host genetics research and the clinical applications of human genetics.
Item Open Access Genome-wide mRNA expression correlates of viral control in CD4+ T-cells from HIV-1-infected individuals. (PLoS Pathog, 2010-02-26) Rotger, Margalida; Dang, Kristen K; Fellay, Jacques; Heinzen, Erin L; Feng, Sheng; Descombes, Patrick; Shianna, Kevin V; Ge, Dongliang; Günthard, Huldrych F; Goldstein, David B; Telenti, Amalio; Swiss HIV Cohort Study; Center for HIV/AIDS Vaccine Immunology
There is great interindividual variability in HIV-1 viral setpoint after seroconversion, some of which is known to be due to genetic differences among infected individuals. Here, our focus is on determining, genome-wide, the contribution of variable gene expression to viral control, and on relating it to genomic DNA polymorphism. RNA was extracted from purified CD4+ T-cells from 137 HIV-1 seroconverters, 16 elite controllers, and 3 healthy blood donors. Expression levels of more than 48,000 mRNA transcripts were assessed by the Human-6 v3 Expression BeadChips (Illumina). Genome-wide SNP data was generated from genomic DNA using the HumanHap550 Genotyping BeadChip (Illumina). We observed two distinct profiles with 260 genes differentially expressed depending on HIV-1 viral load. There was significant upregulation of expression of interferon stimulated genes with increasing viral load, including genes of the intrinsic antiretroviral defense. Upon successful antiretroviral treatment, the transcriptome profile of previously viremic individuals reverted to a pattern comparable to that of elite controllers and of uninfected individuals. Genome-wide evaluation of cis-acting SNPs identified genetic variants modulating expression of 190 genes. Those were compared to the genes whose expression was found associated with viral load: expression of one interferon stimulated gene, OAS1, was found to be regulated by a SNP (rs3177979, p = 4.9E-12); however, we could not detect an independent association of the SNP with viral setpoint.
Thus, this study represents an attempt to integrate genome-wide SNP signals with genome-wide expression profiles in the search for biological correlates of HIV-1 control. It underscores the paradox of the association between increasing levels of viral load and greater expression of antiviral defense pathways. It also shows that elite controllers do not have a fully distinctive mRNA expression pattern in CD4+ T cells. Overall, changes in global RNA expression reflect responses to viral replication rather than a mechanism that might explain viral control.
Item Open Access Host determinants of HIV-1 control in African Americans. (J Infect Dis, 2010-04-15) Pelak, Kimberly; Goldstein, David B; Walley, Nicole M; Fellay, Jacques; Ge, Dongliang; Shianna, Kevin V; Gumbs, Curtis; Gao, Xiaojiang; Maia, Jessica M; Cronin, Kenneth D; Hussain, Shehnaz K; Carrington, Mary; Michael, Nelson L; Weintrob, Amy C; Infectious Disease Clinical Research Program HIV Working Group; National Institute of Allergy and Infectious Diseases Center for HIV/AIDS Vaccine Immunology (CHAVI)
We performed a whole-genome association study of human immunodeficiency virus type 1 (HIV-1) set point among a cohort of African Americans (n = 515), and an intronic single-nucleotide polymorphism (SNP) in the HLA-B gene showed one of the strongest associations. We use a subset of patients to demonstrate that this SNP reflects the effect of the HLA-B*5703 allele, which shows a genome-wide statistically significant association with viral load set point (P = 5.6 × 10^-10).
These analyses therefore confirm a member of the HLA-B*57 group of alleles as the most important common variant that influences viral load variation in African Americans, which is consistent with what has been observed for individuals of European ancestry, among whom the most important common variant is HLA-B*5701.
Item Open Access Host genetics and HIV-1: the final phase? (PLoS Pathog, 2010-10-14) Fellay, Jacques; Shianna, Kevin V; Telenti, Amalio; Goldstein, David B
This is a crucial transition time for human genetics in general, and for HIV host genetics in particular. After years of equivocal results from candidate gene analyses, several genome-wide association studies have been published that looked at plasma viral load or disease progression. Results from other studies that used various large-scale approaches (siRNA screens, transcriptome or proteome analysis, comparative genomics) have also shed new light on retroviral pathogenesis. However, most of the inter-individual variability in response to HIV-1 infection remains to be explained: genome resequencing and systems biology approaches are now required to progress toward a better understanding of the complex interactions between HIV-1 and its human host.
Item Open Access Human Genomics of Complex Trait Severity (2017) Kleinstein, Sarah Elizabeth
Genetics account for a large, mostly unexplained proportion of human disease. Though the role of genetics in simple, Mendelian traits has long been established, it is more difficult to disambiguate the role of various human genetic factors in complex disease traits. However, as genetics technology and methodology have advanced, from genome-wide association studies (GWAS) to next-generation sequencing (NGS), our ability to detect the role of both rare and common human genetic variation in complex disease traits has greatly improved, allowing us to demonstrate robust genetic factors involved in a variety of diseases from metabolic to viral.
However, despite the outstanding progress in human genetics, many complex disease traits lack robustly associated genetic variants, the existing variation only accounts for a small proportion of the estimated heritability, or the trait lacks comprehensive genetic investigation altogether.
In this thesis I conducted a common variant study using GWAS and a comprehensive NGS analysis - both standards in the field - to investigate the role of human genetics in the severity of complex disease traits ranging from viral disease to metabolic: herpes simplex virus type 2 (HSV-2) and non-alcoholic fatty liver disease (NAFLD). Chapter 1 provides a broad overview of current human genetics methodologies and the advantages and caveats to each technology for complex disease traits, as well as the background and current state of genetics research for the two complex traits investigated: HSV-2 and NAFLD.
Chapter 2 utilizes a GWAS to investigate the role of common human genetic variation in HSV-2 severity, which has previously only been investigated through a small handful of candidate gene studies. We were unable to replicate previous candidate gene associations, though we did detect several variants in or near biologically plausible genes (including ABCA1 and KIF1B) that approached, though did not reach, genome-wide statistical significance with HSV-2 severity as measured by the quantitative viral shedding rate. This is the first genome-wide investigation of human genetics in HSV-2.
Chapter 3 utilizes whole-exome sequencing at both the single-variant and gene levels to further elucidate the role of human genetics in gold standard liver biopsy confirmed NAFLD fibrosis extreme phenotypes: protective and progressor. We were able to replicate known associations with PNPLA3 and TM6SF2 and advanced fibrosis, despite the limited available sample size. We also observed enrichment of variation in distinct genes for progressor or protective NAFLD phenotypes, though these genes did not reach statistical significance. This is the first NGS study of NAFLD, and thus the first investigation of the role of rare variation in NAFLD.
Overall, this thesis applied genome-wide techniques to interrogate gaps in the genetics of complex trait severity, from viral to liver disease, using unique, well-phenotyped cohorts. Human genetics remains a complicated field that will require the continued use of well-phenotyped cohorts in larger numbers, as well as both complementary and confirmatory sequencing and bioinformatics methods to fully detangle. While the research in this thesis is primarily hypothesis generating, and potentially associated variants will have to be replicated and investigated on a functional level to be confirmed as causal, the exploration of genetic associations with complex disease traits can prove highly informative for both understanding the underlying biology of these traits and for identifying genes and pathways that may act as biomarkers or treatment targets. Thus, this thesis has acted as a primer to expand knowledge of the role of human genetics in two highly complex and varied traits, HSV-2 and NAFLD, paving the way for further studies, ultimately with the goal of improving human health.
Item Open Access Identification and Characterization of Pathogenic Mutations in Neurodevelopmental Disorders Discovered by Next-Generation Sequencing (2014) Ruzzo, Elizabeth Kathryn
Neurodevelopmental disorders develop over time and are characterized by a wide variety of mental, behavioral, and physical phenotypes. The categorization of neurodevelopmental disorders encompasses a broad range of conditions including intellectual disability, autism spectrum disorder, attention deficit hyperactivity disorder, cerebral palsy, schizophrenia, bipolar disorder, and epilepsy, among others. Diagnostic classifications of neurodevelopmental disorders are complicated by comorbidities among these neurodevelopmental disorders, unidentified causal genes, and growing evidence of shared genetic risk factors.
We sought to identify the genetic underpinnings of a variety of neurodevelopmental disorders, with a particular emphasis on the epilepsies, by employing next–generation sequencing to thoroughly interrogate genetic variation in the human genome/exome. First, we investigated four families presenting with a seemingly identical and previously undescribed neurodevelopmental disorder characterized by congenital microcephaly, intellectual disability, progressive cerebral atrophy, and intractable seizures. These families all exhibited an apparent autosomal recessive pattern of inheritance. Second, we investigated a heterogeneous cohort of ∼60 undiagnosed patients, the majority of whom suffered from severe neurodevelopmental disorders with a suspected genetic etiology. Third, we investigated 264 patients with epileptic encephalopathies — severe childhood epilepsy disorders — looking specifically at infantile spasms and Lennox–Gastaut syndrome. Finally, we investigated ∼40 large multiplex epilepsy families with complex phenotypic constellations and unclear modes of inheritance. The studied neurodevelopmental disorders exhibited a range of genetic complexity, from clear Mendelian disorders to common complex disorders, resulting in varying degrees of success in the identification of clearly causal genetic variants.
In the first project, we successfully identified the disease–causing gene. We show that recessive mutations in ASNS (encoding asparagine synthetase) are responsible for this previously undescribed neurodevelopmental disorder. We also characterized the causal mutations in vitro and studied Asns–deficient mice that mimicked aspects of the patient phenotype. This work describes ASNS deficiency as a novel neurodevelopmental disorder, identifies three distinct causal mutations in the ASNS gene, and indicates that asparagine synthesis is essential for the proper development and function of the brain.
In the second project, we exome sequenced 62 undiagnosed patients and their unaffected biological parents (trios). By analyzing all identified variants that were annotated as putatively functional and observed as a novel genotype in the probands (not observed in the unaffected parents or controls), we obtained a genetic diagnosis for 32% (20/62) of these patients. Additionally, we identify strong candidate variants in 31% (13/42) of the undiagnosed cases. We also present additional analysis methods for moving beyond traditional screens, e.g., considering only securely implicated genes, or subjecting qualifying variants from any gene to two unique analysis approaches. This work adds to the growing evidence for the utility of diagnostic exome sequencing, increases patient sizes for rare neurodevelopmental disorders (enabling more detailed analyses of the phenotypic spectrum), and proposes novel analysis approaches which will likely become beneficial as the number of sequenced undiagnosed patients grows.
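The trio filter described above (keep putatively functional proband genotypes not seen in the unaffected parents or in controls) can be sketched in a few lines. This is an illustrative sketch only, not the pipeline used in the thesis: the `Call` record, the `novel_proband_genotypes` function, and the variant identifiers are all hypothetical, and compound heterozygous genotypes, which require phasing across two variants in one gene, are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    variant: str      # hypothetical variant identifier
    genotype: str     # "het" or "hom"
    functional: bool  # annotated as putatively functional


def novel_proband_genotypes(proband, mother, father, control_genotypes):
    """Return functional proband calls whose genotype is new to the trio.

    A call is kept when its (variant, genotype) pair is absent from both
    parents and from the control set. This retains de novo heterozygous
    calls and homozygous calls inherited from two heterozygous parents.
    """
    seen = {(c.variant, c.genotype) for c in mother}
    seen |= {(c.variant, c.genotype) for c in father}
    seen |= set(control_genotypes)
    return [
        c for c in proband
        if c.functional and (c.variant, c.genotype) not in seen
    ]
```

Keying the comparison on the genotype rather than on the variant alone is what lets a newly homozygous call pass the filter even though each parent carries the same variant in heterozygous form.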
In the third project, we again employ a trio-based exome sequencing design to investigate the role of de novo mutations in two classical forms of epileptic encephalopathy. We find a significant excess of de novo mutations in the ∼4,000 genes that are the most intolerant to functional genetic variation in the human population (P = 2.9 × 10⁻³, likelihood analysis). We provide clear statistical evidence for two novel genes associated with epileptic encephalopathy: GABRB3 and ALG13. Together with the 15 well-established epileptic encephalopathy genes, we statistically confirm the association of an additional ten putative epileptic encephalopathy genes. We show that only ∼12% of epileptic encephalopathy patients in our cohort are explained by de novo mutations in one of these 24 genes, highlighting the extreme locus heterogeneity of the epileptic encephalopathies.
Finally, we investigated multiplex epilepsy families to uncover novel epilepsy susceptibility factors. Candidate variants emerging from sequencing within discovery families were further assessed by cosegregation testing, variant association testing in a case–control cohort, and gene–based resequencing in a cohort of additional multiplex epilepsy families. Despite employing multiple approaches, we did not identify any clear genetic associations with epilepsy. This work has, however, identified a set of candidates that may include real risk factors for epilepsy; the most promising of these is the MYCBP2 gene. This work emphasizes the extremely high locus and allelic heterogeneity of the epilepsies and demonstrates that very large sample sizes are needed to uncover novel genetic risk factors.
Collectively, this body of work has securely implicated three novel neurodevelopmental disease genes that inform the underlying pathology of these disorders. Furthermore, in the final three studies, this work has highlighted additional candidate variants and genes that may ultimately be validated as disease–causing as sample sizes increase.
Item Open Access Looking beyond the exome: a phenotype-first approach to molecular diagnostic resolution in rare and undiagnosed diseases.(Genetics in medicine : official journal of the American College of Medical Genetics, 2018-04) Pena, Loren DM; Jiang, Yong-Hui; Schoch, Kelly; Spillmann, Rebecca C; Walley, Nicole; Stong, Nicholas; Rapisardo Horn, Sarah; Sullivan, Jennifer A; McConkie-Rosell, Allyn; Kansagra, Sujay; Smith, Edward C; El-Dairi, Mays; Bellet, Jane; Keels, Martha Ann; Jasien, Joan; Kranz, Peter G; Noel, Richard; Nagaraj, Shashi K; Lark, Robert K; Wechsler, Daniel SG; Del Gaudio, Daniela; Leung, Marco L; Hendon, Laura G; Parker, Collette C; Jones, Kelly L; Undiagnosed Diseases Network Members; Goldstein, David B; Shashi, Vandana
Purpose: To describe examples of missed pathogenic variants on whole-exome sequencing (WES) and the importance of deep phenotyping for further diagnostic testing. Methods: Guided by phenotypic information, three children with negative WES underwent targeted single-gene testing. Results: Individual 1 had a clinical diagnosis consistent with infantile systemic hyalinosis, although WES and a next-generation sequencing (NGS)-based ANTXR2 test were negative. Sanger sequencing of ANTXR2 revealed a homozygous single base pair insertion, previously missed by the WES variant caller software. Individual 2 had neurodevelopmental regression and cerebellar atrophy, with no diagnosis on WES. New clinical findings prompted Sanger sequencing and copy number testing of PLA2G6. A novel homozygous deletion of the noncoding exon 1 (not included in the WES capture kit) was detected, with extension into the promoter, confirming the clinical suspicion of infantile neuroaxonal dystrophy. Individual 3 had progressive ataxia, spasticity, and magnetic resonance image changes of vanishing white matter leukoencephalopathy. An NGS leukodystrophy gene panel and WES showed a heterozygous pathogenic variant in EIF2B5; no deletions/duplications were detected. Sanger sequencing of EIF2B5 showed a frameshift indel, probably missed owing to failure of alignment. Conclusion: These cases illustrate potential pitfalls of WES/NGS testing and the importance of phenotype-guided molecular testing in yielding diagnoses.
Item Open Access Microelectrode Array Modeling of Genetic Neurological Disorders in the Era of Next Generation Sequencing(2017) McSweeney, Keisha Melodi
Advances in next-generation sequencing (NGS) and the ability to sequence the entire genome of many individuals in a cost-effective manner have led to the revelation of the genetic etiologies of a number of neurological disorders. Parallel advancements in predictive software, for example, have allowed for the annotation of potentially pathogenic variants. However, the development of appropriate systems to functionally interpret variants and identify pathogenic mechanisms has lagged behind. Understanding pathogenic mechanisms is crucial to the development of targeted therapeutics. Therefore, the main challenge to translating genetic findings into targeted therapeutics is functional modeling.
Increased understanding of the genetic architecture of epilepsy and the hyperexcitability that results from many epilepsy-causing variants makes the disease particularly well-suited for the development of model systems for functional interpretation of genetic variation. To capture the effects of genetic variation in neurological diseases, like epilepsy, complex cellular systems are crucial. In my thesis I describe a paradigm that addresses the need for complex cellular systems. The paradigm utilizes cultured neural networks (CNNs) that can be collected either from mouse models or derived from human induced pluripotent stem cell (hIPSC) models. CNNs retain much of the electrical and network-forming capabilities of the intact brain. CNNs plated onto multi-well microelectrode arrays (CNN-MEAs), which capture extracellular activity of electrically active cells, therefore offer a particularly appealing cellular system for the investigation of genetic variants that cause neurological disorders.
In chapter one I review the history of genetics and epilepsy. I discuss how studies of genetic variants that cause epilepsy give insights into the mechanisms of a wide scope of neurological disorders. I suggest that epilepsy is therefore a good place to start in the development of cellular models of disease and targeted therapeutic options. I next introduce the MEA as a platform capable of capturing important electrophysiological data from CNNs, creating the foundations for chapters two and three.
Chapter two describes one application of the CNN-MEA paradigm in which we inhibited microRNA (miRNA) expression in vitro and evaluated the resulting activity profiles. MiRNAs are increasingly linked to epileptogenesis. We show that small differences in miRNA expression can have large effects on network activity. Chapter two offers a proof-of-principle of the utility of the CNN-MEA paradigm in capturing pathogenic hyperexcitability.
Chapter three discusses a second application in which mutations in the ATP1A3 gene were evaluated. Mutations in ATP1A3 cause at least four distinct disorders and it is not yet fully understood how mutations mediate pathophysiologic consequences. We first investigate ATP1A3 mutations in COS7 cells and observe no clear differences. We next evaluate the effect of two mutations that cause the most severe ATP1A3-associated disorder, Alternating Hemiplegia of Childhood (AHC), on network dynamics. We show that mutant cultures demonstrate hypersynchronous activity and distorted bursting properties when compared to wild-type. Using strategic pharmacological manipulation, we illustrate the role of GABA neurotransmission on aberrant network dynamics and further show the partial rescue of activity phenotypes using adenosine triphosphate (ATP) and an anti-epileptic drug. Chapter three illustrates the shortcomings of heterologous cell modeling and provides additional support for the use of CNN-MEAs to study genetic variation.
The CNN-MEA paradigm provides a promising method to evaluate the effect of mutations that cause neurological disorders. Furthermore, with the use of multi-well MEAs, this paradigm provides a scalable option to evaluate multiple parameters simultaneously. Understanding the functional impact of genetic variation using the CNN-MEA paradigm is a crucial step to developing targeted therapeutics.
Item Open Access Mutations in NGLY1 cause an inherited disorder of the endoplasmic reticulum-associated degradation pathway.(Genet Med, 2014-10) Enns, Gregory M; Shashi, Vandana; Bainbridge, Matthew; Gambello, Michael J; Zahir, Farah R; Bast, Thomas; Crimian, Rebecca; Schoch, Kelly; Platt, Julia; Cox, Rachel; Bernstein, Jonathan A; Scavina, Mena; Walter, Rhonda S; Bibb, Audrey; Jones, Melanie; Hegde, Madhuri; Graham, Brett H; Need, Anna C; Oviedo, Angelica; Schaaf, Christian P; Boyle, Sean; Butte, Atul J; Chen, Rui; Chen, Rong; Clark, Michael J; Haraksingh, Rajini; FORGE Canada Consortium; Cowan, Tina M; He, Ping; Langlois, Sylvie; Zoghbi, Huda Y; Snyder, Michael; Gibbs, Richard A; Freeze, Hudson H; Goldstein, David B
PURPOSE: The endoplasmic reticulum-associated degradation pathway is responsible for the translocation of misfolded proteins across the endoplasmic reticulum membrane into the cytosol for subsequent degradation by the proteasome. To define the phenotype associated with a novel inherited disorder of cytosolic endoplasmic reticulum-associated degradation pathway dysfunction, we studied a series of eight patients with deficiency of N-glycanase 1. METHODS: Whole-genome, whole-exome, or standard Sanger sequencing techniques were employed. Retrospective chart reviews were performed in order to obtain clinical data. RESULTS: All patients had global developmental delay, a movement disorder, and hypotonia. Other common findings included hypolacrima or alacrima (7/8), elevated liver transaminases (6/7), microcephaly (6/8), diminished reflexes (6/8), hepatocyte cytoplasmic storage material or vacuolization (5/6), and seizures (4/8). The nonsense mutation c.1201A>T (p.R401X) was the most common deleterious allele. CONCLUSION: NGLY1 deficiency is a novel autosomal recessive disorder of the endoplasmic reticulum-associated degradation pathway associated with neurological dysfunction, abnormal tear production, and liver disease. The majority of patients detected to date carry a specific nonsense mutation that appears to be associated with severe disease. The phenotypic spectrum is likely to enlarge as cases with a broader range of mutations are detected.
Item Open Access Personalized Medicine and Human Genetic Diversity(COLD SPRING HARBOR PERSPECTIVES IN BIOLOGY, 2014-09) Lu, Yi-Fan; Goldstein, David B; Angrist, Misha; Cavalleri, Gianpiero
Item Open Access Population Genetic Annotation of the Human Genome: Identifying Pathogenic Mutations(2016) Gussow, Ayal Baruch
In the past decade, there has been a series of breakthroughs in human genetics. The advent of next-generation sequencing (NGS) has made it possible, for the first time, to sequence an entire human genome inexpensively and efficiently. The affordability and ease of NGS have led to an explosion of data. Now, the largest hurdle in human genetics has shifted from technology-based limitations of sequencing to developing a framework for interpreting the large amount of data that has been generated. Specifically, in medical genetics, the key challenge is recognizing which of a given patient's many mutations may be contributing to disease.
The most successful methodologies for this problem rely on conservation. However, conservation cannot capture human-specific intolerance to variation. Other methodologies rely on biochemical annotations, which can indicate a genomic region's functional role, but do not directly assess its intolerance to variation in the context of disease. Therefore, despite these available methodologies, detecting causal variation still remains incredibly challenging.
In my thesis, I describe three methodologies, based on population genetics and standing human variation, which can help identify the regions of the genome that are most likely to cause disease when mutated. The first, subRVIS, focuses on sub-regions within genes. The second, ncRVIS, focuses on the regulatory regions of genes. The third, Orion, tackles the daunting problem of interpreting and prioritizing variants across the entire genome.
In Chapter 1, we will review some of the history that has brought us to this point and some of the methodologies currently in use for detecting disease-causing variants.
In Chapter 2, we describe subRVIS, a methodology that divides the gene into sub-regions based on sequence homology to known protein domains, and then ranks those sub-regions based on their tolerance to functional variation. We show that this ranking is associated with the sub-region's likelihood of carrying a previously known pathogenic mutation. Further, we demonstrate that the biological division into domains adds significant information in comparison to dividing the gene into random regions matched in size. This methodology is useful in localizing where pathogenic mutations are most likely to fall within genes.
Chapter 3 describes a methodology to rank genes based on the likelihood that mutations falling in their regulatory regions are pathogenic. We demonstrate that this ranking is associated with whether or not a gene is sensitive to changes in its dosage. This methodology is useful in assessing the pathogenicity of mutations occurring in known regulatory regions that have been associated with genes.
In Chapter 4 we tackle one of the most intimidating and challenging problems in the field of medical genetics: detecting intolerance to variation across the entire human genome. Using a sliding window, we generate a score per base to highlight the regions of the genome that are intolerant to variation, with higher scores corresponding to more intolerant sequence. We term this approach Orion. We demonstrate that exons and DNase hypersensitive sites are enriched for higher Orion scores. This methodology will transform the way whole-genome sequence data are interpreted, by giving researchers the ability to assess the pathogenicity of variants in regions of the genome that are not yet fully understood.
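A per-base sliding-window score of this general kind can be sketched as follows. This is a toy illustration only: it scores each base by the negated count of variant positions in a window centred on it, so that variant-depleted stretches receive higher scores. The Orion statistic itself is not specified in this abstract, so the function name `window_depletion_scores`, the window size, and the scoring rule are all assumptions made for illustration.

```python
from typing import List, Set


def window_depletion_scores(
    length: int, variant_positions: Set[int], window: int
) -> List[int]:
    """Score each base by minus the number of variant sites in a centred window."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    # prefix[i] = number of variant sites among positions 0 .. i-1
    prefix = [0] * (length + 1)
    for pos in range(length):
        prefix[pos + 1] = prefix[pos] + (1 if pos in variant_positions else 0)
    scores = []
    for pos in range(length):
        start = max(0, pos - half)          # windows are truncated at the ends
        end = min(length, pos + half + 1)
        scores.append(-(prefix[end] - prefix[start]))
    return scores


# Hypothetical 10-base sequence with observed variants at positions 0, 1, 2 and 8.
scores = window_depletion_scores(length=10, variant_positions={0, 1, 2, 8}, window=3)
```

Here the bases at positions 4 to 6 have no variants in their windows and so receive the highest score (0), while the variant-dense start of the sequence scores lowest. The prefix-sum array keeps the cost linear in sequence length regardless of window size, which matters at genome scale.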
We have developed methodologies to tackle the key problem of detecting disease-causing variation in patients' sequence data. In an era overwhelmed by NGS data, these methodologies bring us closer to understanding the genetics of disease.
Item Open Access Rare variants create synthetic genome-wide associations.(PLoS Biol, 2010-01-26) Dickson, Samuel P; Wang, Kai; Krantz, Ian; Hakonarson, Hakon; Goldstein, David B
Genome-wide association studies (GWAS) have now identified at least 2,000 common variants that appear associated with common diseases or related traits (http://www.genome.gov/gwastudies), hundreds of which have been convincingly replicated. It is generally thought that the associated markers reflect the effect of a nearby common (minor allele frequency >0.05) causal site, which is associated with the marker, leading to extensive resequencing efforts to find causal sites. We propose as an alternative explanation that variants much less common than the associated one may create "synthetic associations" by occurring, stochastically, more often in association with one of the alleles at the common site versus the other allele. Although synthetic associations are an obvious theoretical possibility, they have never been systematically explored as a possible explanation for GWAS findings. Here, we use simple computer simulations to show the conditions under which such synthetic associations will arise and how they may be recognized. We show that they are not only possible, but inevitable, and that under simple but reasonable genetic models, they are likely to account for or contribute to many of the recently identified signals reported in genome-wide association studies. We also illustrate the behavior of synthetic associations in real datasets by showing that rare causal mutations responsible for both hearing loss and sickle cell anemia create genome-wide significant synthetic associations, in the latter case extending over a 2.5-Mb interval encompassing scores of "blocks" of associated variants. 
In conclusion, uncommon or rare genetic variants can easily create synthetic associations that are credited to common variants, and this possibility requires careful consideration in the interpretation and follow up of GWAS signals.
Item Open Access Screening the human exome: a comparison of whole genome and whole transcriptome sequencing.(Genome Biol, 2010) Cirulli, Elizabeth T; Singh, Abanish; Shianna, Kevin V; Ge, Dongliang; Smith, Jason P; Maia, Jessica M; Heinzen, Erin L; Goedert, James J; Goldstein, David B; Center for HIV/AIDS Vaccine Immunology (CHAVI)
BACKGROUND: There is considerable interest in the development of methods to efficiently identify all coding variants present in large sample sets of humans. There are three approaches possible: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. While whole-genome sequencing is the most complete, it remains sufficiently expensive that cost effective alternatives are important. RESULTS: Here we provide a systematic exploration of how well RNA-Seq can identify human coding variants by comparing variants identified through high coverage whole-genome sequencing to those identified by high coverage RNA-Seq in the same individual. This comparison allowed us to directly evaluate the sensitivity and specificity of RNA-Seq in identifying coding variants, and to evaluate how key parameters such as the degree of coverage and the expression levels of genes interact to influence performance. We find that although only 40% of exonic variants identified by whole genome sequencing were captured using RNA-Seq, this number rose to 81% when concentrating on genes known to be well-expressed in the source tissue. We also find that a high false positive rate can be problematic when working with RNA-Seq data, especially at higher levels of coverage. 
CONCLUSIONS: We conclude that as long as a tissue relevant to the trait under study is available and suitable quality control screens are implemented, RNA-Seq is a fast and inexpensive alternative approach for finding coding variants in genes with sufficiently high expression levels.
Item Open Access The characterization of twenty sequenced human genomes.(PLoS Genet, 2010-09-09) Pelak, Kimberly; Shianna, Kevin V; Ge, Dongliang; Maia, Jessica M; Zhu, Mingfu; Smith, Jason P; Cirulli, Elizabeth T; Fellay, Jacques; Dickson, Samuel P; Gumbs, Curtis E; Heinzen, Erin L; Need, Anna C; Ruzzo, Elizabeth K; Singh, Abanish; Campbell, C Ryan; Hong, Linda K; Lornsen, Katharina A; McKenzie, Alexander M; Sobreira, Nara LM; Hoover-Fong, Julie E; Milner, Joshua D; Ottman, Ruth; Haynes, Barton F; Goedert, James J; Goldstein, David B
We present the analysis of twenty human genomes to evaluate the prospects for identifying rare functional variants that contribute to a phenotype of interest. We sequenced at high coverage ten "case" genomes from individuals with severe hemophilia A and ten "control" genomes. We summarize the number of genetic variants emerging from a study of this magnitude, and provide a proof of concept for the identification of rare and highly-penetrant functional variants by confirming that the cause of hemophilia A is easily recognizable in this data set. We also show that the number of novel single nucleotide variants (SNVs) discovered per genome seems to stabilize at about 144,000 new variants per genome, after the first 15 individuals have been sequenced. Finally, we find that, on average, each genome carries 165 homozygous protein-truncating or stop loss variants in genes representing a diverse set of pathways.