Browsing by Subject "Sequence Alignment"
Now showing 1 - 20 of 28
29 mammalian genomes reveal novel exaptations of mobile elements for likely regulatory functions in the human genome. (PloS one, 2012-01) Lowe, Craig B; Haussler, David

Recent research supports the view that changes in gene regulation, as opposed to changes in the genes themselves, play a significant role in morphological evolution. Gene regulation is largely dependent on transcription factor binding sites. Researchers are now able to use the available 29 mammalian genomes to measure selective constraint at the level of binding sites. This detailed map of constraint suggests that mammalian genomes co-opt fragments of mobile elements to act as gene regulatory sequence on a large scale. In the human genome we detect over 280,000 putative regulatory elements, totaling approximately 7 Mb of sequence, that originated as mobile element insertions. These putative regulatory regions are conserved non-exonic elements (CNEEs), which show considerable cross-species constraint and signatures of continued negative selection in humans, yet do not appear in a known mature transcript. These putative regulatory elements were co-opted from SINE, LINE, LTR and DNA transposon insertions. We demonstrate that at least 11%, and an estimated 20%, of gene regulatory sequence in the human genome showing cross-species conservation was co-opted from mobile elements. The location in the genome of CNEEs co-opted from mobile elements closely resembles that of CNEEs in general, except in the centers of the largest gene deserts where recognizable co-option events are relatively rare. We find that regions of certain mobile element insertions are more likely to be held under purifying selection than others.
In particular, we show 6 examples where paralogous instances of an often co-opted mobile element region define a sequence motif that closely matches a transcription factor's binding profile.

A flexible statistical model for alignment of label-free proteomics data--incorporating ion mobility and product ion information. (BMC Bioinformatics, 2013-12-16) Benjamin, Ashlee M; Thompson, J Will; Soderblom, Erik J; Geromanos, Scott J; Henao, Ricardo; Kraus, Virginia B; Moseley, M Arthur; Lucas, Joseph E

BACKGROUND: The goal of many proteomics experiments is to determine the abundance of proteins in biological samples, and the variation thereof in various physiological conditions. High-throughput quantitative proteomics, specifically label-free LC-MS/MS, allows rapid measurement of thousands of proteins, enabling large-scale studies of various biological systems. Prior to analyzing these information-rich datasets, raw data must undergo several computational processing steps. We present a method to address one of the essential steps in proteomics data processing--the matching of peptide measurements across samples.
RESULTS: We describe a novel method for label-free proteomics data alignment with the ability to incorporate previously unused aspects of the data, particularly ion mobility drift times and product ion information. We compare the results of our alignment method to PEPPeR and OpenMS, and compare alignment accuracy achieved by different versions of our method utilizing various data characteristics. Our method results in increased match recall rates and similar or improved mismatch rates compared to PEPPeR and OpenMS feature-based alignment. We also show that the inclusion of drift time and product ion information results in higher recall rates and more confident matches, without increases in error rates.
CONCLUSIONS: Based on the results presented here, we argue that the incorporation of ion mobility drift time and product ion information are worthy pursuits. Alignment methods should be flexible enough to utilize all available data, particularly with recent advancements in experimental separation methods.

A high-resolution map of human evolutionary constraint using 29 mammals. (Nature, 2011-10-12) Lindblad-Toh, Kerstin; Garber, Manuel; Zuk, Or; Lin, Michael F; Parker, Brian J; Washietl, Stefan; Kheradpour, Pouya; Ernst, Jason; Jordan, Gregory; Mauceli, Evan; Ward, Lucas D; Lowe, Craig B; Holloway, Alisha K; Clamp, Michele; Gnerre, Sante; Alföldi, Jessica; Beal, Kathryn; Chang, Jean; Clawson, Hiram; Cuff, James; Di Palma, Federica; Fitzgerald, Stephen; Flicek, Paul; Guttman, Mitchell; Hubisz, Melissa J; Jaffe, David B; Jungreis, Irwin; Kent, W James; Kostka, Dennis; Lara, Marcia; Martins, Andre L; Massingham, Tim; Moltke, Ida; Raney, Brian J; Rasmussen, Matthew D; Robinson, Jim; Stark, Alexander; Vilella, Albert J; Wen, Jiayu; Xie, Xiaohui; Zody, Michael C; Broad Institute Sequencing Platform and Whole Genome Assembly Team; Baldwin, Jen; Bloom, Toby; Chin, Chee Whye; Heiman, Dave; Nicol, Robert; Nusbaum, Chad; Young, Sarah; Wilkinson, Jane; Worley, Kim C; Kovar, Christie L; Muzny, Donna M; Gibbs, Richard A; Baylor College of Medicine Human Genome Sequencing Center Sequencing Team; Cree, Andrew; Dihn, Huyen H; Fowler, Gerald; Jhangiani, Shalili; Joshi, Vandita; Lee, Sandra; Lewis, Lora R; Nazareth, Lynne V; Okwuonu, Geoffrey; Santibanez, Jireh; Warren, Wesley C; Mardis, Elaine R; Weinstock, George M; Wilson, Richard K; Genome Institute at Washington University; Delehaunty, Kim; Dooling, David; Fronik, Catrina; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Minx, Patrick; Sodergren, Erica; Birney, Ewan; Margulies, Elliott H; Herrero, Javier; Green, Eric D; Haussler, David; Siepel, Adam; Goldman, Nick; Pollard, Katherine S; Pedersen, Jakob S; Lander, Eric S; Kellis, Manolis

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.

An enzyme that inactivates the inflammatory mediator leukotriene b4 restricts mycobacterial infection. (PLoS One, 2013) Tobin, David M; Roca, Francisco J; Ray, John P; Ko, Dennis C; Ramakrishnan, Lalita

While tuberculosis susceptibility has historically been ascribed to failed inflammation, it is now known that an excess of leukotriene A4 hydrolase (LTA4H), which catalyzes the final step in leukotriene B4 (LTB4) synthesis, produces a hyperinflammatory state and tuberculosis susceptibility. Here we show that the LTB4-inactivating enzyme leukotriene B4 dehydrogenase/prostaglandin reductase 1 (LTB4DH/PTGR1) restricts inflammation and independently confers resistance to tuberculous infection.
LTB4DH overexpression counters the susceptibility resulting from LTA4H excess while ltb4dh-deficient animals can be rescued pharmacologically by LTB4 receptor antagonists. These data place LTB4DH as a key modulator of TB susceptibility and suggest new tuberculosis therapeutic strategies.

Binding site on human immunoglobulin G for the affinity ligand HWRGWV. (Journal of molecular recognition : JMR, 2010-05) Yang, Haiou; Gurgel, Patrick V; Williams, D Keith; Bobay, Benjamin G; Cavanagh, John; Muddiman, David C; Carbonell, Ruben G

Affinity ligand HWRGWV has demonstrated the ability to isolate human immunoglobulin G (hIgG) from mammalian cell culture media. The ligand specifically binds hIgG through its Fc portion. This work shows that deglycosylation of hIgG has no influence on its binding to the HWRGWV ligand and the ligand does not compete with Protein A or Protein G in binding hIgG. It is suggested by the mass spectrometry (MS) data and docking simulation that HWRGWV binds to the pFc portion of hIgG and interacts with the amino acids in the loop Ser383-Asn389 (SNGQPEN) located in the CH3 domain. Subsequent modeling has suggested a possible three-dimensional minimized solution structure for the interaction of hIgG and the HWRGWV ligand. The results support the fact that a peptide as small as a hexamer can have specific interactions with large proteins such as hIgG.

Comparative genomics based on massive parallel transcriptome sequencing reveals patterns of substitution and selection across 10 bird species. (Mol Ecol, 2010-03) Künstner, Axel; Wolf, Jochen BW; Backström, Niclas; Whitney, Osceola; Balakrishnan, Christopher N; Day, Lainy; Edwards, Scott V; Janes, Daniel E; Schlinger, Barney A; Wilson, Richard K; Jarvis, Erich D; Warren, Wesley C; Ellegren, Hans

Next-generation sequencing technology provides an attractive means to obtain large-scale sequence data necessary for comparative genomic analysis.
To analyse the patterns of mutation rate variation and selection intensity across the avian genome, we performed brain transcriptome sequencing using Roche 454 technology of 10 different non-model avian species. Contigs from de novo assemblies were aligned to the two available avian reference genomes, chicken and zebra finch. In total, we identified 6499 different genes across all 10 species, with approximately 1000 genes found in each full run per species. We found evidence for a higher mutation rate of the Z chromosome than of autosomes (male-biased mutation) and a negative correlation between the neutral substitution rate (dS) and chromosome size. Analyses of the mean dN/dS ratio (omega) of genes across chromosomes supported the Hill-Robertson effect (the effect of selection at linked loci) and point at stochastic problems with omega as an independent measure of selection. Overall, this study demonstrates the usefulness of next-generation sequencing for obtaining genomic resources for comparative genomic analysis of non-model organisms.

Domain-oriented edge-based alignment of protein interaction networks. (Bioinformatics, 2009-06-15) Guo, Xin; Hartemink, Alexander J

MOTIVATION: Recent advances in high-throughput experimental techniques have yielded a large amount of data on protein-protein interactions (PPIs). Since these interactions can be organized into networks, and since separate PPI networks can be constructed for different species, a natural research direction is the comparative analysis of such networks across species in order to detect conserved functional modules. This is the task of network alignment.
RESULTS: Most conventional network alignment algorithms adopt a node-then-edge-alignment paradigm: they first identify homologous proteins across networks and then consider interactions among them to construct network alignments. In this study, we propose an alternative direct-edge-alignment paradigm.
Specifically, instead of explicit identification of homologous proteins, we directly infer plausibly alignable PPIs across species by comparing conservation of their constituent domain interactions. We apply our approach to detect conserved protein complexes in yeast-fly and yeast-worm PPI networks, and show that our approach outperforms two recent approaches in most alignment performance metrics.
AVAILABILITY: Supplementary material and source code can be found at http://www.cs.duke.edu/~amink/.

Evolution of networks and sequences in eukaryotic cell cycle control. (Philos Trans R Soc Lond B Biol Sci, 2011-12-27) Cross, Frederick R; Buchler, Nicolas E; Skotheim, Jan M

The molecular networks regulating the G1-S transition in budding yeast and mammals are strikingly similar in network structure. However, many of the individual proteins performing similar network roles appear to have unrelated amino acid sequences, suggesting either extremely rapid sequence evolution, or true polyphyly of proteins carrying out identical network roles. A yeast/mammal comparison suggests that network topology, and its associated dynamic properties, rather than regulatory proteins themselves may be the most important elements conserved through evolution. However, recent deep phylogenetic studies show that fungal and animal lineages are relatively closely related in the opisthokont branch of eukaryotes. The presence in plants of cell cycle regulators such as Rb, E2F and cyclins A and D, that appear lost in yeast, suggests cell cycle control in the last common ancestor of the eukaryotes was implemented with this set of regulatory proteins.
Forward genetics in non-opisthokonts, such as plants or their green algal relatives, will provide direct information on cell cycle control in these organisms, and may elucidate the potentially more complex cell cycle control network of the last common eukaryotic ancestor.

Finding regulatory DNA motifs using alignment-free evolutionary conservation information. (Nucleic Acids Res, 2010-04) Gordân, Raluca; Narlikar, Leelavati; Hartemink, Alexander J

As an increasing number of eukaryotic genomes are being sequenced, comparative studies aimed at detecting regulatory elements in intergenic sequences are becoming more prevalent. Most comparative methods for transcription factor (TF) binding site discovery make use of global or local alignments of orthologous regulatory regions to assess whether a particular DNA site is conserved across related organisms, and thus more likely to be functional. Since binding sites are usually short, sometimes degenerate, and often independent of orientation, alignment algorithms may not align them correctly. Here, we present a novel, alignment-free approach for using conservation information for TF binding site discovery. We relax the definition of conserved sites: we consider a DNA site within a regulatory region to be conserved in an orthologous sequence if it occurs anywhere in that sequence, irrespective of orientation. We use this definition to derive informative priors over DNA sequence positions, and incorporate these priors into a Gibbs sampling algorithm for motif discovery. Our approach is simple and fast.
It requires neither sequence alignments nor the phylogenetic relationships between the orthologous sequences, yet it is more effective on real biological data than methods that do.

Genetic signatures in the envelope glycoproteins of HIV-1 that associate with broadly neutralizing antibodies. (PLoS Comput Biol, 2010-10-07) Gnanakaran, S; Daniels, MG; Bhattacharya, T; Lapedes, AS; Sethi, A; Li, M; Tang, H; Greene, K; Gao, H; Haynes, BF; Cohen, MS; Shaw, GM; Seaman, MS; Kumar, A; Gao, F; Montefiori, DC; Korber, B

A steady increase in knowledge of the molecular and antigenic structure of the gp120 and gp41 HIV-1 envelope glycoproteins (Env) is yielding important new insights for vaccine design, but it has been difficult to translate this information to an immunogen that elicits broadly neutralizing antibodies. To help bridge this gap, we used phylogenetically corrected statistical methods to identify amino acid signature patterns in Envs derived from people who have made potently neutralizing antibodies, with the hypothesis that these Envs may share common features that would be useful for incorporation in a vaccine immunogen. Before attempting this, essentially as a control, we explored the utility of our computational methods for defining signatures of complex neutralization phenotypes by analyzing Env sequences from 251 clonal viruses that were differentially sensitive to neutralization by the well-characterized gp120-specific monoclonal antibody, b12. We identified ten b12-neutralization signatures, including seven either in the b12-binding surface of gp120 or in the V2 region of gp120 that have been previously shown to impact b12 sensitivity. A simple algorithm based on the b12 signature pattern was predictive of b12 sensitivity/resistance in an additional blinded panel of 57 viruses.
Upon obtaining these reassuring outcomes, we went on to apply these same computational methods to define signature patterns in Env from HIV-1 infected individuals who had potent, broadly neutralizing responses. We analyzed a checkerboard-style neutralization dataset with sera from 69 HIV-1-infected individuals tested against a panel of 25 different Envs. Distinct clusters of sera with high and low neutralization potencies were identified. Six signature positions in Env sequences obtained from the 69 samples were found to be strongly associated with either the high or low potency responses. Five sites were in the CD4-induced coreceptor binding site of gp120, suggesting an important role for this region in the elicitation of broadly neutralizing antibody responses against HIV-1.

Horsetails and ferns are a monophyletic group and the closest living relatives to seed plants. (Nature, 2001-02) Pryer, KM; Schneider, H; Smith, AR; Cranfill, R; Wolf, PG; Hunt, JS; Sipes, SD

Most of the 470-million-year history of plants on land belongs to bryophytes, pteridophytes and gymnosperms, which eventually yielded to the ecological dominance by angiosperms 90 Myr ago. Our knowledge of angiosperm phylogeny, particularly the branching order of the earliest lineages, has recently been increased by the concurrence of multigene sequence analyses. However, reconstructing relationships for all the main lineages of vascular plants that diverged since the Devonian period has remained a challenge. Here we report phylogenetic analyses of combined data--from morphology and from four genes--for 35 representatives from all the main lineages of land plants. We show that there are three monophyletic groups of extant vascular plants: (1) lycophytes, (2) seed plants and (3) a clade including equisetophytes (horsetails), psilotophytes (whisk ferns) and all eusporangiate and leptosporangiate ferns.
Our maximum-likelihood analysis shows unambiguously that horsetails and ferns together are the closest relatives to seed plants. This refutes the prevailing view that horsetails and ferns are transitional evolutionary grades between bryophytes and seed plants, and has important implications for our understanding of the development and evolution of plants.

Identification of cis-suppression of human disease mutations by comparative genomics. (Nature, 2015-08) Jordan, Daniel M; Frangakis, Stephan G; Golzio, Christelle; Cassa, Christopher A; Kurtzberg, Joanne; Task Force for Neonatal Genomics; Davis, Erica E; Sunyaev, Shamil R; Katsanis, Nicholas

Patterns of amino acid conservation have served as a tool for understanding protein evolution. The same principles have also found broad application in human genomics, driven by the need to interpret the pathogenic potential of variants in patients. Here we performed a systematic comparative genomics analysis of human disease-causing missense variants. We found that an appreciable fraction of disease-causing alleles are fixed in the genomes of other species, suggesting a role for genomic context. We developed a model of genetic interactions that predicts most of these to be simple pairwise compensations. Functional testing of this model on two known human disease genes revealed discrete cis amino acid residues that, although benign on their own, could rescue the human mutations in vivo. This approach was also applied to ab initio gene discovery to support the identification of a de novo disease driver in BTG2 that is subject to protective cis-modification in more than 50 species. Finally, on the basis of our data and models, we developed a computational tool to predict candidate residues subject to compensation.
Taken together, our data highlight the importance of cis-genomic context as a contributor to protein evolution; they provide an insight into the complexity of allele effect on phenotype; and they are likely to assist methods for predicting allele pathogenicity.

IGHV1-69 B cell chronic lymphocytic leukemia antibodies cross-react with HIV-1 and hepatitis C virus antigens as well as intestinal commensal bacteria. (PLoS One, 2014) Hwang, Kwan-Ki; Trama, Ashley M; Kozink, Daniel M; Chen, Xi; Wiehe, Kevin; Cooper, Abby J; Xia, Shi-Mao; Wang, Minyue; Marshall, Dawn J; Whitesides, John; Alam, Munir; Tomaras, Georgia D; Allen, Steven L; Rai, Kanti R; McKeating, Jane; Catera, Rosa; Yan, Xiao-Jie; Chu, Charles C; Kelsoe, Garnett; Liao, Hua-Xin; Chiorazzi, Nicholas; Haynes, Barton F

B-cell chronic lymphocytic leukemia (B-CLL) patients expressing unmutated immunoglobulin heavy variable regions (IGHVs) use the IGHV1-69 B cell receptor (BCR) in 25% of cases. Since HIV-1 envelope gp41 antibodies also frequently use IGHV1-69 gene segments, we hypothesized that IGHV1-69 B-CLL precursors may contribute to the gp41 B cell response during HIV-1 infection. To test this hypothesis, we rescued 5 IGHV1-69 unmutated antibodies as heterohybridoma IgM paraproteins and as recombinant IgG1 antibodies from B-CLL patients, determined their antigenic specificities and analyzed BCR sequences. IGHV1-69 B-CLL antibodies were enriched for reactivity with HIV-1 envelope gp41, influenza, hepatitis C virus E2 protein and intestinal commensal bacteria. These IGHV1-69 B-CLL antibodies preferentially used IGHD3 and IGHJ6 gene segments and had long heavy chain complementary determining region 3s (HCDR3s) (≥21 aa). IGHV1-69 B-CLL BCRs exhibited a phenylalanine at position 54 (F54) of the HCDR2 as do rare HIV-1 gp41 and influenza hemagglutinin stem neutralizing antibodies, while IGHV1-69 gp41 antibodies induced by HIV-1 infection predominantly used leucine (L54) allelic variants.
These results demonstrate that the B-CLL cell population is an expansion of members of the innate polyreactive B cell repertoire with reactivity to a number of infectious agent antigens including intestinal commensal bacteria. The B-CLL IGHV1-69 B cell usage of F54 allelic variants strongly suggests that IGHV1-69 B-CLL gp41 antibodies derive from a restricted B cell pool that also produces rare HIV-1 gp41 and influenza hemagglutinin stem antibodies.

Incongruence between primary sequence data and the distribution of a mitochondrial atp1 group II intron among ferns and horsetails. (Molecular phylogenetics and evolution, 2005-09) Wikström, N; Pryer, KM

Using DNA sequence data from multiple genes (often from more than one genome compartment) to reconstruct phylogenetic relationships has become routine. Augmenting this approach with genomic structural characters (e.g., intron gain and loss, changes in gene order) as these data become available from comparative studies already has provided critical insight into some long-standing questions about the evolution of land plants. Here we report on the presence of a group II intron located in the mitochondrial atp1 gene of leptosporangiate and marattioid ferns. Primary sequence data for the atp1 gene are newly reported for 27 taxa, and results are presented from maximum likelihood-based phylogenetic analyses using Bayesian inference for 34 land plants in three data sets: (1) single-gene mitochondrial atp1 (exon+intron sequences); (2) five combined genes (mitochondrial atp1 [exon only]; plastid rbcL, atpB, rps4; nuclear SSU rDNA); and (3) same five combined genes plus morphology.
All our phylogenetic analyses corroborate results from previous fern studies that used plastid and nuclear sequence data: the monophyly of euphyllophytes, as well as of monilophytes; whisk ferns (Psilotidae) sister to ophioglossoid ferns (Ophioglossidae); horsetails (Equisetopsida) sister to marattioid ferns (Marattiidae), which together are sister to the monophyletic leptosporangiate ferns. In contrast to the results from the primary sequence data, the genomic structural data (atp1 intron distribution pattern) would seem to suggest that leptosporangiate and marattioid ferns are monophyletic, and together they are the sister group to horsetails--a topology that is rarely reconstructed using primary sequence data.

Interrogation of individual intratumoral B lymphocytes from lung cancer patients for molecular target discovery. (Cancer Immunol Immunother, 2016-02) Campa, Michael J; Moody, M Anthony; Zhang, Ruijun; Liao, Hua-Xin; Gottlin, Elizabeth B; Patz, Edward F

Intratumoral B lymphocytes are an integral part of the lung tumor microenvironment. Interrogation of the antibodies they express may improve our understanding of the host response to cancer and could be useful in elucidating novel molecular targets. We used two strategies to explore the repertoire of intratumoral B cell antibodies. First, we cloned VH and VL genes from single intratumoral B lymphocytes isolated from one lung tumor, expressed the genes as recombinant mAbs, and used the mAbs to identify the cognate tumor antigens. The Igs derived from intratumoral B cells demonstrated class switching, with a mean VH mutation frequency of 4%. Although there was no evidence for clonal expansion, these data are consistent with antigen-driven somatic hypermutation. Individual recombinant antibodies were polyreactive, although one clone demonstrated preferential immunoreactivity with tropomyosin 4 (TPM4).
We found that higher levels of TPM4 antibodies were more common in cancer patients, but measurement of TPM4 antibody levels was not a sensitive test for detecting cancer. Second, in an effort to focus our recombinant antibody expression efforts on those B cells that displayed evidence of clonal expansion driven by antigen stimulation, we performed deep sequencing of the Ig genes of B cells collected from seven different tumors. Deep sequencing demonstrated somatic hypermutation but no dominant clones. These strategies may be useful for the study of B cell antibody expression, although identification of a dominant clone and unique therapeutic targets may require extensive investigation.

Investigating deep phylogenetic relationships among cyanobacteria and plastids by small subunit rRNA sequence analysis. (The Journal of eukaryotic microbiology, 1999-07) Turner, S; Pryer, KM; Miao, VP; Palmer, JD

Small subunit rRNA sequence data were generated for 27 strains of cyanobacteria and incorporated into a phylogenetic analysis of 1,377 aligned sequence positions from a diverse sampling of 53 cyanobacteria and 10 photosynthetic plastids. Tree inference was carried out using a maximum likelihood method with correction for site-to-site variation in evolutionary rate. Confidence in the inferred phylogenetic relationships was determined by construction of a majority-rule consensus tree based on alternative topologies not considered to be statistically significantly different from the optimal tree. The results are in agreement with earlier studies in the assignment of individual taxa to specific sequence groups. Several relationships not previously noted among sequence groups are indicated, whereas other relationships previously supported are contradicted.
All plastids cluster as a strongly supported monophyletic group arising near the root of the cyanobacterial line of descent.

Leveraging Fungal and Human Calcineurin-Inhibitor Structures, Biophysical Data, and Dynamics To Design Selective and Nonimmunosuppressive FK506 Analogs. (mBio, 2021-12) Gobeil, Sophie M-C; Bobay, Benjamin G; Juvvadi, Praveen R; Cole, D Christopher; Heitman, Joseph; Steinbach, William J; Venters, Ronald A; Spicer, Leonard D

Calcineurin is a critical enzyme in fungal pathogenesis and antifungal drug tolerance and, therefore, an attractive antifungal target. Current clinically accessible calcineurin inhibitors, such as FK506, are immunosuppressive to humans, so exploiting calcineurin inhibition as an antifungal strategy necessitates fungal specificity in order to avoid inhibiting the human pathway. Harnessing fungal calcineurin-inhibitor crystal structures, we recently developed a less immunosuppressive FK506 analog, APX879, with broad-spectrum antifungal activity and demonstrable efficacy in a murine model of invasive fungal infection. Our overarching goal is to better understand, at a molecular level, the interaction determinants of the human and fungal FK506-binding proteins (FKBP12) required for calcineurin inhibition in order to guide the design of fungus-selective, nonimmunosuppressive FK506 analogs. To this end, we characterized high-resolution structures of the Mucor circinelloides FKBP12 bound to FK506 and of the Aspergillus fumigatus, M. circinelloides, and human FKBP12 proteins bound to the FK506 analog APX879, which exhibits enhanced selectivity for fungal pathogens. Combining structural, genetic, and biophysical methodologies with molecular dynamics simulations, we identify critical variations in these structurally similar FKBP12-ligand complexes.
The work presented here, aimed at the rational design of more effective calcineurin inhibitors, indeed suggests that modifications to the APX879 scaffold centered around the C15, C16, C18, C36, and C37 positions provide the potential to significantly enhance fungal selectivity.
IMPORTANCE: Invasive fungal infections are a leading cause of death in the immunocompromised patient population. The rise in drug resistance to current antifungals highlights the urgent need to develop more efficacious and highly selective agents. Numerous investigations of major fungal pathogens have confirmed the critical role of the calcineurin pathway for fungal virulence, making it an attractive target for antifungal development. Although FK506 inhibits calcineurin, it is immunosuppressive in humans and cannot be used as an antifungal. By combining structural, genetic, biophysical, and in silico methodologies, we pinpoint regions of the FK506 scaffold and a less immunosuppressive analog, APX879, centered around the C15 to C18 and C36 to C37 positions that could be altered with selective extensions and/or deletions to enhance fungal selectivity. This work represents a significant advancement toward realizing calcineurin as a viable target for antifungal drug discovery.

Modeling the evolution of regulatory elements by simultaneous detection and alignment with phylogenetic pair HMMs. (PLoS Comput Biol, 2010-12-16) Majoros, William H; Ohler, Uwe

The computational detection of regulatory elements in DNA is a difficult but important problem impacting our progress in understanding the complex nature of eukaryotic gene regulation. Attempts to utilize cross-species conservation for this task have been hampered both by evolutionary changes of functional sites and poor performance of general-purpose alignment programs when applied to non-coding sequence.
We describe a new and flexible framework for modeling binding site evolution in multiple related genomes, based on phylogenetic pair hidden Markov models which explicitly model the gain and loss of binding sites along a phylogeny. We demonstrate the value of this framework for both the alignment of regulatory regions and the inference of precise binding-site locations within those regions. As the underlying formalism is a stochastic, generative model, it can also be used to simulate the evolution of regulatory elements. Our implementation is scalable in terms of numbers of species and sequence lengths and can produce alignments and binding-site predictions with accuracy rivaling or exceeding current systems that specialize in only alignment or only binding-site prediction. We demonstrate the validity and power of various model components on extensive simulations of realistic sequence data and apply a specific model to study Drosophila enhancers in as many as ten related genomes and in the presence of gain and loss of binding sites. 
Different models and modeling assumptions can be easily specified, thus providing an invaluable tool for the exploration of biological hypotheses that can drive improvements in our understanding of the mechanisms and evolution of gene regulation.
Item Open Access Phylogenomic analyses data of the avian phylogenomics project. (Gigascience, 2015) Jarvis, Erich D; Mirarab, Siavash; Aberer, Andre J; Li, Bo; Houde, Peter; Li, Cai; Ho, Simon YW; Faircloth, Brant C; Nabholz, Benoit; Howard, Jason T; Suh, Alexander; Weber, Claudia C; da Fonseca, Rute R; Alfaro-Núñez, Alonzo; Narula, Nitish; Liu, Liang; Burt, Dave; Ellegren, Hans; Edwards, Scott V; Stamatakis, Alexandros; Mindell, David P; Cracraft, Joel; Braun, Edward L; Warnow, Tandy; Jun, Wang; Gilbert, M Thomas Pius; Zhang, Guojie; Avian Phylogenomics Consortium
BACKGROUND: Determining the evolutionary relationships among the major lineages of extant birds has been one of the biggest challenges in systematic biology. To address this challenge, we assembled or collected the genomes of 48 avian species spanning most orders of birds, including all Neognathae and two of the five Palaeognathae orders. We used these genomes to construct a genome-scale avian phylogenetic tree and perform comparative genomic analyses.
FINDINGS: Here we present the datasets associated with the phylogenomic analyses, which include sequence alignment files consisting of nucleotides, amino acids, indels, and transposable elements, as well as tree files containing gene trees and species trees. Inferring an accurate phylogeny required generating: 1) a well-annotated data set across species based on genome synteny; 2) alignments with unaligned or incorrectly overaligned sequences filtered out; and 3) diverse data sets, including genes and their inferred trees, indels, and transposable elements.
Our total evidence nucleotide tree (TENT) data set (consisting of exons, introns, and UCEs) gave what we consider our most reliable species tree when using the concatenation-based ExaML algorithm or when using statistical binning with the coalescence-based MP-EST algorithm (which we refer to as MP-EST*). Other data sets, such as the coding sequence of some exons, revealed other properties of genome evolution, namely convergence.
CONCLUSIONS: The Avian Phylogenomics Project is the largest vertebrate phylogenomics project to date that we are aware of. The sequence, alignment, and tree data are expected to accelerate analyses in phylogenomics and other related areas.
Item Open Access Screening the human exome: a comparison of whole genome and whole transcriptome sequencing. (Genome Biol, 2010) Cirulli, Elizabeth T; Singh, Abanish; Shianna, Kevin V; Ge, Dongliang; Smith, Jason P; Maia, Jessica M; Heinzen, Erin L; Goedert, James J; Goldstein, David B; Center for HIV/AIDS Vaccine Immunology (CHAVI)
BACKGROUND: There is considerable interest in the development of methods to efficiently identify all coding variants present in large sample sets of humans. There are three possible approaches: whole-genome sequencing, whole-exome sequencing using exon capture methods, and RNA-Seq. While whole-genome sequencing is the most complete, it remains expensive enough that cost-effective alternatives are important.
RESULTS: Here we provide a systematic exploration of how well RNA-Seq can identify human coding variants by comparing variants identified through high-coverage whole-genome sequencing to those identified by high-coverage RNA-Seq in the same individual. This comparison allowed us to directly evaluate the sensitivity and specificity of RNA-Seq in identifying coding variants, and to evaluate how key parameters such as the degree of coverage and the expression levels of genes interact to influence performance.
We find that although only 40% of exonic variants identified by whole-genome sequencing were captured using RNA-Seq, this number rose to 81% when concentrating on genes known to be well expressed in the source tissue. We also find that a high false-positive rate can be problematic when working with RNA-Seq data, especially at higher levels of coverage.
CONCLUSIONS: We conclude that as long as a tissue relevant to the trait under study is available and suitable quality-control screens are implemented, RNA-Seq is a fast and inexpensive alternative approach for finding coding variants in genes with sufficiently high expression levels.
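The recall comparison described above (the share of whole-genome variants that RNA-Seq recovers, overall and within well-expressed genes) can be sketched as simple set arithmetic. This is a minimal illustration, assuming variants are keyed by gene, chromosome, position, and alternate allele, and treating the whole-genome calls as the truth set; the `Variant` type, `compare_calls` function, and the toy call sets are hypothetical and are not data from the study. The second returned value is the fraction of RNA-Seq calls absent from the whole-genome set (candidate false positives), not a true specificity, which would also require counting true-negative sites.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    gene: str
    chrom: str
    pos: int
    alt: str


def compare_calls(wgs, rnaseq, genes=None):
    """Compare RNA-Seq calls with WGS calls treated as the truth set.

    If `genes` is given, both call sets are restricted to those genes
    (e.g. genes known to be well expressed in the source tissue).
    Returns (sensitivity, fraction of RNA-Seq calls missing from WGS).
    """
    wgs, rnaseq = set(wgs), set(rnaseq)
    if genes is not None:
        wgs = {v for v in wgs if v.gene in genes}
        rnaseq = {v for v in rnaseq if v.gene in genes}
    shared = wgs & rnaseq
    sensitivity = len(shared) / len(wgs) if wgs else float("nan")
    # RNA-Seq-only calls are candidate false positives.
    rna_only = len(rnaseq - wgs) / len(rnaseq) if rnaseq else float("nan")
    return sensitivity, rna_only


# Hypothetical toy call sets: G3 stands in for a poorly expressed gene.
WGS = {
    Variant("G1", "chr1", 100, "A"), Variant("G1", "chr1", 200, "T"),
    Variant("G2", "chr2", 300, "G"), Variant("G3", "chr3", 400, "C"),
    Variant("G3", "chr3", 500, "A"),
}
RNASEQ = {
    Variant("G1", "chr1", 100, "A"), Variant("G1", "chr1", 200, "T"),
    Variant("G2", "chr2", 300, "G"), Variant("G2", "chr2", 350, "T"),
}

print(compare_calls(WGS, RNASEQ))                       # (0.6, 0.25)
print(compare_calls(WGS, RNASEQ, genes={"G1", "G2"}))   # (1.0, 0.25)
```

In this toy example, restricting to expressed genes raises sensitivity because the missed variants sit in a gene with no RNA-Seq coverage, which mirrors the qualitative pattern the abstract reports.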