Browsing by Subject "Phylogenetics"
Item Open Access A Molecular Phylogenetic Study of Historical Biogeography and the Evolution of Self-Incompatibility RNases in Indian Ocean Coffea (Rubiaceae) (2010) Nowak, Michael Dennis
A fundamental goal in the diverse field of evolutionary biology is reconstructing the historical processes that facilitated lineage diversification and the current geographic distribution of species diversity. Oceanic islands provide a view of evolutionary processes that may otherwise be obscured by the complex biogeographic histories of continental systems, and have thus provided evolutionary biology with some of its most lasting and significant theories. The Indian Ocean island of Madagascar is home to an extraordinarily diverse and endemic biota, and reconstructing the historical processes responsible for this diversity has consumed countless academic careers. While the flowering plant genus Coffea is but one lineage contributing to Madagascar's staggering floral diversity, it is representative of the common evolutionary theme of adaptive radiation and local endemism on the island. In this dissertation, I employ the genus Coffea as a model for understanding historical biogeographic processes in the Indian Ocean using methods of molecular phylogenetics and population genetics. In the molecular phylogenetic study of Coffea presented in chapter 2, I show that Madagascan Coffea diversity is likely the product of at least two independent colonization events from Africa, a result that contradicts current hypotheses for the single origin of this group.
Species of Coffea are known to exhibit self-incompatibility, which can have a dramatic effect on the geographic distribution of plant genetic diversity. In chapter 3, I identify the genetic mechanism of self-incompatibility in Coffea as homologous to the canonical eudicot S-RNase system. Baker's Rule suggests that self-incompatible lineages are very unlikely to colonize oceanic islands, and in chapter 4, I test this hypothesis by characterizing the strength of self-incompatibility and comparing S-RNase polymorphism in Coffea populations endemic to isolated Indian Ocean islands (Grande Comore and Mauritius) with that of Madagascan/African species. My findings suggest that while island populations show little evidence for a genetic bottleneck in S-RNase allelic diversity, Mauritian endemic Coffea may have evolved a type of "leaky" self-incompatibility allowing self-fertilization at some unknown rate. Through the application of traditional phylogenetic methods and novel data from the self-incompatibility locus, my dissertation contributes a wealth of new information regarding the evolutionary and biogeographic history of Coffea in the Indian Ocean.
Item Open Access A Next-Generation Approach to Systematics in the Classic Reticulate Polypodium vulgare Species Complex (Polypodiaceae) (2014) Sigel, Erin Mackey
The Polypodium vulgare complex (Polypodiaceae) comprises a well-studied group of fern taxa whose members are cryptically differentiated morphologically and have generated a confusing and highly reticulate species cluster. Once considered a single species spanning much of northern Eurasia and North America, P. vulgare has been segregated into approximately 17 diploid and polyploid taxa as a result of cytotaxonomic work, hybridization experiments, and isozyme studies conducted during the 20th century. Despite considerable effort, however, the evolutionary relationships among the diploid members of the P. vulgare complex remain poorly resolved, and several taxa, particularly allopolyploids and their diploid progenitors, remain challenging to delineate morphologically due to a dearth of stable diagnostic characters. Furthermore, compared to many well-studied angiosperm reticulate complexes, relatively little is known about the number of independently-derived lineages, distribution, and evolutionary significance of the allopolyploid species that have formed recurrently. This dissertation is an attempt to advance systematic knowledge of the Polypodium vulgare complex and establish it as a "model" system for investigating the evolutionary consequences of allopolyploidy in ferns.
Chapter I presents a diploids-only phylogeny of the P. vulgare complex and related species to test previous hypotheses concerning relationships within Polypodium sensu stricto. Analyses of sequence data from four plastid loci (atpA, rbcL, matK, and trnG-trnR) recovered a monophyletic P. vulgare complex comprising four well-supported clades. The P. vulgare complex is resolved as sister to the Neotropical P. plesiosorum group and these, in turn, are sister to the Asian endemic Pleurosoriopsis makinoi. Divergence time analyses incorporating previously derived age constraints and fossil data provide support for an early Miocene origin for the P. vulgare complex and a late Miocene-Pliocene origin for the four major diploid lineages of the complex, with the majority of extant diploid species diversifying from the late Miocene through the Pleistocene. Finally, node age estimates are used to reassess previous hypotheses, and to propose new hypotheses, about the historical events that shaped the diversity and current geographic distribution of the diploid species of the P. vulgare complex.
Chapter II addresses reported discrepancies regarding the occurrence of Polypodium calirhiza in Mexico. The original paper describing this taxon cited collections from Mexico, but the species was omitted from the recent Pteridophytes of Mexico. Originally treated as a tetraploid cytotype of P. californicum, P. calirhiza now is hypothesized to have arisen through hybridization between P. glycyrrhiza and P. californicum. The allotetraploid can be difficult to distinguish from either of its putative parents, but especially so from P. californicum. These analyses show that a combination of spore length and abaxial rachis scale morphology consistently distinguishes P. calirhiza from P. californicum and confirm that both species occur in Mexico. Although occasionally found growing together in the United States, the two species are strongly allopatric in Mexico, where P. californicum is restricted to coastal regions of the Baja California peninsula and neighboring Pacific islands and P. calirhiza grows at high elevations in central and southern Mexico. The occurrence of P. calirhiza in Oaxaca, Mexico, marks the southernmost extent of the P. vulgare complex in the Western Hemisphere.
Chapter III examines a case of reciprocal allopolyploid origins in the fern Polypodium hesperium and presents it as a natural model system for investigating the evolutionary potential of duplicated genomes. In allopolyploids, reciprocal crosses between the same progenitor species can yield lineages with different uniparentally inherited plastid genomes. While likely common, there are few well-documented examples of such reciprocal origins. Using a combination of uniparentally inherited plastid and biparentally inherited nuclear sequence data, we investigated the distributions and relative ages of reciprocally formed lineages in Polypodium hesperium, an allotetraploid fern that is broadly distributed in western North America. The reciprocally-derived plastid haplotypes of Polypodium hesperium are allopatric, with populations north and south of 42° N latitude having different plastid genomes. Biogeographic information and previously estimated ages for the diversification of its diploid progenitors lend support for middle to late Pleistocene origins of P. hesperium. Several features of Polypodium hesperium make it a particularly promising system for investigating the evolutionary consequences of allopolyploidy. These include reciprocally derived lineages with disjunct geographic distributions, recent time of origin, and extant diploid progenitor lineages.
This dissertation concludes by demonstrating the utility of the allotetraploid Polypodium hesperium for understanding how ferns utilize the genetic diversity imparted by allopolyploidy and recurrent origins. Chapter IV details the use of high-throughput sequencing technologies to generate a reference transcriptome for Polypodium, a genus without preexisting genomic resources, and to compare patterns of total and homoeolog-specific gene expression in leaf tissue of reciprocally formed lineages of P. hesperium. Genome-wide patterns of total gene expression and homoeolog expression ratios are strikingly similar between the lineages: total gene expression levels mirror those of the diploid progenitor P. amorphum, and homoeologs derived from P. amorphum are preferentially expressed. The unprecedented levels of unbalanced expression level dominance and unbalanced homoeolog expression bias found in P. hesperium support the hypothesis that these phenomena are pervasive consequences of allopolyploidy in plants.
Item Open Access A phylogenetic transform enhances analysis of compositional microbiota data. (eLife, 2017-02-15) Silverman, Justin D; Washburne, Alex D; Mukherjee, Sayan; David, Lawrence A
Surveys of microbial communities (microbiota), typically measured as relative abundance of species, have illustrated the importance of these communities in human health and disease. Yet, statistical artifacts commonly plague the analysis of relative abundance data. Here, we introduce the PhILR transform, which incorporates microbial evolutionary models with the isometric log-ratio transform to allow off-the-shelf statistical tools to be safely applied to microbiota surveys. We demonstrate that analyses of community-level structure can be applied to PhILR transformed data with performance on benchmarks rivaling or surpassing standard tools. Additionally, by decomposing distance in the PhILR transformed space, we identified neighboring clades that may have adapted to distinct human body sites. Decomposing variance revealed that covariation of bacterial clades within human body sites increases with phylogenetic relatedness. Together, these findings illustrate how the PhILR transform combines statistical and phylogenetic models to overcome compositional data challenges and enable evolutionary insights relevant to microbial communities.
Item Open Access Advances in Bayesian Modeling of Protein Structure Evolution (2018) Larson, Gary
This thesis contributes to a statistical modeling framework for protein sequence and structure evolution. An existing Bayesian model for protein structure evolution is extended in two unique ways. Each of these model extensions addresses an important limitation which has not yet been satisfactorily addressed in the wider literature. These extensions are followed by work regarding inherent statistical bias in models for sequence evolution.
Most available models for protein structure evolution do not model interdependence between the backbone sites of the protein, yet the assumption that the sites evolve independently is known to be false. I argue that ignoring such dependence leads to biased estimation of evolutionary distance between proteins. To mitigate this bias, I express an existing Bayesian model in a generalized form and introduce site-dependence via the generalized model. In the process, I show that the effect of protein structure information on the measure of evolutionary distance can be suppressed by the model formulation, and I further modify the model to help mitigate this problem. In addition to the statistical model itself, I provide computational details and computer code. I modify a well-known bioinformatics algorithm in order to preserve efficient computation under this model. The modified algorithm can be easily understood and used by practitioners familiar with the original algorithm. My approach to modeling dependence is computationally tractable and interpretable with little additional computational burden over the model on which it is based.
The second model expansion allows for evolutionary inference on protein pairs having structural discrepancies attributable to backbone flexion. Thus, the model expansion exposes flexible protein structures to the capabilities of Bayesian protein structure alignment and phylogenetics. Unlike most of the few existing methods that deal with flexible protein structures, our Bayesian flexible alignment model requires no prior knowledge of the presence or absence of flexion points in the protein structure, and uncertainty measures are available for the alignment and other parameters of interest. The model can detect subtle flexion while not overfitting non-flexible protein pairs, and is demonstrated to improve phylogenetic inference in a simulated data setting and in a difficult-to-align set of proteins. The flexible model is a unique addition to the small but growing set of tools available for analysis of flexible protein structure. The ability to perform inference on flexible proteins in a Bayesian framework is likely to be of immediate interest to the structural phylogenetics community.
Finally, I present work related to the study of bias in site-independent models for sequence evolution. In the case of binary sequences, I discuss strategies for theoretical proof of bias and provide various details to that end, including detailing efforts undertaken to produce a site-dependent sequence model with similar properties to the site-dependent structural model introduced in an earlier chapter. I highlight the challenges of theoretical proof for this bias and include miscellaneous related work of general interest to researchers studying dependent sequence models.
Item Open Access Bayesian Modeling for Identifying Selection in B cell Maturation (2023) Tang, Tengjie
This thesis focuses on modeling the selection effects on B cell antibody mutations to identify amino acids under strong selection. Site-wise selection coefficients are parameterized by the fitnesses of amino acids. First, we conduct simulation studies to evaluate the accuracy of the Monte Carlo p-value approach for identifying selection for specific amino acid/location combinations. Then, we adopt Bayesian methods to infer location-specific fitness parameters for each amino acid. In particular, we propose the use of a spike-and-slab prior and implement Markov chain Monte Carlo (MCMC) algorithms for posterior sampling. Further simulation studies are conducted to evaluate the performance of the proposed Bayesian methods in inferring fitness parameters and identifying strong selection. The results demonstrate the reliable inference and detection performance of the proposed Bayesian methods. Finally, an example using real antibody sequences is provided. This work can help identify important early mutations in B cell antibodies, which is crucial for developing an effective HIV vaccine.
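To illustrate the kind of prior named in this abstract, the following is a minimal sketch of drawing from a generic spike-and-slab prior: a point mass at zero (the spike) mixed with a zero-mean Gaussian (the slab). The function name, the point-mass form of the spike, and the parameters `inclusion_prob` and `slab_sd` are assumptions made for illustration; the abstract does not specify the thesis's actual prior or sampler.

```python
import random


def sample_spike_and_slab(n, inclusion_prob, slab_sd, seed=None):
    """Draw n values from a point-mass spike-and-slab prior.

    With probability inclusion_prob a value comes from the slab,
    a Normal(0, slab_sd^2); otherwise it is exactly zero (the spike).
    All parameter names here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        if rng.random() < inclusion_prob:
            draws.append(rng.gauss(0.0, slab_sd))  # slab: effect is "switched on"
        else:
            draws.append(0.0)  # spike: no effect
    return draws


draws = sample_spike_and_slab(n=10_000, inclusion_prob=0.1, slab_sd=2.0, seed=0)
nonzero_fraction = sum(d != 0.0 for d in draws) / len(draws)
print(f"fraction of draws from the slab: {nonzero_fraction:.3f}")
```

Under such a prior most parameters are shrunk to exactly zero, so the few that the posterior places in the slab can be read as candidates for strong effects.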
Item Open Access Bayesian Structural Phylogenetics (2013) Challis, Christopher
This thesis concerns the use of protein structure to improve phylogenetic inference. There has been growing interest in phylogenetics as the number of available DNA and protein sequences continues to grow rapidly and demand from other scientific fields increases. It is now well understood that phylogenies should be inferred jointly with alignment through use of stochastic evolutionary models. It has not been possible, however, to incorporate protein structure in this framework. Protein structure is more strongly conserved than sequence over long distances, so an important source of information, particularly for alignment, has been left out of analyses.
I present a stochastic process model for the joint evolution of protein primary and tertiary structure, suitable for use in alignment and estimation of phylogeny. Indels arise from a classic Links model and mutations follow a standard substitution matrix, while backbone atoms diffuse in three-dimensional space according to an Ornstein-Uhlenbeck process. The model allows for simultaneous estimation of evolutionary distances, indel rates, structural drift rates, and alignments, while fully accounting for uncertainty. The inclusion of structural information enables pairwise evolutionary distance estimation on time scales not previously attainable with sequence evolution models. Ideally, inference should not be performed in a pairwise fashion between proteins, but in a fully Bayesian setting simultaneously estimating the phylogenetic tree, alignment, and model parameters. I extend the initial pairwise model to this framework and explore model variants which improve agreement between sequence and structure information. The model also allows for estimation of heterogeneous rates of structural evolution throughout the tree, identifying groups of proteins structurally evolving at different speeds. In order to explore the posterior over topologies by Markov chain Monte Carlo sampling, I also introduce novel topology + alignment proposals which greatly improve mixing of the underlying Markov chain. I show that the inclusion of structural information reduces both alignment and topology uncertainty. The software is available as a plugin to the package StatAlign.
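As a rough illustration of the structural component described above, the sketch below simulates a single point diffusing in three dimensions under an Ornstein-Uhlenbeck process, using the exact Gaussian transition over each time step. It is not the thesis's model or the StatAlign code: the function name and the parameters `theta` (mean-reversion rate), `sigma` (diffusion scale), and `mu` (attracting position) are hypothetical, and each coordinate is assumed to evolve independently.

```python
import math
import random


def simulate_ou(x0, mu, theta, sigma, dt, steps, seed=None):
    """Simulate an Ornstein-Uhlenbeck path, one independent process per coordinate.

    Uses the exact transition: over a step dt the coordinate is Gaussian with
    mean mu + (x - mu) * exp(-theta * dt) and
    variance sigma^2 * (1 - exp(-2 * theta * dt)) / (2 * theta).
    """
    rng = random.Random(seed)
    decay = math.exp(-theta * dt)
    noise_sd = sigma * math.sqrt((1.0 - decay ** 2) / (2.0 * theta))
    path = [tuple(x0)]
    for _ in range(steps):
        previous = path[-1]
        path.append(tuple(
            m + (x - m) * decay + noise_sd * rng.gauss(0.0, 1.0)
            for x, m in zip(previous, mu)
        ))
    return path


# Hypothetical example: one atom starting 5 units from its attracting position.
path = simulate_ou(x0=(5.0, 0.0, 0.0), mu=(0.0, 0.0, 0.0),
                   theta=0.5, sigma=1.0, dt=0.1, steps=100, seed=1)
print(len(path), path[-1])
```

The mean-reverting drift keeps the variance of each coordinate bounded (it approaches `sigma**2 / (2 * theta)`), unlike plain Brownian motion, whose variance grows without limit over time.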
Finally, I also examine limits on statistical inference of phylogeny through sequence information models. These limits arise due to the "cutoff phenomenon," a term from probability which describes processes which remain far from their equilibrium distribution for some period of time before swiftly transitioning to stationarity. Evolutionary sequence models all exhibit a cutoff; I show how to find the cutoff for specific models and sequences and relate the cutoff explicitly to increased uncertainty in inference of evolutionary distances. I give theoretical results for symmetric models, and demonstrate with simulations that these results apply to more realistic and widespread models as well. This analysis also highlights several drawbacks to common default priors for phylogenetic analysis, and I suggest a more useful class of priors.
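The cutoff behaviour can be seen in a toy setting. The sketch below assumes a deliberately simple model that is not taken from the thesis: `n_sites` independent binary sites, each flipping state at rate 1, all starting in state 0. Because the sites are exchangeable, the total variation distance to the stationary distribution can be computed exactly from the binomial distribution of the number of sites in state 1.

```python
import math


def tv_to_stationarity(n_sites, t):
    """Exact total variation distance to stationarity at time t.

    Toy model (an assumption for illustration): independent two-state sites,
    flip rate 1 in each direction, all sites starting in state 0.
    """
    p = 0.5 * (1.0 - math.exp(-2.0 * t))  # P(a site is in state 1 at time t)
    total = 0.0
    for k in range(n_sites + 1):
        weight = math.comb(n_sites, k)
        pmf_t = weight * p ** k * (1.0 - p) ** (n_sites - k)
        pmf_stationary = weight * 0.5 ** n_sites
        total += abs(pmf_t - pmf_stationary)
    return 0.5 * total


n_sites = 200
for t in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 5.0):
    print(f"t = {t:3.1f}  TV distance = {tv_to_stationarity(n_sites, t):.4f}")
```

In this toy model the distance stays close to 1 for small `t` and then falls toward 0 around `t = ln(n_sites) / 4`; past that point the sequence carries very little information about elapsed time, which is the link to uncertainty in evolutionary distances drawn in the abstract.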
Item Open Access EVOLUTION OF THE MATING-TYPE LOCUS AND INSIGHTS INTO SEXUAL REPRODUCTION IN THE CRYPTOCOCCUS SPECIES COMPLEX (2010) Findley, Keisha Monique
Sexual reproduction in fungi is governed by a specialized genomic region called the mating-type locus (MAT). The ascomycetes, the largest phylum of fungi, primarily possess a bipolar mating system while the basidiomycetes, the second largest group, are mostly tetrapolar. The human fungal pathogen and basidiomycetous yeast Cryptococcus neoformans has evolved a bipolar mating system that encodes homeodomain (HD) and pheromone/receptor (P/R) genes. The MAT locus of C. neoformans is unusually large, spans greater than 100 kb, and encodes more than 20 genes. To understand how the pathogenic Cryptococcus species complex evolved this unique bipolar mating system, we investigated the evolution of MAT in closely and distantly related species and discovered an extant sexual cycle in Cryptococcus amylolentus.
Phylogenetic analysis using a six-gene multi-locus sequencing (MLS) approach identified the most closely related species to the pathogenic Cryptococcus species complex that are currently known. The two non-pathogenic sibling species, Tsuchiyaea wingfieldii and Cryptococcus amylolentus, and the more distantly related species Filobasidiella depauperata define the Filobasidiella clade. We also resolved the phylogeny of the species located in the sister clade, Kwoniella. A comprehensive dendrogram of the 15 Tremellales species examined suggests a common saprobic ancestor. Moreover, the pathogenic Cryptococcus species have a saprobic origin but later emerged as pathogens. We further characterized the mating-type locus for T. wingfieldii and C. amylolentus by cloning and sequencing two unlinked genomic loci encoding the HD and P/R genes. Interestingly, linked and likely divergently transcribed homologs for SXI1 and SXI2 are present in T. wingfieldii and C. amylolentus, while the P/R alleles contain many genes also found in the MAT locus of the pathogenic Cryptococcus species. In addition, hypothetical genes present in C. neoformans MAT are MAT-linked in both species and indicate a possible translocation event between chromosomes 4 and 5 of C. neoformans. Our analysis of MAT in the sibling species indicates that T. wingfieldii is likely tetrapolar, and the C. amylolentus sequence comparison of the dimorphic SXI1 and SXI2 region and the pheromone receptor, STE3, suggests that C. amylolentus is also tetrapolar. The examination of MAT in these sibling species confirms the model for MAT evolution previously proposed in which this structure in C. neoformans and C. gattii evolved from an ancestral tetrapolar mating system. Moreover, the organization of MAT in these sibling species mirrors key aspects of the proposed intermediates in the evolution of MAT in the pathogenic Cryptococcus species, and for sex chromosomes in plants, animals, and algae in general.
We discovered an extant sexual cycle for C. amylolentus, a species previously thought to be asexual. Matings between two strains of opposite mating-types produce dikaryotic hyphae with fused clamp connections and uni- and bi-nucleate basidiospores. Genotyping of basidiospores using markers linked and unlinked to MAT revealed that genetic exchange (recombination) occurs during the sexual cycle of C. amylolentus, and it is likely that either aneuploids are generated during sex or more than one meiosis event occurs within each basidium. This is in contrast to C. neoformans, where only one meiotic event per basidium has been observed. Uniparental mitochondrial inheritance has also been observed in C. amylolentus progeny; similar to the pathogenic Cryptococcus species, mtDNA is inherited from the C. amylolentus MATa parent. Analysis of sex in C. amylolentus has provided insight into the mechanisms that phylogenetically related fungi employ in orchestrating sexual reproduction.
We also extended our analysis to include the distantly related tetrapolar basidiomycete Tremella mesenterica. We completed comparisons of MAT-specific genes between five strains of T. mesenterica and identified the regions that define its mating-type system. The HD locus is limited to the SXI1- and SXI2-like genes while the P/R locus is defined by STE3, STE12, STE20, and the pheromone gene, tremerogen a-13. Interestingly, many of the genes associated with the MAT locus of the pathogenic Cryptococcus species flank the HD and P/R locus and are not incorporated in MAT in T. mesenterica. The MAT region includes transposons and C. neoformans hypothetical genes also present in T. wingfieldii and C. amylolentus. The mating-type system in T. mesenterica reflects an ancestral intermediate in the evolution of the MAT locus in the pathogenic Cryptococcus species. In conclusion, this study provides an in-depth analysis on the structure, function, and evolution of an unusual mating-type locus with broader implications for the transitions in modes of sexual reproduction in fungi that impact gene flow in populations.
Item Open Access Stem taxa, homoplasy, long lineages, and the phylogenetic position of Dolichocebus (Journal of Human Evolution, 2010-08) Kay, RF; Fleagle, JG
Item Open Access Systematics and Ecology of Truffles (Tuber) (2009) Bonito, Gregory Michael
The truffle genus Tuber (Ascomycota, Pezizales, Tuberaceae) produces underground mushrooms widely sought as edible fungi. Tuber species are distributed throughout Northern Hemisphere forests and form obligate ectomycorrhizal symbioses with trees within the Pinaceae, Fagaceae, Betulaceae, and Juglandaceae.
The transition to a truffle form (from an epigeous form) has occurred independently, multiple times in both the Ascomycetes and Basidiomycetes. One instance has given rise to the Tuberaceae, which is composed entirely of obligate ectomycorrhizal species. Attempts to cultivate European truffle species T. melanosporum, T. aestivum, and T. borchii are underway in North America and other parts of the world and have been met with mixed success.
The overarching goal of my dissertation is to address the systematics, ecology, and biogeography of Tuber within a phylogenetic framework. Multiple loci were sequenced from Tuber ascomata and ectomycorrhizae collected worldwide, though an emphasis was placed on sampling taxa within North America. Maximum likelihood, maximum parsimony, and Bayesian inference were used for phylogenetic reconstructions.
A taxonomic and phylogenetic overview of the family Tuberaceae is presented in Chapter 1. Tuber is resolved as monophyletic. In Chapter 2, through greater taxon sampling including epigeous and hypogeous Helvellaceae outgroups and related South American taxa, a resolved multi-gene phylogeny of the Tuberaceae and putative epigeous ancestor of Tuber is presented. A previously unknown South American lineage that contains both epigeous and hypogeous taxa is resolved as sister to the Tuberaceae. Chapter 3 is focused on issues of cryptic speciation and taxonomy within the Tuber gibbosum clade. The four species resolved in the Gibbosum clade appear to be endemic to the Pacific Northwest and associated primarily with gymnosperms. Chapter 4 is a meta-analysis of all known Tuber ITS rDNA sequences (e.g. from GenBank and generated from herbarium collections) available at the time. These were placed within the Tuber phylogeny to assess species diversity, long-distance dispersal, and host associations. In total, 120 phylotypes were detected (based on a 96% similarity criterion). Tuber shows high levels of continental endemism. I hypothesize that species shared between continents and having low ITS variability (<1%) are the result of recent human-mediated introduction events. Chapters 5 and 6 are focused on the ectomycorrhizal ecology of the economic truffle T. lyonii, which is native to Eastern and Southern North America. Tuber lyonii is known to fruit in pecan orchards. Pecans (Carya illinoinensis) are in the Juglandaceae, an understudied ectomycorrhizal plant family. I sampled the ectomycorrhizal communities of pecan orchards associated with the production of T. lyonii. In Chapter 5 I discuss four Tuber taxa discovered in these pecan orchards, their abundance and haplotype diversity. Chapter 6 examines the ectomycorrhizal communities across the five pecan orchards sampled.
I show that multiple Tuber species, including Tuber lyonii, are dominant in the ectomycorrhizal community. Chapters 7 and 8 focus on black truffles in the Melanosporum clade. In Chapter 7 I document that Tuber indicum has been introduced into North America multiple times, and through ectomycorrhizal synthesis I demonstrate that this Asian species can associate readily with angiosperm and gymnosperm hosts endemic to North America. In Chapter 8 I describe a quick and reliable method for the determination of Tuber melanosporum. The method is based on direct PCR and species-specific primers and is very useful for rapid diagnostics. I have adapted this approach for other truffle and mushroom species.
Three major findings emerge from my dissertation research: 1) Tuber is more diverse than previously realized; 2) Tuber exhibits high levels of regional and continental endemism; 3) Taxonomic issues remain in many species complexes worldwide (including the Tuber candidum complex in North America, the Tuber excavatum complex in Europe, and the Tuber indicum complex in Asia). Taxonomic challenges also remain regarding species known only from ectomycorrhizal or anamorphic states. The discovery of additional Tuber species is expected as the truffle flora of undersampled regions becomes better studied and incorporated into the Tuberaceae phylogeny.