Browsing by Author "Uyenoyama, Marcy K"
Now showing 1 - 14 of 14
Item Open Access A Bayesian Approach to Inferring Rates of Selfing and Locus-Specific Mutation. (Genetics, 2015-11)
Redelings, Benjamin D; Kumagai, Seiji; Tatarenkov, Andrey; Wang, Liuyang; Sakai, Ann K; Weller, Stephen G; Culley, Theresa M; Avise, John C; Uyenoyama, Marcy K
We present a Bayesian method for characterizing the mating system of populations reproducing through a mixture of self-fertilization and random outcrossing. Our method uses patterns of genetic variation across the genome as a basis for inference about reproduction under pure hermaphroditism, gynodioecy, and a model developed to describe the self-fertilizing killifish Kryptolebias marmoratus. We extend the standard coalescence model to accommodate these mating systems, accounting explicitly for multilocus identity disequilibrium, inbreeding depression, and variation in fertility among mating types. We incorporate the Ewens sampling formula (ESF) under the infinite-alleles model of mutation to obtain a novel expression for the likelihood of mating system parameters. Our Markov chain Monte Carlo (MCMC) algorithm assigns locus-specific mutation rates, drawn from a common mutation rate distribution that is itself estimated from the data using a Dirichlet process prior model. Our sampler is designed to accommodate additional information, including observations pertaining to the sex ratio, the intensity of inbreeding depression, and other aspects of reproduction. It can provide joint posterior distributions for the population-wide proportion of uniparental individuals, locus-specific mutation rates, and the number of generations since the most recent outcrossing event for each sampled individual.
Further, estimation of all basic parameters of a given model permits estimation of functions of those parameters, including the proportion of the gene pool contributed by each sex and relative effective numbers.

Item Open Access Allele frequency spectra in structured populations: Novel-allele probabilities under the labelled coalescent. (Theoretical population biology, 2020-06)
Uyenoyama, Marcy K; Takebayashi, Naoki; Kumagai, Seiji
We address the effect of population structure on key properties of the Ewens sampling formula. We use our previously introduced inductive method for determining exact allele frequency spectrum (AFS) probabilities under the infinite-allele model of mutation and population structure for samples of arbitrary size. Fundamental to the sampling distribution is the novel-allele probability: the probability that, given the pattern of variation in the present sample, the next gene sampled belongs to an as-yet-unobserved allelic class. Unlike the case for panmictic populations, the novel-allele probability depends on the AFS of the present sample. We derive a recursion that directly provides the marginal novel-allele probability across AFSs, obviating the need first to determine the probability of each AFS. Our explorations suggest that the marginal novel-allele probability tends to be greater for initial samples comprising fewer alleles and for sampling configurations in which the next-observed gene derives from a deme different from that of the majority of the present sample.
Comparison to the efficient importance sampling proposals developed by De Iorio and Griffiths and colleagues indicates that their approximation for the novel-allele probability generally agrees with the true marginal, although it may tend to overestimate the marginal in cases in which the novel-allele probability is high and migration rates are low.

Item Open Access Ancestral population genomics: the coalescent hidden Markov model approach. (Genetics, 2009-09)
Dutheil, Julien Y; Ganapathy, Ganesh; Hobolth, Asger; Mailund, Thomas; Uyenoyama, Marcy K; Schierup, Mikkel H
With incomplete lineage sorting (ILS), the genealogy of closely related species differs along their genomes. The amount of ILS depends on population parameters such as the ancestral effective population sizes and the recombination rate, but also on the number of generations between speciation events. We use a hidden Markov model parameterized according to coalescent theory to infer the genealogy along a four-species genome alignment of closely related species and estimate population parameters. We analyze a basic, panmictic demographic model and study its properties using an extensive set of coalescent simulations. We assess the effect of the model assumptions and demonstrate that the Markov property provides a good approximation to the ancestral recombination graph. Restricting the set of possible genealogies, as is necessary to reduce the computational load, can bias parameter estimates if the restriction is too severe. We propose a simple correction for this bias and suggest directions for future extensions of the model. We show that the patterns of ILS along a sequence alignment can be recovered efficiently together with the ancestral recombination rate. Finally, we introduce an extension of the basic model that allows for mutation rate heterogeneity and reanalyze human-chimpanzee-gorilla-orangutan alignments using the new models.
We expect that this framework will prove useful for population genomics and provide exciting insights into genome evolution.

Item Open Access Bayesian co-estimation of selfing rate and locus-specific mutation rates for a partially selfing population (2017-07-02)
Redelings, Benjamin D; Kumagai, Seiji; Wang, Liuyang; Tatarenkov, Andrey; Sakai, Ann K; Weller, Stephen G; Culley, Theresa M; Avise, John C; Uyenoyama, Marcy K
We present a Bayesian method for characterizing the mating system of populations reproducing through a mixture of self-fertilization and random outcrossing. Our method uses patterns of genetic variation across the genome as a basis for inference about pure hermaphroditism, androdioecy, and gynodioecy. We extend the standard coalescence model to accommodate these mating systems, accounting explicitly for multilocus identity disequilibrium, inbreeding depression, and variation in fertility among mating types. We incorporate the Ewens Sampling Formula (ESF) under the infinite-alleles model of mutation to obtain a novel expression for the likelihood of mating system parameters. Our Markov chain Monte Carlo (MCMC) algorithm assigns locus-specific mutation rates, drawn from a common mutation rate distribution that is itself estimated from the data using a Dirichlet Process Prior (DPP) model. Among the parameters jointly inferred are the population-wide rate of self-fertilization, locus-specific mutation rates, and the number of generations since the most recent outcrossing event for each sampled individual.

Item Open Access Effects of polymorphism for locally adapted genes on rates of neutral introgression in structured populations. (Theoretical population biology, 2011-09)
Fusco, Diana; Uyenoyama, Marcy K
Adaptation to local conditions within demes balanced by migration can maintain polymorphisms for variants that reduce fitness in certain ecological contexts.
Here, we address the effects of such polymorphisms on the rate of introgression of neutral marker genes, possibly genetically linked to targets of selection. Barriers to neutral gene flow are expected to increase with linkage to targets of local selection and with differences between demes in the frequencies of locally adapted alleles. This expectation is borne out under purifying and disruptive selection, regimes that promote monomorphism within demes. In contrast, overdominance within demes induces minimal barriers to neutral introgression even in the face of very large differences between demes in the frequencies of locally adapted alleles. Further, segregation distortion, a phenomenon observed in a number of interspecific hybrids, can in fact promote transmission by migrants to future generations at rates exceeding those of residents.

Item Open Access Evolution of the sex ratio and effective number under gynodioecy and androdioecy. (Theoretical population biology, 2017-12)
Uyenoyama, Marcy K; Takebayashi, Naoki
We address the evolution of the effective number of individuals under androdioecy and gynodioecy. We analyze dynamic models of autosomal modifiers of weak effect on sex expression. In our zygote control models, the sex expressed by a zygote depends on its own genotype, while in our maternal control models, it depends on the genotype of its maternal parent. Our analysis unifies full multi-dimensional local stability analysis with the Li-Price equation, which, for all its heuristic appeal, describes evolutionary change over a single generation. We define a point in the neighborhood of a fixation state from which a single-generation step indicates the asymptotic behavior of the frequency of a modifier allele initiated at an arbitrary point near the fixation state. A concept of heritability appropriate for the evolutionary modification of sex emerges from the Li-Price framework.
We incorporate our theoretical analysis into our previously developed Bayesian inference framework to develop a new method for inferring the viability of gonochores (males or females) relative to hermaphrodites. Applying this approach to microsatellite data derived from natural populations of the gynodioecious plant Schiedea salicaria and the androdioecious killifish Kryptolebias marmoratus, we find that while female and hermaphrodite S. salicaria appear to have similar viabilities, male K. marmoratus appear to survive to reproductive age at less than half the rate of hermaphrodites.

Item Open Access Genealogical histories in structured populations. (Theoretical population biology, 2015-06)
Kumagai, Seiji; Uyenoyama, Marcy K
In genealogies of genes sampled from structured populations, lineages coalesce at rates dependent on the states of the lineages. For migration and coalescence events occurring on comparable time scales, for example, only lineages residing in the same deme of a geographically subdivided population can have descended from a common ancestor in the immediately preceding generation. Here, we explore aspects of genealogical structure in a population comprising two demes, between which migration may occur. We use generating functions to obtain exact densities and moments of coalescence time, number of mutations, total tree length, and age of the most recent common ancestor of the sample.
We describe qualitative features of the distribution of gene genealogies, including factors that influence the geographical location of the most recent common ancestor and departures of the distribution of internode lengths from exponential.

Item Open Access Heterogeneity in neutral divergence across genomic regions induced by sex-specific hybrid incompatibility (2012)
Kumagai, Seiji; Uyenoyama, Marcy K

Item Open Access Importance sampling for the infinite sites model. (Statistical applications in genetics and molecular biology, 2008-01)
Hobolth, Asger; Uyenoyama, Marcy K; Wiuf, Carsten
Importance sampling or Markov chain Monte Carlo sampling is required for state-of-the-art statistical analysis of population genetics data. The applicability of these sampling-based inference techniques depends crucially on the proposal distribution. In this paper, we discuss importance sampling for the infinite sites model. The infinite sites assumption is attractive because it constrains the number of possible genealogies, thereby allowing for the analysis of larger data sets. We recall the Griffiths-Tavaré and Stephens-Donnelly proposals and emphasize the relation between the latter proposal and exact sampling from the infinite alleles model. We also introduce a new proposal that takes knowledge of the ancestral state into account. The new proposal is derived from a new result on exact sampling from a single site. The methods are illustrated on simulated data sets and the data considered in Griffiths and Tavaré (1994).

Item Open Access Inductive determination of allele frequency spectrum probabilities in structured populations. (Theoretical population biology, 2019-10)
Uyenoyama, Marcy K; Takebayashi, Naoki; Kumagai, Seiji
We present a method for inductively determining exact allele frequency spectrum (AFS) probabilities for samples derived from a population comprising two demes under the infinite-allele model of mutation.
This method builds on a labeled coalescent argument to extend the Ewens sampling formula (ESF) to structured populations. A key departure from the panmictic case is that the AFS conditioned on the number of alleles in the sample is no longer independent of the scaled mutation rate (θ). In particular, biallelic site frequency spectra, widely used in explorations of genome-wide patterns of variation, depend on the mutation rate in structured populations. Variation in the rate of substitution across loci and through time may contribute to apparent distortions of site frequency spectra exhibited by samples derived from structured populations.

Item Open Access Likelihoods from summary statistics: recent divergence between species. (Genetics, 2005-11)
Leman, Scotland C; Chen, Yuguo; Stajich, Jason E; Noor, Mohamed AF; Uyenoyama, Marcy K
We describe an importance-sampling method for approximating likelihoods of population parameters based on multiple summary statistics. In this first application, we address the demographic history of closely related members of the Drosophila pseudoobscura group. We base the maximum-likelihood estimation of the time since speciation and the effective population sizes of the extant and ancestral populations on the pattern of nucleotide variation at DPS2002, a noncoding region tightly linked to a paracentric inversion that strongly contributes to reproductive isolation. Consideration of summary statistics rather than entire nucleotide sequences permits a compact description of the genealogy of the sample. We use importance sampling first to propose a genealogical and mutational history consistent with the observed array of summary statistics and then to correct the likelihood with the exact probability of the history determined from a system of recursions. Analysis of a subset of the data, for which recursive computation of the exact likelihood was feasible, indicated close agreement between the approximate and exact likelihoods.
Our results for the complete data set also compare well with those obtained through Metropolis-Hastings sampling of fully resolved genealogies of entire nucleotide sequences.

Item Open Access Sex-specific incompatibility generates locus-specific rates of introgression between species. (Genetics, 2011-09)
Fusco, Diana; Uyenoyama, Marcy K
Disruption of interactions among ensembles of epistatic loci has been shown to contribute to reproductive isolation among various animal and plant species. Under the Bateson-Dobzhansky-Muller model, such interspecific incompatibility arises as a by-product of genetic divergence in each species, and the Orr-Turelli model indicates that the number of loci involved in incompatible interactions may "snowball" over time. We address the combined effect of multiple incompatibility loci on the rate of introgression at neutral marker loci across the genome. Our analysis extends previous work by accommodating sex specificity: differences between the sexes in the expression of incompatibility, in rates of crossing over between neutral markers and incompatibility loci, and in transmission of markers or incompatibility factors. We show that the evolutionary process at neutral markers in a genome subject to incompatibility selection is well approximated by a purely neutral process with migration rates appropriately scaled to reflect the influence of selection targeted to incompatibility factors. We confirm that in the absence of sex specificity and functional epistasis among incompatibility factors, the barrier to introgression induced by multiple incompatibility factors corresponds to the product of the barriers induced by the factors individually. A new finding is that barriers to introgression due to sex-specific incompatibility depart in general from multiplicativity.
Our partitioning of variation in relative reproductive rate suggests that such departures derive from associations between sex and incompatibility and between sex and neutral markers. Concordant sex-specific incompatibility (for example, greater impairment of male hybrids or longer map lengths in females) induces lower barriers (higher rates of introgression) than expected under multiplicativity, and discordant sex-specific incompatibility induces higher barriers.

Item Open Access Site frequency spectra from genomic SNP surveys. (Theoretical population biology, 2009-06)
Ganapathy, Ganeshkumar; Uyenoyama, Marcy K
Genomic survey data now permit an unprecedented level of sensitivity in the detection of departures from canonical evolutionary models, including expansions in population size and selective sweeps. Here, we examine the effects of seemingly subtle differences among sampling distributions on goodness-of-fit analyses of site frequency spectra constructed from single nucleotide polymorphisms. Conditioning on the observation of exactly two alleles in a random sample results in a site frequency spectrum that is independent of the scaled rate of neutral substitution (theta). Other sampling distributions, including conditioning on a single mutational event in the sample genealogy or randomly selecting a single mutation from a genealogy with multiple mutations, have distinct site frequency spectra that show highly significant departures from the predictions of the biallelic model. Some aspects of data filtering may contribute to significant departures of site frequency spectra from expectation, apart from any violation of the standard neutral model.

Item Open Access The evolutionary forest algorithm. (Bioinformatics (Oxford, England), 2007-08)
Leman, Scotland C; Uyenoyama, Marcy K; Lavine, Michael; Chen, Yuguo
Motivation
Gene genealogies offer a powerful context for inferences about the evolutionary process based on presently segregating DNA variation. In many cases, it is the distribution of population parameters, marginalized over the effectively infinite-dimensional tree space, that is of interest. Our evolutionary forest (EF) algorithm uses Monte Carlo methods to generate posterior distributions of population parameters. A novel feature is the updating of parameter values based on a probability measure defined on an ensemble of histories (a forest of genealogies), rather than a single tree.
Results
The EF algorithm generates samples from the correct marginal distribution of population parameters. Applied to actual data from closely related fruit fly species, it rapidly converged to posterior distributions that closely approximated the exact posteriors generated through massive computational effort. Applied to simulated data, it generated credible intervals that covered the actual parameter values in accordance with the nominal probabilities.
Availability
A C++ implementation of this method is freely accessible at http://www.isds.duke.edu/~scl13