Browsing by Subject "Mixture modeling"
Item Open Access Approximate Bayesian Computation for Complex Dynamic Systems (2013) Bonassi, Fernando Vieira
This thesis focuses on the development of ABC methods for statistical modeling in complex dynamic systems. Motivated by real applications in biology, I propose computational strategies for Bayesian inference in contexts where standard Monte Carlo methods cannot be directly applied due to the high complexity of the dynamic model and/or data limitations.
Chapter 2 focuses on stochastic bionetwork models applied to data generated from the marginal distribution of a few network nodes at snapshots in time. I present a Bayesian computational strategy, coupled with an approach to summarizing and numerically characterizing biological phenotypes that are represented in terms of the resulting sample distributions of cellular markers. ABC and mixture modeling are used to define the approach to linking mechanistic mathematical models of network dynamics to snapshot data, using a toggle switch example integrating simulated and real data as context.
Chapter 3 focuses on the application of the methodology presented in Chapter 2 to the Myc/Rb/E2F network. This network involves a relatively high number of parameters and stochastic equations in the model specification and, thus, is substantially more complex than the toggle switch example. The analysis of the Myc/Rb/E2F network is performed with simulated and real data. I demonstrate that the proposed method can indicate which parameters can be learned from the marginal data.
In Chapter 4, I present an ABC SMC method that uses data-based adaptive weights. This easily implemented and computationally trivial extension of ABC SMC can substantially improve acceptance rates. This is demonstrated through a series of examples with simulated and real data, including the toggle switch example. Theoretical justification is also provided to explain why this method is expected to improve the effectiveness of ABC SMC.
In Chapter 5, I present an integrated Bayesian computational strategy for fitting complex dynamic models to sparse time-series data. This is applied to experimental data from an immunization response study with Indian Rhesus macaques. The computational strategy consists of two stages: first, MCMC is implemented based on simplified sampling steps, and then, the resulting approximate output is used to generate a proposal distribution for the parameters that results in an efficient ABC procedure. The incorporation of ABC as a correction tool improves the model fit, as is demonstrated through predictive posterior analysis on the data sets of the study.
Chapter 6 presents additional discussion and comments on potential future research directions.
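As background for the abstract above, the basic idea behind ABC can be sketched with a plain rejection sampler. This is a minimal illustration of generic ABC rejection, not the ABC SMC or two-stage methods developed in the thesis. The toy model (a normal mean with known variance), the uniform prior, the sample-mean summary statistic, the tolerance, and all function names are assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed, prior_sample, simulate, summary, tol, n_draws, rng):
    """Keep prior draws whose simulated summary lies within tol of the observed summary."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)      # draw a candidate parameter from the prior
        fake = simulate(theta, rng)    # simulate a data set given the candidate
        if abs(summary(fake) - s_obs) <= tol:
            accepted.append(theta)     # candidate reproduces the observed summary
    return accepted


# Toy setup (assumed): data are Normal(mu, 1); the unknown is mu.
rng = random.Random(0)
true_mu = 2.0
observed = [rng.gauss(true_mu, 1.0) for _ in range(50)]

accepted = abc_rejection(
    observed,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
    simulate=lambda mu, r: [r.gauss(mu, 1.0) for _ in range(50)],
    summary=statistics.fmean,
    tol=0.1,
    n_draws=5000,
    rng=rng,
)
approx_posterior_mean = statistics.fmean(accepted)
```

The accepted draws approximate the posterior without ever evaluating a likelihood. Most draws are rejected in this scheme, which is the kind of inefficiency that sequential and adaptive variants aim to reduce.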
Item Open Access Ecological Modeling via Bayesian Nonparametric Species Sampling Priors (2023) Zito, Alessandro
Species sampling models are a broad class of discrete Bayesian nonparametric priors that model the sequential appearance of distinct tags, called species or clusters, in a sequence of labeled objects. Over the last 50 years, species sampling priors have found much success in a variety of settings, including clustering and density estimation. However, despite the rich theoretical and methodological developments, these models have rarely been used as tools by applied ecologists, even though their primary investigation often involves the modeling of actual species. This dissertation aims at partially filling this gap by elucidating how species sampling models can be useful to scientists and practitioners in the ecological field. Our emphasis is on clustering and on species discovery properties linked to species sampling models. In particular, Chapter 2 illustrates how a Dirichlet process mixture model with a random precision parameter leads to greater robustness when inferring the number of clusters, or communities, in a given population. We specifically introduce a novel prior for the precision, called the Stirling-gamma distribution, which allows for transparent elicitation supported by theoretical findings. We illustrate its advantages when detecting communities in a colony of ant workers. Chapter 3 presents a general Bayesian framework to model accumulation curves, which summarize the sequential discoveries of distinct species over time. This work is inspired by traditional species sampling models such as the Dirichlet process and the Pitman-Yor process. By modeling the discovery probability as a survival function of some latent variables, a flexible specification that can account for both finite and infinite species richness is developed. We apply our model to a large fungal biodiversity study from Finland.
Finally, Chapter 4 presents a novel Bayesian nonparametric taxonomic classifier called BayesANT. Here, the goal is to predict the taxonomy of DNA sequences sampled from the environment. The difficulty of such a task is that the vast majority of species do not have a reference barcode or are yet unknown to science. Hence, species novelty needs to be accounted for when doing classification. BayesANT builds upon Dirichlet-multinomial kernels to model DNA sequences, and upon species sampling models to account for such potential novelty. We show how it attains excellent classification performance, especially when the true taxa of the test sequences are not observed in the training set. All methods presented in this dissertation are freely available as R packages. Our hope is that these contributions will pave the way for future utilization of Bayesian nonparametric methods in applied ecological analyses.
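To give context for the Chapter 2 point about the precision parameter: under a Dirichlet process with a fixed precision alpha, the prior expected number of clusters among n observations is the standard sum of alpha / (alpha + i) for i = 0, ..., n - 1. The sketch below only evaluates that textbook formula to show how strongly the prior cluster count depends on alpha. It does not implement the Stirling-gamma prior, and the function name and the example values of alpha and n are assumptions.

```python
def expected_clusters(alpha: float, n: int) -> float:
    """Prior expected number of clusters under a Dirichlet process with precision alpha."""
    # The (i + 1)-th observation opens a new cluster with probability alpha / (alpha + i).
    return sum(alpha / (alpha + i) for i in range(n))


# Illustrative values (assumed): the same n with three fixed precisions.
n = 100
table = {alpha: expected_clusters(alpha, n) for alpha in (0.1, 1.0, 10.0)}
```

Because the expected count rises steadily with alpha, fixing alpha at a single value builds a strong assumption into the analysis, which is one way to read the motivation for placing a prior on the precision.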
Item Open Access Linear Subspace and Manifold Learning via Extrinsic Geometry (2015) St Thomas, Brian Stephen
In the last few decades, data analysis techniques have had to expand to handle large sets of data with complicated structure. This includes identifying low-dimensional structure in high-dimensional data, analyzing shape and image data, and learning from or classifying large corpora of text documents. Common Bayesian and machine learning techniques rely on using the unique geometry of these data types; however, departing from Euclidean geometry can result in both theoretical and practical complications. Bayesian nonparametric approaches can be particularly challenging in these areas.
This dissertation proposes a novel approach to these challenges by working with convenient embeddings of the manifold-valued parameters of interest, commonly making use of an extrinsic distance or measure on the manifold. Carefully selected extrinsic distances are shown to reduce the computational cost and to increase the accuracy of inference. The embeddings are also used to yield straightforward derivations for nonparametric techniques. The methods developed are applied to subspace learning in dimension reduction problems, planar shapes, shape-constrained regression, and text analysis.
Item Open Access Nonparametric Bayesian Dictionary Learning and Count and Mixture Modeling (2013) Zhou, Mingyuan
Analyzing the ever-increasing data of unprecedented scale, dimensionality, diversity, and complexity poses considerable challenges to conventional approaches of statistical modeling. Bayesian nonparametrics constitute a promising research direction, in that such techniques can fit the data with a model that can grow with complexity to match the data. In this dissertation we consider nonparametric Bayesian modeling with completely random measures, a family of pure-jump stochastic processes with nonnegative increments. In particular, we study dictionary learning for sparse image representation using the beta process and the dependent hierarchical beta process, and we present the negative binomial process, a novel nonparametric Bayesian prior that unites the seemingly disjoint problems of count and mixture modeling. We show a wide variety of successful applications of our nonparametric Bayesian latent variable models to real problems in science and engineering, including count modeling, text analysis, image processing, compressive sensing, and computer vision.
Item Open Access Some Recent Advances in Non- and Semiparametric Bayesian Modeling with Copulas, Mixtures, and Latent Variables (2013) Murray, Jared
This thesis develops flexible non- and semiparametric Bayesian models for mixed continuous, ordered and unordered categorical data. These methods have a range of possible applications; the applications considered in this thesis are drawn primarily from the social sciences, where multivariate, heterogeneous datasets with complex dependence and missing observations are the norm.
The first contribution is an extension of the Gaussian factor model to Gaussian copula factor models, which accommodate continuous and ordinal data with unspecified marginal distributions. I describe how this model is the most natural extension of the Gaussian factor model, preserving its essential dependence structure and the interpretability of factor loadings and the latent variables. I adopt an approximate likelihood for posterior inference and prove that, if the Gaussian copula model is true, the approximate posterior distribution of the copula correlation matrix asymptotically converges to the correct parameter under nearly any marginal distributions. I demonstrate with simulations that this method is both robust and efficient, and illustrate its use in an application from political science.
The second contribution is a novel nonparametric hierarchical mixture model for continuous, ordered and unordered categorical data. The model includes a hierarchical prior used to couple component indices of two separate models, which are also linked by local multivariate regressions. This structure effectively overcomes the limitations of existing mixture models for mixed data, namely the overly strong local independence assumptions. In the proposed model local independence is replaced by local conditional independence, so that the induced model is able to more readily adapt to structure in the data. I demonstrate the utility of this model as a default engine for multiple imputation of mixed data in a large repeated-sampling study using data from the Survey of Income and Program Participation. I show that it improves substantially on its most popular competitor, multiple imputation by chained equations (MICE), while enjoying certain theoretical properties that MICE lacks.
The third contribution is a latent variable model for density regression. Most existing density regression models are quite flexible but somewhat cumbersome to specify and fit, particularly when the regressors are a combination of continuous and categorical variables. The majority of these methods rely on extensions of infinite discrete mixture models to incorporate covariate dependence in mixture weights, atoms or both. I take a fundamentally different approach, introducing a continuous latent variable which depends on covariates through a parametric regression. In turn, the observed response depends on the latent variable through an unknown function. I demonstrate that a spline prior for the unknown function is quite effective relative to Dirichlet process mixture models in density estimation settings (i.e., without covariates), even though these Dirichlet process mixtures have better theoretical properties asymptotically. The spline formulation enjoys a number of computational advantages over more flexible priors on functions. Finally, I demonstrate the utility of this model in regression applications using a dataset on U.S. wages from the Census Bureau, where I estimate the return to schooling as a smooth function of the quantile index.