# Browsing by Subject "MCMC"

Item Open Access A Bayesian approach for individual-level drug benefit-risk assessment. (Statistics in medicine, 2019-07) Li, Kan; Luo, Sheng; Yuan, Sammy; Mt-Isa, Shahrul

In existing benefit-risk assessment (BRA) methods, benefit and risk criteria are usually identified and defined separately based on aggregated clinical data and therefore ignore the individual-level differences as well as the association among the criteria. We proposed a Bayesian multicriteria decision-making method for BRA of drugs using individual-level data. We used a multidimensional latent trait model to account for the heterogeneity of treatment effects with latent variables introducing the dependencies among outcomes. We then applied the stochastic multicriteria acceptability analysis approach for BRA incorporating imprecise and heterogeneous patient preference information. We adopted an efficient Markov chain Monte Carlo algorithm when implementing the proposed method. We applied our method to a case study to illustrate how individual-level benefit-risk profiles could inform decision-making.

Item Open Access A Bayesian Hierarchical Model with SNP-level Functional Priors Applied to a Pathway-wide Association Study. (2010) Huang, Weizi

Tremendous effort has been put into the study of the etiology of complex diseases, including breast cancer, type 2 diabetes, cardiovascular diseases, and prostate cancer. Despite large numbers of reported disease-associated loci, few associated loci have been replicated, and some true associations do not belong to the group of the most significant loci reported to be associated. We built a Bayesian hierarchical model incorporating SNP-level functional data that can help identify associated SNPs in pathway-wide association studies.

We applied the model to an association study of serous invasive ovarian cancer based on the DNA repair and apoptosis pathways. We found that, using our model, blocks of SNPs located in regions enriched for missense SNPs or gene inversions were more likely to be identified as candidates for association.

Item Open Access Bayesian Computation for High-Dimensional Continuous & Sparse Count Data (2018) Wang, Ye

Probabilistic modeling of multidimensional data is a common problem in practice. When the data is continuous, one common approach is to suppose that the observed data are close to a lower-dimensional smooth manifold. There is a rich variety of manifold learning methods available, which allow mapping of data points to the manifold. However, there is a clear lack of probabilistic methods that allow learning of the manifold along with the generative distribution of the observed data. The best attempt is the Gaussian process latent variable model (GP-LVM), but identifiability issues lead to poor performance. We solve these issues by proposing a novel Coulomb repulsive process (Corp) for locations of points on the manifold, inspired by physical models of electrostatic interactions among particles. Combining this process with a GP prior for the mapping function yields a novel electrostatic GP (electroGP) process.

Another popular approach is to suppose that the observed data are close to one or a union of lower-dimensional linear subspaces. However, popular methods such as probabilistic principal component analysis scale poorly computationally. We introduce a novel empirical Bayesian method that we term geometric density estimation (GEODE), which assumes the data is centered near a low-dimensional linear subspace. We show that, with mild assumptions on the prior, the subspace spanned by the principal axes of the data maximizes the posterior mode. Hence, leveraging the geometric information of the data, GEODE easily scales to massive-dimensional problems. It is also capable of learning the intrinsic dimension via a novel shrinkage prior. Finally, we mix GEODE across a dyadic clustering tree to account for nonlinear cases.

When data is discrete, a common strategy is to define a generalized linear model (GLM) for each variable, with dependence among the different variables induced by including multivariate latent variables in the GLMs. Bayesian inference for these models usually relies on the data-augmented Markov chain Monte Carlo (DA-MCMC) method, which has a provably slow mixing rate when the data is imbalanced. For more scalable inference, we propose Bayesian mosaic, a parallelizable composite posterior, for scalable Bayesian inference on a subclass of the multivariate discrete data models. Sampling is embarrassingly parallel since Bayesian mosaic is a multiplication of component posteriors that can be independently sampled from. Analogous to composite likelihood methods, these component posteriors are based on univariate or bivariate marginal densities. Utilizing the fact that the score functions of these densities are unbiased, we have shown that Bayesian mosaic is consistent and asymptotically normal under mild conditions. Since the evaluation of univariate or bivariate marginal densities can be done via numerical integration, sampling from Bayesian mosaic completely bypasses the traditional DA-MCMC method. Moreover, we have shown that sampling from Bayesian mosaic also scales better to large sample sizes than DA-MCMC.

The performance of the proposed methods and models will be demonstrated via both simulation studies and real world applications.

Item Open Access Bayesian Models for Combining Information from Multiple Sources (2022) Tang, Jiurui

This dissertation develops Bayesian methods for combining information from multiple sources. I focus on developing Bayesian bipartite modeling for simultaneous regression and record linkage, as well as leveraging auxiliary information on marginal distributions for handling item and unit nonresponse and accounting for survey weights.

The first contribution is a Bayesian hierarchical model that allows analysts to perform simultaneous linear regression and probabilistic record linkage. This model allows analysts to leverage relationships among the variables to improve linkage quality. It also potentially offers more accurate estimates of regression parameters compared to approaches that use a two-step process, i.e., link the records first, then estimate the linear regression on the linked data. I propose and evaluate three Markov chain Monte Carlo algorithms for implementing the Bayesian model.

The second contribution is examining the performance of an approach for generating multiple imputation data sets for item nonresponse. The method allows analysts to use auxiliary information. I examine the approach via simulation studies with Poisson sampling. I also give suggestions on parameter tuning.

The third contribution is a model-based imputation approach that can handle both item and unit nonresponse while accounting for auxiliary margins and survey weights. This approach includes an innovative combination of a pattern mixture model for unit nonresponse and a selection model for item nonresponse. Both unit and item nonresponse can be nonignorable. I demonstrate the model performance with simulation studies under the situations when the design weights for unit respondents are known and when they are not. I show that the model can generate multiple imputation data sets that both retain the relationship among survey variables and yield design-based estimates that agree with auxiliary margins. I use the model to analyze voter turnout overall and across subgroups in North Carolina, with data from the 2018 Current Population Survey.

Item Open Access Comparison of Bayesian Inference Methods for Probit Network Models (2021) Shen, YueMing

This thesis explores and compares Bayesian inference procedures for probit network models. Network data typically exhibit high dyadic correlation due to reciprocity. For binary network data, the presence of dyadic correlation often makes a basic implementation of Markov chain Monte Carlo (MCMC) inefficient. We first explore variational inference as a fast approximation to the posterior distribution. Aware of its insufficiency in quantifying posterior uncertainties, we propose an alternative MCMC algorithm which is more efficient and accurate. In particular, we propose to update the dyadic correlation parameter $\rho$ using the marginal likelihood unconditional on the latent relations $Z$. This reduces autocorrelations in the posterior samples of $\rho$ and hence improves mixing. A simulation study and real data examples are provided to compare the performance of these Bayesian inference methods.

Item Open Access MCMC Sampling Geospatial Partitions for Linear Models (2021) Wyse, Evan T

Geospatial statistical approaches must frequently confront the problem of correctly partitioning a group of geographical sub-units, such as counties, states, or precincts, into larger blocks which share information. Since the space of potential partitions is quite large, sophisticated approaches are required, particularly when this partitioning interacts with other parts of a larger model, as is frequent with Bayesian inference. Authors such as Balocchi et al. (2021) provide stochastic search algorithms which provide certain theoretical guarantees about this partition in the context of Bayesian model averaging. We borrow tools from Herschlag et al. (2020) to examine a potential approach to sampling these clusters efficiently using a Markov Chain Monte Carlo (MCMC) approach.

Item Open Access Modeling Time-Varying Networks with Applications to Neural Flow and Genetic Regulation (2010) Robinson, Joshua Westly

Many biological processes are effectively modeled as networks, but a frequent assumption is that these networks do not change during data collection. However, that assumption does not hold for many phenomena, such as neural growth during learning or changes in genetic regulation during cell differentiation. Approaches are needed that explicitly model networks as they change in time and that characterize the nature of those changes.

In this work, we develop a new class of graphical models in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. We first present the model, explain how to derive it from Bayesian networks, and develop an efficient MCMC sampling algorithm that easily generalizes under varying levels of uncertainty about the data generation process. We then characterize the nature of evolving networks in several biological datasets.

We initially focus on learning how neural information flow networks change in songbirds with implanted electrodes. We characterize how they change in response to different sound stimuli and during the process of habituation. We continue to explore the neurobiology of songbirds by identifying changes in neural information flow in another habituation experiment using fMRI data. Finally, we briefly examine evolving genetic regulatory networks involved in Drosophila muscle differentiation during development.

We conclude by suggesting new experimental directions and statistical extensions to the model for predicting novel neural flow results.

Item Open Access Stratified MCMC Sampling of non-Reversible Dynamics (2020) Earle, Gabriel Joseph

The study of stratified sampling is of interest in systems which can be solved accurately on small scales, or which depend heavily on rare transitions of particles from one subspace to another. We present a new form of stratified MCMC algorithm built with non-reversible stochastic dynamics in mind. The method has potential usefulness in that many systems of interest are non-reversible, and can also benefit from stratification at the same time. It may also be useful for sampling on complex manifolds, and hence manifold learning. Our method is a generalization of previous stratified or nested sampling schemes which extend QSD sampling schemes. It can also be viewed as a generalization of the exact milestoning method previously studied by D. Aristoff. The primary advantages of our new results over such previous studies are generalization to non-reversible processes, expressions for the convergence rate in terms of the process's behavior within each stratum and large scale behavior between strata, and less restrictive assumptions for convergence. We show that the algorithm has a unique fixed point which corresponds to the invariant measure of the process without stratification. We will show how the speeds of two versions of the new algorithm, one with an extra eigenvalue problem step and one without, relate to the mixing rate of a discrete process on the strata, and the mixing probability of the process being sampled within each stratum. The eigenvalue problem version also relates to local and global perturbation results of discrete Markov chains, such as those given by J. Weare. Finally, we will propose a way to relate the accuracy of finite approximations of a process using our stratified scheme to its expected exit times from each stratum and its approximation of the true process's generator, by means of a Poisson equation argument.