<p>Identifying a lower-dimensional latent space for representation of high-dimensional observations is of significant importance in numerous biomedical and machine learning applications. In many such applications, it is ...
<p>In this thesis, we develop Bayesian sparse learning methods for high-dimensional data analysis. Two important topics are related to the idea of sparse learning: variable selection and factor analysis. ...
<p>Cell division is a biological process fundamental to all life. One aspect of the process that is still under investigation is whether or not cells in a lineage are correlated in their cell-cycle progression. Data on ...
<p>This thesis concerns the use of protein structure to improve phylogenetic inference. There has been growing interest in phylogenetics as the number of available DNA and protein sequences continues to grow rapidly and ...
<p>Clustering methods are designed to separate heterogeneous data into groups such that objects within a group are similar and objects in different groups are dissimilar. From the machine learning ...
<p>I consider the problem of clustering multiple related groups of data. My approach entails mixture models in the context of hierarchical Dirichlet processes, focusing on their ability to perform inference on the unknown ...
<p>Our interest is the risk assessment of rare natural hazards, such as large volcanic pyroclastic flows. Since catastrophic consequences of volcanic flows are rare events, our analysis benefits from the use ...
<p>This thesis develops Bayesian latent class models for nested categorical data, e.g., people nested in households. The applications focus on generating synthetic microdata for public release and imputing missing data for ...
<p>If a distant star happens to host an orbiting exoplanet, then that planet will exert a gravitational influence on the star that may be detectable from Earth by the apparent "stellar wobble": regular, periodic ...
<p>In cargo logistics, a key performance measure is transport risk, defined as the deviation of the actual arrival time from the planned arrival time. Neither earliness nor tardiness is desirable for the customer and freight ...
<p>An important problem in the analysis of gene expression data is the identification of groups of features that are coherently expressed. For example, one often wishes to know whether a group of genes, clustered because ...
<p>This thesis presents a new framework for constituting a group of dependent completely random measures, unifying and extending methods in the literature. The dependent completely random measures are constructed based on ...
We formulate a novel approach to infer conditional independence models or Markov structure of a multivariate distribution. Specifically, our objective is to place informative prior distributions over graphs (decomposable ...
<p>The study of the effect of the environment (e.g., climate and land use) on disease typically relies on aggregate disease data collected by the government surveillance network. The usual approach to analyzing these data, ...
<p>The integral projection model (IPM) is an important tool for studying population dynamics and demography in ecology. Traditional IPMs are handled first with a fitting stage at individual-level transitions, then with a projection ...
<p>In many spatio-temporal applications, a vector of covariates is measured alongside a spatio-temporal response. In such cases, the purpose of the statistical model is to quantify the change, in expectation or otherwise, ...
<p>Social networks represent two different facets of social life: (1) stable paths for diffusion, or the spread of something through a connected population, and (2) random draws from an underlying social space, which ...
<p>Mixture modeling of continuous data is an extremely effective and popular method for density estimation and clustering. However, as the size of the data grows, both in terms of dimension and number of observations, many ...
<p>Most panel surveys are subject to missing data problems caused by panel attrition. The Additive Non-ignorable (AN) model proposed by Hirano et al. (2001) utilizes refreshment samples in panel surveys to impute missing ...