Browsing by Subject "Factor analysis"
Item Open Access Bayesian Gaussian Copula Factor Models for Mixed Data. (J Am Stat Assoc, 2013-06-01) Murray, Jared S; Dunson, David B; Carin, Lawrence; Lucas, Joseph E
Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models accommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem we propose a novel class of Bayesian Gaussian copula factor models which decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this paper are implemented in the R package bfa.
Item Open Access Bayesian Modeling and Computation for Mixed Data (2012) Cui, Kai
Multivariate or high-dimensional data with mixed types are ubiquitous in many fields of study, including science, engineering, social science, finance, health and medicine, and joint analysis of such data entails both statistical models flexible enough to accommodate them and novel methodologies for computationally efficient inference.
Such joint analysis is potentially advantageous in many statistical and practical aspects, including shared information, dimension reduction, efficiency gains, increased power and better control of error rates.
This thesis mainly focuses on two types of mixed data: (i) mixed discrete and continuous outcomes, especially in a dynamic setting; and (ii) multivariate or high dimensional continuous data with potential non-normality, where each dimension may have different degrees of skewness and tail-behaviors. Flexible Bayesian models are developed to jointly model these types of data, with a particular interest in exploring and utilizing the factor models framework. Much emphasis has also been placed on the ability to scale the statistical approaches and computation efficiently up to problems with long mixed time series or increasingly high-dimensional heavy-tailed and skewed data.
To this end, Chapter 1 reviews the challenges posed by mixed data. Chapter 2 develops generalized dynamic factor models for mixed-measurement time series. The framework allows mixed scale measurements in different time series, with the different measurements having distributions in the exponential family conditional on time-specific dynamic latent factors. Efficient computational algorithms for Bayesian inference are developed that can be easily extended to long time series. Chapter 3 focuses on the problem of jointly modeling high-dimensional data with potential non-normality, where the mixed skewness and/or tail-behaviors in different dimensions are accurately captured via the proposed heavy-tailed and skewed factor models. Chapter 4 further explores the properties of, and efficient Bayesian inference for, the generalized semiparametric Gaussian variance-mean mixtures family, and introduces it as a potentially useful family for modeling multivariate heavy-tailed and skewed data.
Item Open Access Data-driven investigations of disgust (2019) Hanna, Eleanor
Disgust features prominently in many facets of human life, from dining etiquette to spider phobia to genocide. For some applications, such as public health campaigns, it might be desirable to know how to increase disgust, whereas for things like legal and political decision-making it might be desirable to know how to suppress disgust. However, interventions in neither direction can take place until the basic structure of disgust is better understood. Disgust is notoriously difficult to model, largely due to the fact that it is a highly individually variable, multifactorial construct, with a great breadth of eliciting stimuli and contexts. As such, many of the theories which attempt to comprehensively describe disgust come into conflict with each other, impeding progress towards more efficient and effective ways of predicting disgust-related outcomes. The aim of this dissertation is to explore the possible contribution of data-driven methods to resolving theoretical questions, evaluating extant theories, and generating novel conceptual structures from bottom-up insights. Data were collected to sample subjective experience as well as psychophysiological reactivity. Through the use of techniques such as factor analysis and support vector machine classification, several insights about approaching the study of disgust emerged. In one study, results indicated that the level of abstraction across subdivisions of disgust is not necessarily constant, in spite of a priori theoretical expectations: in other words, some domains of disgust are more general than others, and recognizing as much will improve the predictive validity of a model. Another study highlighted the importance of recognizing one particular category of disgust elicitors (mutilation) as a separate entity from the superordinate domains into which extant theories placed it.
Finally, another study investigated the influence of concurrent emotions on variability in disgust physiology, and demonstrated the difference in the representations of the structure of disgust between the level of subjective experience and the level of autonomic activity. In total, the studies conducted as part of this dissertation suggest that for constructs as complex as disgust, data-driven investigations can be a boon to scientists looking to evaluate the quality of the theoretical tools at their disposal.
Item Open Access Easy and Efficient Bayesian Infinite Factor Analysis (2020) Poworoznek, Evan
Bayesian latent factor models are key tools for modeling linear structure in data and performing dimension reduction for correlated variables. Recent advances in prior specification allow the estimation of semi- and non-parametric infinite factor models. These models provide significant theoretical and practical advantages at the cost of computationally intensive sampling and non-identifiability of some parameters. We provide a package for the R programming environment that includes functions for sampling from the posterior distributions of several recent latent factor models. These computationally efficient samplers are provided for R with C++ source code to facilitate fast sampling of standard models and provide component sampling functions for more complex models. We also present an efficient algorithm to remove the non-identifiability that results from the included shrinkage priors. The infinitefactor package is available in developmental version on GitHub at https://github.com/poworoznek/infinitefactor and in release version on the CRAN package repository.
Item Open Access New tools for Bayesian clustering and factor analysis (2022) Song, Hanyu
Traditional model-based clustering faces challenges when applied to mixed scale multivariate data, consisting of both categorical and continuous variables. In such cases, there is a tendency for certain variables to overly influence clustering. In addition, as dimensionality increases, clustering can become more sensitive to kernel misspecification and less reliable. In Chapter 1, we propose a simple local-global Bayesian clustering framework designed to address both of these problems. The model assigns a separate cluster ID to each variable from each subject to define the local component of the model. These local clustering IDs are dependent on a global clustering ID for each subject through a simple hierarchical model. The proposed framework builds on previous related ideas including consensus clustering, the enriched Dirichlet process, and mixed membership models. We show its property of local-global borrowing of information and ease of handling missing data. As a canonical special case, we focus on a simple Dirichlet over-fitted local-global mixture, for which we show that the extra global components of the posterior can be emptied asymptotically. This is the first such result applicable to a broad class of over-fitted finite mixture of mixtures models. We also propose kernel and prior specification for the canonical case and show it leads to a simple Gibbs sampler for posterior computation. We illustrate the approach using simulation studies and applications, through which we see the model is able to identify relevant variables for clustering. Large data have become the norm in many modern applications; they often cannot be easily moved across computers or loaded into memory on a single computer. In such cases, model-based clustering, which typically uses the inherently serial Markov chain Monte Carlo for computation, faces challenges.
Existing distributed algorithms have emphasized nonparametric Bayesian mixture models and typically require moving raw data across workers. In Chapter 2, we introduce a nearly embarrassingly parallel algorithm for clustering under a Bayesian overfitted finite mixture of Gaussian mixtures, which we term distributed Bayesian clustering (DIB-C). DIB-C can flexibly accommodate data sets with various shapes (e.g. skewed or multi-modal). With data randomly partitioned and distributed, we first run Markov chain Monte Carlo in an embarrassingly parallel manner to obtain local clustering draws and then refine across workers for a final clustering estimate based on any loss function on the space of partitions. DIB-C can also estimate cluster densities, quickly classify new subjects and provide a posterior predictive distribution. Both simulation studies and real data applications show superior performance of DIB-C in terms of robustness and computational efficiency.
Chapter 3 develops a simple factor analysis model in light of the need for new models for characterizing dependence in multivariate data. The multivariate Gaussian distribution is routinely used, but cannot characterize nonlinear relationships in the data. Most non-linear extensions tend to be highly complex; for example, involving estimation of a non-linear regression model in latent variables. We propose a relatively simple class of Ellipsoid-Gaussian multivariate distributions, which are derived by using a Gaussian linear factor model involving latent variables having a von Mises-Fisher distribution on a unit hyper-sphere. We show that the Ellipsoid-Gaussian distribution can flexibly model curved relationships among variables with lower-dimensional structures. Taking a Bayesian approach, we propose a hybrid of gradient-based geodesic Monte Carlo and adaptive Metropolis for posterior sampling. We derive basic properties and illustrate the utility of the Ellipsoid-Gaussian distribution on a variety of simulated and real data applications.
Item Open Access Relating Traits to Electrophysiology using Factor Models (2020) Talbot, Austin B
Targeted stimulation of the brain has the potential to treat mental illnesses. The objective of this work is to develop methodology that enables scientists to design stimulation methods based on the electrophysiological dynamics. We first develop several factor models that characterize aspects of the dynamics relevant to these illnesses. Using a novel approach, we can then find a single predictive factor of the trait of interest. To improve the quality of the associated loadings, we develop a method for removing concomitant variables that can dominate the observed dynamics. We also develop a novel inference technique that increases the relevance of the predictive loadings. Finally, we demonstrate the efficacy of our methodology by finding a single factor responsible for social behavior. This factor is stimulated in new subjects and modifies behavior in the new individuals. These results indicate that our methodology has high potential in developing future cures of mental illness.
Item Open Access The Hierarchical Organization of Impulse Control: Implications for Decision Making (2014) Coutlee, Christopher Gilbert
The research studies presented in this dissertation constitute a methodologically diverse and conceptually integrative approach to understanding impulsiveness in the context of cognitive control and decision making. Broadly, these findings address the validity of current conceptions of trait impulsiveness, relationships between those traits and brain or laboratory measures of cognitive control, and links between impulsive traits and economic decisions under conditions of delay or uncertainty. The findings presented in this thesis affirm the multidimensional nature of impulsiveness as a construct, and link individual differences in specific impulsive types to behavioral and neurobiological measures of control function. The nature of motor, attentional, and nonplanning impulsive types is contextualized by reference to evidence supporting a broad theory of behavioral control based on hierarchical organization of action, ranging from concrete acts to abstract plans and strategies. We provide evidence linking concrete forms of urgent/motor impulsiveness to behavior and brain activation during response-related control, and more abstract and future-oriented premeditative/nonplanning impulsiveness to strategic control signals in more rostral PFC. Finally, these findings are complemented by causal evidence from a neurostimulation study linking a contextual control network to risky decision making and attentional impulsiveness.