Browsing by Subject "Bayesian analysis"
Item (Open Access): Causal Inference for Natural Language Data and Multivariate Time Series (2023). Tierney, Graham
The central theme of this dissertation is causal inference for complex data; it highlights how, for certain estimation problems, collecting more data has limited benefit. The central application areas are natural language data and multivariate time series. For text, large language models are trained on predictive tasks that are not necessarily well-suited to causal inference. Moreover, documents that vary in some treatment feature will often also vary systematically in other, unknown ways that prevent attribution of causal effects to the feature of interest. Multivariate time series, even with high-quality contemporaneous predictors, still exhibit positive dependencies, so that even with many treated and control units, the amount of information available to estimate causal quantities is quite low.
Chapter 2 builds a model for short text, as is typically found on social media platforms. Chapter 3 analyzes a randomized experiment that paired Democrats and Republicans to have a conversation about politics, then develops a sensitivity procedure to test for mediation effects attributable to the politeness of the conversation. Chapter 4 expands on the limitations of observational, model-based methods for causal inference with text and designs an experiment to validate how significant those limitations are. Chapter 5 covers experimentation with multivariate time series.
The general conclusion from these chapters is that causal inference always requires untestable assumptions. A researcher trying to draw causal conclusions needs to understand the underlying structure of the problem being studied in order to judge whether those assumptions hold. The work here shows how causal analysis can still be conducted when commonly made assumptions are violated.
Item (Open Access): Communities in Social Networks: Detection, Heterogeneity and Experimentation (2022). Mathews, Heather
The study of network data in the social and health sciences frequently concentrates on understanding how and why connections form. In particular, the task of determining latent mechanisms driving connection has received a lot of attention across statistics, machine learning, and information theory. In social networks, this mechanism often manifests as community structure. As a result, this work provides methods for discovering and leveraging these communities to better understand networks and the data they generate.
We provide three main contributions. First, we present methodology for performing community detection in challenging regimes. Existing literature has focused on modeling the spectral embedding of a network using Gaussian mixture models (GMMs) in scaling regimes where the ability to detect community memberships improves with the size of the network. However, these regimes are not very realistic. As such, we provide tractable methodology motivated by new theoretical results for networks with non-vanishing noise by using GMMs that incorporate truncation and shrinkage effects.
Further, when covariate information is available, we often want to understand how covariates affect connections. It is likely that the effects of covariates on edge formation differ between communities (e.g., age might play a different role in friendship formation in communities across a city). To address this issue, we introduce a latent space network model in which coefficients associated with certain covariates can depend on the latent community membership of the nodes. We show that ignoring such structure can lead to either over- or under-estimation of covariate importance to edge formation, and we propose a Markov chain Monte Carlo approach for simultaneously learning the latent community structure and the community-specific coefficients.
Finally, we consider how community structure can impact experimentation. Communities can act in different ways, and it is natural that this propagates into experimental design. This observation motivates our development of community-informed experimental design. This design recognizes that information between individuals likely flows along within-community edges rather than across-community edges. We demonstrate that this design improves estimation of the global average treatment effect, even when the community structure of the graph needs to be estimated.
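The idea of randomizing with community structure in mind can be illustrated with a minimal sketch. It is not the design developed in the dissertation: it assumes a plain cluster randomization in which every node in a community receives the same treatment, and the helper names (`assign_by_community`, `neighbor_exposure`) and the toy graph are hypothetical.

```python
import random
from collections import defaultdict


def assign_by_community(community_of, p_treat=0.5, seed=0):
    """Randomize treatment at the community level (cluster randomization).

    community_of maps node -> community label (assumed known or estimated).
    Returns a dict node -> 0/1 where all nodes in a community share a value.
    """
    rng = random.Random(seed)
    labels = sorted(set(community_of.values()))
    rng.shuffle(labels)
    n_treated = max(1, round(p_treat * len(labels)))
    treated_labels = set(labels[:n_treated])
    return {node: int(c in treated_labels) for node, c in community_of.items()}


def neighbor_exposure(edges, treatment):
    """Fraction of each node's neighbors that are treated."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return {u: sum(treatment[v] for v in vs) / len(vs) for u, vs in neighbors.items()}


# Toy graph (hypothetical): two triangles joined by one cross-community edge.
community_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

treatment = assign_by_community(community_of, seed=1)
exposure = neighbor_exposure(edges, treatment)
```

In this toy graph, nodes whose neighbors all lie in their own community see an exposure equal to their own treatment, while only the two nodes on the cross-community edge see a mixed exposure. This is the sense in which assigning treatment by community limits interference between treated and control units.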
Item (Open Access): Experimental Study of Structured Light Using a Free-electron Laser Oscillator (2021). Liu, Peifan
Over the past three decades, laser beams with complex amplitude and phase structures, especially orbital angular momentum (OAM) beams, have been extensively investigated. Researchers have found a wide range of applications for OAM beams spanning a vast range of distance scales, from fundamental physics at the atomic level with modified selection rules, to macroscopic use such as optical tweezers, to probing of the universe such as detection of rotating black holes.
While structured light beams in the visible and longer wavelength regimes can be generated using many techniques, it is much more challenging to produce such beams at shorter wavelengths, from vacuum ultraviolet to x-rays to gamma rays. In recent years, particle accelerator-based light sources, such as magnetic undulators and free-electron lasers (FELs), have been explored as promising candidates for generating structured light at shorter wavelengths. While the FEL work was mostly limited to single-pass FELs, we recognized that the oscillator FEL is very attractive for producing high-quality OAM beams with high intracavity power. In this work, we report the first experimental generation of a particular kind of structured light, a coherently mixed (CM) OAM beam, using the Duke storage ring FEL. The coherently mixed OAM beams have been generated up to the fourth order. This was made possible by modifying the FEL cavity to obtain cylindrical symmetry while suppressing the low-order transverse modes. The cavity modification was implemented using a set of specially developed masks, including an annulus mask and a disk mask.
In addition, reliable and rapid assessment of structured light has a wide range of applications in laser development, including high-quality OAM beam generation, optical characterization of beam quality and mode content, and manipulation and correction of distorted OAM beams. While diagnostic methods for structured light have been widely investigated at long wavelengths in recent years, they are not available for the short-wavelength regimes because of the wavelength limitations of the optics used. We report here two general diagnostic techniques for structured light: a phase retrieval method for wavefront reconstruction, and a modal analysis method for assessing the mode content and beam quality of a structured laser beam. These newly developed methods involve very few optics and, in principle, can be used over a wide range of wavelengths, from infrared to visible to UV and x-ray.
The produced coherently mixed OAM FEL beams are found to possess good beam quality, excellent stability and reproducibility, and substantial intracavity power. Using the aforementioned diagnostic techniques, we have analyzed the measured FEL beam images to retrieve the complex wavefront and mode content. These beams have been found to have good mode quality, dominated by two degenerate OAM modes of the same order but opposite helicities. A pulsed mode operation of the OAM FEL beam has also been developed using an external drive, in which the OAM beams exhibit a highly reproducible temporal structure when the pulsing frequency is varied from 1 Hz to 30 Hz.
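What "two degenerate OAM modes of the same order but opposite helicities" means can be illustrated with a small numerical sketch. It is not the modal analysis method of the dissertation: it assumes the complex field is already known on a ring of constant radius and simply projects it onto the azimuthal harmonics exp(i m φ). The function name `azimuthal_spectrum` and the sample values are hypothetical.

```python
import cmath
import math


def azimuthal_spectrum(field_on_ring, orders):
    """Project field samples on a circle onto exp(i*m*phi).

    Returns the power in each requested order m, normalized to sum to 1.
    """
    n = len(field_on_ring)
    power = {}
    for m in orders:
        coeff = sum(
            f * cmath.exp(-1j * m * 2 * math.pi * k / n)
            for k, f in enumerate(field_on_ring)
        ) / n
        power[m] = abs(coeff) ** 2
    total = sum(power.values())
    return {m: p / total for m, p in power.items()}


# Equal-weight superposition of helicities +order and -order (assumed example).
order = 2
n_samples = 64
angles = [2 * math.pi * k / n_samples for k in range(n_samples)]
ring = [cmath.exp(1j * order * phi) + cmath.exp(-1j * order * phi) for phi in angles]

spectrum = azimuthal_spectrum(ring, range(-4, 5))

# The intensity on the ring is proportional to cos^2(order * phi),
# so counting local maxima around the ring gives the number of lobes.
intensity = [abs(f) ** 2 for f in ring]
petals = sum(
    1
    for k in range(n_samples)
    if intensity[k] > intensity[k - 1] and intensity[k] > intensity[(k + 1) % n_samples]
)
```

For this idealized input the power splits evenly between m = +2 and m = -2, and the intensity shows 2 × 2 = 4 lobes around the ring. A measured beam would additionally require the phase to be retrieved from intensity images before any such projection, which this sketch does not attempt.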
The development of OAM FEL beams using the storage ring FEL has paved the way for short-wavelength OAM laser beam generation using future FEL oscillators operating in the extreme ultraviolet and x-ray regimes. The operation of the storage ring FEL also paves the way for the generation of OAM gamma-ray beams via Compton scattering.
Item (Open Access): Factor Models to Describe Linear and Non-linear Structure in High Dimensional Gene Expression Data (2011). Mayrink, Vinicius Diniz
An important problem in the analysis of gene expression data is the identification of groups of features that are coherently expressed. For example, one often wishes to know whether a group of genes, clustered because of correlation in one data set, is still highly co-expressed in another data set. For some microarray platforms there are many relatively short probes for each gene of interest. In this case, it is possible that a given probe is not measuring its targeted transcript, but rather a different gene with a similar region (called cross-hybridization). Similarly, the incorrect mapping of short nucleotide sequences to a target gene is a common issue related to the young technology producing RNA-Seq data. The expression pattern across samples is a valuable source of information, which can be used to address distinct problems through the application of factor models. Our first study is focused on identifying the presence/absence status of a gene in a sample. We compare our factor model to state-of-the-art detection methods; the results suggest superior performance of the factor analysis for detecting transcripts. In the second study, we apply factor models to investigate gene modules (groups of coherently expressed genes). Variation in the number of copies of regions of the genome is a well-known and important feature of most cancers. Copy number alteration is detected for a group of genes in breast cancer; our goal is to examine this abnormality in the same chromosomal region for other types of tumors (ovarian, lung and brain). In the third application, the expression pattern related to RNA-Seq count data is evaluated through a factor model based on the Poisson distribution. Here, the presence/absence of coherent patterns is closely associated with the number of incorrect read mappings.
The final study of this dissertation is dedicated to the analysis of multi-factor models with linear and non-linear structure of interactions between latent factors. The interaction terms can have important implications in the model; they represent relationships between genes which cannot be captured in an ordinary analysis.