Browsing by Subject "Nonparametric statistics"
Item Open Access
Methods for Systematic Exploratory Analysis of Gene Expression Data with Applications to Cancer Genomics (2017) Wagner, Florian
Advances in technologies for gene expression profiling have resulted in an unprecedented abundance of gene expression data. However, computational methods available for the exploratory analysis of such data are limited in their ability to generate an interpretable overview of biologically relevant similarities and differences among samples. This work first introduces the XL-mHG test, a sensitive and specific hypothesis test for detecting gene set enrichment, and discusses its algorithmic and statistical properties. It further introduces GO-PCA, a method for exploratory analysis of gene expression data using prior knowledge. The XL-mHG test serves as a building block for GO-PCA. The output of GO-PCA consists of functional expression signatures, designed to provide an interpretable representation of biologically meaningful variation in the data. The power and versatility of the method are demonstrated on heterogeneous human and mouse expression data. Finally, applications of the proposed methods to carcinoma and lymphoma expression data aim to demonstrate their clinical relevance. The effective utilization of prior knowledge in the exploratory analysis of gene expression data through carefully designed computational methods is essential for successfully harnessing the power of current and future platforms for gene expression profiling, with the aim of generating clinically relevant insights into complex diseases such as cancer.
Item Open Access
Modeling Heterogeneity With Bayesian Additive Regression Trees (2023) Orlandi, Vittorio
This work focuses on using Bayesian Additive Regression Trees (BART), a flexible and computationally efficient regression method, to model heterogeneity in data. In particular, we focus on the closely related tasks of hierarchical modeling, latent variable modeling, and density regression. We begin by introducing BART in Chapter 2, presenting the prior, various extensions, and an in-depth case study using BART to analyze the impact of ABO-incompatible cardiac transplant on infants. Chapter 3 describes a methodological contribution, in which we use BART to model data structured within known groups by allowing for group-specific forests, each of which is only updated using units corresponding to that group. We further introduce an intercept forest common to all units and a hierarchical prior across the leaf variances in order to allow for sharing of information. We find that such an approach yields more parsimonious models than other BART-based approaches in the literature, which in turn translates to better out-of-sample accuracy, at virtually no added computational cost. In Chapter 4, we consider models involving latent variables within BART. The original motivation is to extend the known-group approach in Chapter 3 to a setting where group information is unavailable. However, this idea lends itself well to many different analyses, including those involving continuous omitted or latent variables. Another application is a generalization of a BART-based approach to sensitivity analysis, in which we allow for the unobserved confounder to flexibly influence the outcome. The latent variable framework we consider is computationally efficient, can help BART model data much more accurately than when restricted to observed covariates, and is widely applicable to many different settings.
In Chapter 5, we study one such application in great detail: using BART for density regression. By integrating out the latent variable in our model, we can model conditional densities in a way that outperforms a variety of other approaches on simulated tasks, and also allows us to bound its posterior concentration rate. We hope that the tools we develop in this work are useful to practitioners seeking to model heterogeneity in their data and also serve as a foundation for future methodological advances.