Advanced Topics in Introductory Statistics

Date

2023

Abstract

It is now common practice in many scientific disciplines to collect large amounts of experimental or observational data in the course of a research study. The abundance of such data creates a circumstance in which even simply posed research questions may, or sometimes must, be answered using multivariate datasets with complex structure. Introductory-level statistical tools familiar to practitioners may be applied to these types of data, but inference will either be sub-optimal or invalid if properties of the data violate the assumptions made by these statistical procedures. In this thesis, we provide examples of how basic statistical procedures may be adapted to suit the complexity of modern datasets while preserving the simplicity of low-dimensional parametric models. In the context of genomics studies, we propose a frequentist-assisted-by-Bayes (FAB) method for conducting hypothesis tests for the means of normal models when auxiliary information about the means is available. If the auxiliary information accurately describes the means, then the proposed FAB hypothesis tests may be more powerful than the corresponding classical $t$-tests. If the information is not accurate, then the FAB tests retain type-I error control. For multivariate financial and climatological data, we develop a semiparametric model in order to characterize the dependence between two sets of random variables. Our approach is inspired by a multivariate notion of the sample rank and extends classical concepts such as canonical correlation analysis (CCA) and the Gaussian copula model. The proposed model allows for the analysis of multivariate dependence between variable sets with arbitrary marginal distributions. Motivated by fluorescence spectroscopy data collected from sites along the Neuse River, we also propose a least squares estimator for quantifying the contribution of various land-use sources to the water quality of the river. The estimator can be computed quickly relative to estimators derived using parallel factor analysis (PARAFAC) and it performs favorably in two source apportionment tasks.

Citation

Bryan, Jordan Grey (2023). Advanced Topics in Introductory Statistics. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/27685.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.