Advanced Topics in Introductory Statistics

It is now common practice in many scientific disciplines to collect large amounts of experimental or observational data in the course of a research study. The abundance of such data creates a circumstance in which even simply posed research questions may, or sometimes must, be answered using multivariate datasets with complex structure. Introductory-level statistical tools familiar to practitioners may be applied to these types of data, but inference will either be sub-optimal or invalid if properties of the data violate the assumptions made by these statistical procedures. In this thesis, we provide examples of how basic statistical procedures may be adapted to suit the complexity of modern datasets while preserving the simplicity of low-dimensional parametric models. In the context of genomics studies, we propose a frequentist-assisted-by-Bayes (FAB) method for conducting hypothesis tests for the means of normal models when auxiliary information about the means is available. If the auxiliary information accurately describes the means, then the proposed FAB hypothesis tests may be more powerful than the corresponding classical $t$-tests. If the information is not accurate, then the FAB tests retain type-I error control. For multivariate financial and climatological data, we develop a semiparametric model in order to characterize the dependence between two sets of random variables. Our approach is inspired by a multivariate notion of the sample rank and extends classical concepts such as canonical correlation analysis (CCA) and the Gaussian copula model. The proposed model allows for the analysis of multivariate dependence between variable sets with arbitrary marginal distributions. Motivated by fluorescence spectroscopy data collected from sites along the Neuse River, we also propose a least squares estimator for quantifying the contribution of various land-use sources to the water quality of the river. The estimator can be computed quickly relative to estimators derived using parallel factor analysis (PARAFAC) and it performs favorably in two source apportionment tasks.





Bryan, Jordan Grey (2023). Advanced Topics in Introductory Statistics. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.