Utilizing Network Structure to Flexibly Model Areal Data
Abstract
The majority of statistical methods for the analysis of spatially indexed data have been developed for settings in which data are observed at a collection of point-indexed locations. Region-indexed data, also known as areal data, are common in many disciplines including economics, sociology, public health, and ecology. While statistical methods for the analysis of spatially correlated, point-indexed data typically characterize covariance as a function of the distances between observed locations, methods for areal data tend to characterize between-region spatial dependence using a graphical representation of the spatial domain. In most cases this has been done by defining covariance as a function of the unweighted adjacency matrix of the data, resulting in the fairly rigid assumption that all pairs of neighboring regions interact in essentially the same manner. This dissertation presents multiple methods by which the intrinsic network structure of areal data may be utilized to define more flexible models for graph and areal data than those commonly used at present. In Chapter 2 we introduce the graph deformation (GDEF) framework, which implicitly defines an embedding of a graph into high-dimensional Euclidean space, wherein between-node covariance is defined using the Matérn covariance function. The model is parameterized by an unknown edge weights matrix and defines a class of covariance matrices that are both highly flexible and interpretable. We compare the GDEF model to alternatives, illustrating the advantages of the approach, before concluding with an analysis of bird abundance data for several species in North Carolina. Chapter 3 extends the work presented in Chapter 2, proposing a method by which unknown edge weights matrices may be more efficiently estimated using a basis function representation. We show how this framework improves the GDEF model as well as other existing models for areal data.
We provide several illustrations of the properties of the GDEF and other models when integrated with the proposed method, concluding with a number of simulation studies and a data analysis that demonstrate the utility of the GDEF framework. In Chapter 4 we explore how crowdsourced observations of migratory birds may be utilized to better understand avian migratory trends. We propose a novel hidden Markov model that draws from circuit theory and electrical physics to characterize its transition structure, and conclude by fitting our model to observations of the Baltimore oriole and yellow-rumped warbler within the eastern United States, with data spanning a five-year period. We conclude the dissertation with a brief discussion of the work presented.
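The rigidity that the abstract attributes to unweighted adjacency matrices can be illustrated with a small sketch. It does not implement the GDEF model; it uses a standard proper conditional autoregressive (CAR) covariance as a stand-in, and the function names, the four-region path graph, the edge weights, and the values of `rho` and `tau` are all assumptions chosen for illustration.

```python
import numpy as np

def car_covariance(W, rho=0.9, tau=1.0):
    """Covariance of a proper CAR model with precision tau * (D - rho * W).

    W is a symmetric, non-negative (possibly weighted) adjacency matrix in
    which every region has at least one neighbor; D is the diagonal matrix
    of row sums. With |rho| < 1 the precision matrix is positive definite.
    """
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))
    Q = tau * (D - rho * W)
    return np.linalg.inv(Q)

def correlation(S):
    """Convert a covariance matrix to a correlation matrix."""
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)

# Hypothetical spatial domain: four regions arranged in a line (0-1-2-3).
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

# Hypothetical edge weights on the same graph: a strong 0-1 link, a weak 2-3 link.
W = np.array([
    [0.0, 3.0, 0.0, 0.0],
    [3.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.3],
    [0.0, 0.0, 0.3, 0.0],
])

corr_unweighted = correlation(car_covariance(A))
corr_weighted = correlation(car_covariance(W))

# Unweighted: the two end edges are structurally identical, so they are
# forced to share one correlation. Weighted: the two edges can differ.
print("unweighted:", corr_unweighted[0, 1], corr_unweighted[2, 3])
print("weighted:  ", corr_weighted[0, 1], corr_weighted[2, 3])
```

With the unweighted matrix, the correlation for the pair (0, 1) equals that for the pair (2, 3) because the two edges occupy symmetric positions in the graph. Once edge weights are allowed, the same graph can express a strongly coupled pair alongside a weakly coupled one, which is the kind of flexibility an unknown edge weights matrix is meant to provide.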
Citation
Christensen, Michael Fredrick (2024). Utilizing Network Structure to Flexibly Model Areal Data. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/31921.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.