Modeling Point Patterns, Measurement Error and Abundance for Exploring Species Distributions
This dissertation addresses some common problems that arise in ecological field studies. At the core of the statistical methodology lies spatial modeling, which provides greater flexibility and improved predictive performance over existing algorithms. The applications involve prevalence datasets for hundreds of plant species over a large area in the Cape Floristic Region (CFR) of South Africa.
In Chapter 2, we begin by modeling categorical abundance data with a multilevel spatial model that uses background information such as environmental and soil-type factors. The empirical pattern is formulated as a degraded version of the potential pattern, with the degradation accomplished in two stages: first we adjust for land-use transformation, and then we adjust for measurement error, and hence misclassification error, to yield the observed abundance classifications. With data on a regular grid over the CFR, the analysis uses a conditionally autoregressive prior on the spatial random effects. With roughly 37,000 cells to work with, a novel parallelization algorithm is developed for updating the spatial parameters, so that potential and transformed abundance surfaces can be estimated efficiently over the entire region.
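To make the role of the conditionally autoregressive (CAR) prior concrete, the following is a minimal sketch under stated assumptions, not the dissertation's algorithm. It assumes an intrinsic CAR prior with rook (4-nearest) neighbours and a hypothetical precision parameter `tau`, and it uses a standard checkerboard colouring as a stand-in for the novel parallelization scheme, whose details the abstract does not give. The likelihood is omitted, so the sweep draws from the prior conditionals only.

```python
import random

def neighbors(i, j, nrow, ncol):
    """Rook (4-nearest) neighbours of cell (i, j) on an nrow x ncol grid."""
    cand = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return [(a, b) for a, b in cand if 0 <= a < nrow and 0 <= b < ncol]

def car_conditional(phi, i, j, tau):
    """Mean and variance of phi[i][j] given its neighbours (intrinsic CAR).

    Assumes the grid has at least two cells, so every cell has a neighbour.
    """
    nb = neighbors(i, j, len(phi), len(phi[0]))
    mean = sum(phi[a][b] for a, b in nb) / len(nb)
    var = 1.0 / (tau * len(nb))
    return mean, var

def checkerboard_sweep(phi, tau, rng):
    """One prior-only Gibbs sweep: update 'black' cells, then 'white' cells."""
    nrow, ncol = len(phi), len(phi[0])
    for colour in (0, 1):
        cells = [(i, j) for i in range(nrow) for j in range(ncol)
                 if (i + j) % 2 == colour]
        # No two cells of the same colour are neighbours, so given the other
        # colour these draws are independent and could be run in parallel.
        draws = {}
        for i, j in cells:
            mean, var = car_conditional(phi, i, j, tau)
            draws[(i, j)] = rng.gauss(mean, var ** 0.5)
        for (i, j), value in draws.items():
            phi[i][j] = value
    return phi
```

In a full sampler each conditional would also include the data contribution for that cell; the colouring argument is unchanged as long as the likelihood factorizes over cells.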
In Chapter 3, we focus on a different but increasingly common type of prevalence data, in the so-called <italic>presence-only</italic> setting. We detail the limitations of a conventional presence-absence analysis for such data and advocate modeling the data as a realization of a point pattern. The underlying intensity surface is modeled with a point-level spatial Gaussian process prior, after accounting for sampling bias and changes in land-use pattern. The large size of the region necessitates a computational approximation with a bias-corrected predictive process. We compare our methodology against the most commonly used maximum entropy method to highlight the improvement in predictive performance.
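As an illustration of what "a realization of a point pattern" means, here is a small sketch that simulates an inhomogeneous Poisson process on a rectangle by thinning. It is an assumption-laden toy: `toy_intensity`, its peak value of 50 and the unit-square window are all hypothetical, and the fixed intensity function stands in for the Gaussian-process-driven, bias-adjusted surface described above.

```python
import math
import random

def poisson_draw(mean, rng):
    """Poisson count: number of unit-rate exponential arrivals before `mean`."""
    n, t = 0, rng.expovariate(1.0)
    while t < mean:
        n += 1
        t += rng.expovariate(1.0)
    return n

def simulate_point_pattern(intensity, lam_max, width, height, rng):
    """Thinning: propose a homogeneous pattern at rate lam_max, then keep
    each point with probability intensity(x, y) / lam_max.

    Requires intensity(x, y) <= lam_max everywhere on the window.
    """
    n = poisson_draw(lam_max * width * height, rng)
    points = []
    for _ in range(n):
        x, y = rng.uniform(0.0, width), rng.uniform(0.0, height)
        if rng.random() < intensity(x, y) / lam_max:
            points.append((x, y))
    return points

def toy_intensity(x, y):
    """Hypothetical intensity: a single bump centred at (0.5, 0.5), peak 50."""
    return 50.0 * math.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2) / 0.05)

pattern = simulate_point_pattern(toy_intensity, 50.0, 1.0, 1.0, random.Random(1))
```

Presence-only records are then treated as the retained points: there are no recorded absences, only locations where the species was seen.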
In Chapter 4, we develop a novel hierarchical model for analyzing noisy point pattern datasets, which arise commonly in ecological surveys because of multiple sources of bias, as discussed in previous chapters. The noise displaces locations and can also cause points to be lost from a bounded domain. Depending on whether locations are assumed to exist outside the boundary, two different models, <italic>island</italic> and <italic>subregion</italic>, are specified. The methodology assumes informative knowledge of the scale of measurement error, either pre-specified or learned from a training sample. Its performance is tested against different scales of measurement error related to the data collection techniques in the CFR.
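The displacement-and-loss mechanism can be sketched as a forward simulation. This is one reading of the abstract rather than the dissertation's model: it assumes independent Gaussian displacement with a known standard deviation `sigma`, a rectangular domain, and that a point is lost when its displaced location falls outside the domain. Under this reading, the island case has no true locations outside the boundary, while the subregion case allows outside locations that may be displaced into the domain.

```python
import random

def inside(point, domain):
    """True if point (x, y) lies in the rectangle domain = (x0, x1, y0, y1)."""
    x, y = point
    x0, x1, y0, y1 = domain
    return x0 <= x <= x1 and y0 <= y <= y1

def observe(true_points, sigma, domain, rng):
    """Displace each true location by Gaussian noise; keep only those that
    land inside the domain (the rest are lost to the observer)."""
    observed = []
    for x, y in true_points:
        q = (x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma))
        if inside(q, domain):
            observed.append(q)
    return observed

domain = (0.0, 1.0, 0.0, 1.0)
rng = random.Random(3)

# Island reading: every true location starts inside the domain.
island_truth = [(0.02, 0.5), (0.5, 0.5), (0.98, 0.9)]
island_obs = observe(island_truth, 0.05, domain, rng)

# Subregion reading: some true locations lie just outside the domain
# and can be displaced into it.
subregion_truth = island_truth + [(1.03, 0.4), (-0.02, 0.7)]
subregion_obs = observe(subregion_truth, 0.05, domain, rng)
```

The inferential problem runs this mechanism in reverse: given the observed points and the noise scale, recover the intensity of the true locations.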
In Chapter 5, we suggest an alternative model for prevalence data, different from the one in Chapter 3, that avoids numerical approximation and the resulting computational complexity for a large region. A mixture model similar to the one in Chapter 4 is used, with potential dependence among the weights and locations of the components. Covariates as well as a spatial process are used to model this dependence. A novel birth-death algorithm for the number of components in the mixture is under construction.
Lastly, in Chapter 6, we proceed to joint modeling of multiple-species datasets. The challenge is to draw inferences about inter-species competition among a large number of populations, possibly several hundred. Our contribution involves applying a hierarchical Dirichlet process to cluster the presence localities and subsequently developing measures of range overlap from posterior draws. This kind of simultaneous inference can potentially have implications for questions related to biodiversity and conservation studies.
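To indicate how a range-overlap summary might be computed from posterior draws of cluster labels, the sketch below uses a purely hypothetical measure, not one taken from the dissertation: for each draw, the overlap between two species is the sum over shared clusters of the smaller of their two cluster shares, and the summary is the average over draws. The function names and the input layout (one pair of label lists per posterior draw) are assumptions made for illustration.

```python
from collections import Counter

def cluster_shares(labels):
    """Fraction of a species' presence localities assigned to each cluster."""
    counts = Counter(labels)
    n = len(labels)
    return {k: c / n for k, c in counts.items()}

def overlap(labels_a, labels_b):
    """Hypothetical overlap: sum over clusters of min(share_a, share_b).

    Equals 1 when both species spread identically over clusters and 0 when
    they share no cluster.
    """
    pa, pb = cluster_shares(labels_a), cluster_shares(labels_b)
    return sum(min(pa[k], pb.get(k, 0.0)) for k in pa)

def posterior_overlap(draws):
    """Average the per-draw overlap over posterior draws.

    `draws` is a list of (labels_a, labels_b) pairs, one per posterior draw.
    """
    values = [overlap(a, b) for a, b in draws]
    return sum(values) / len(values)
```

Because the measure is evaluated draw by draw, its spread across draws gives a direct picture of posterior uncertainty in the overlap.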
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations