Browsing by Subject "Spatial statistics"
Item Open Access Small and Stable Descriptors of Distributions for Geometric Statistical Problems (2009) Phillips, Jeff M.
This thesis explores how to sparsely represent distributions of points for geometric statistical problems. A coreset C is a small summary of a point set P such that if a certain statistic is computed on P and C, then the difference in the results is guaranteed to be bounded by a parameter ε. Two examples of coresets are ε-samples and ε-kernels. An ε-sample can estimate the density of a point set in any range from a geometric family of ranges (e.g., disks, axis-aligned rectangles). An ε-kernel approximates the width of a point set in all directions. Both coresets have size that depends only on ε, the error parameter, not the size of the original data set. We demonstrate several improvements to these coresets and how they are useful for geometric statistical problems.
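As a rough illustration of the ε-sample idea, the sketch below compares the density of a point set P in a few axis-aligned rectangles with the density measured on a much smaller subset C. It assumes a plain uniform random sample, which is a standard way to obtain an ε-sample with high probability and is not the improved construction from the thesis. The function names, sample sizes, and query rectangles are hypothetical.

```python
import random

def density(points, rect):
    """Fraction of points inside the axis-aligned rectangle (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = rect
    inside = sum(1 for (x, y) in points if x0 <= x <= x1 and y0 <= y <= y1)
    return inside / len(points)

def random_sample(points, size, seed=0):
    """Uniform random subset of the points (an assumed stand-in for a coreset)."""
    rng = random.Random(seed)
    return rng.sample(points, size)

# Hypothetical data: 20,000 uniform points in the unit square.
rng = random.Random(1)
P = [(rng.random(), rng.random()) for _ in range(20000)]
C = random_sample(P, 2000)

# Hypothetical query rectangles.
queries = [(0.0, 0.0, 0.5, 0.5), (0.2, 0.1, 0.9, 0.4), (0.3, 0.3, 0.6, 0.95)]

# Largest density error over the queries; for an epsilon-sample this
# stays below epsilon for every range in the family, not just these three.
max_err = max(abs(density(P, r) - density(C, r)) for r in queries)
print(max_err)
```

The size of C here does not depend on the size of P, which is the property the abstract highlights.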
We reduce the size of ε-samples for density queries in axis-aligned rectangles to nearly the square root of the size needed when the queries are with respect to more general families of shapes, such as disks. We also show how to construct ε-samples of probability distributions.
We show how to maintain “stable” ε-kernels; that is, if the point set P changes by a small amount, then the ε-kernel also changes by a small amount. This is useful in surveillance tracking problems, and the stability property leads to more efficient algorithms for maintaining ε-kernels.
We next study the case in which the input point sets are uncertain and their uncertainty is modeled by probability distributions. Statistics on these point sets (e.g., radius of smallest enclosing ball) do not have exact answers, but rather distributions of answers. We describe data structures to represent approximations of these distributions and algorithms to compute them. We also show how to create distributions of ε-kernels and ε-samples for these uncertain data sets.
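To make "a distribution of answers" concrete, the following sketch draws many realizations of an uncertain point set and records the statistic for each one. It is only a Monte Carlo illustration, not the data structures described in the thesis. It assumes one-dimensional points, where the smallest enclosing ball is an interval of radius (max - min) / 2, and independent Gaussian noise around hypothetical mean locations.

```python
import bisect
import random

def enclosing_radius_1d(xs):
    """Radius of the smallest enclosing ball (interval) of points on a line."""
    return (max(xs) - min(xs)) / 2.0

def sample_radii(means, sigma, trials, seed=0):
    """Sorted radii over random realizations of the uncertain points.

    Assumption: each point is Gaussian around its mean with std sigma.
    """
    rng = random.Random(seed)
    radii = []
    for _ in range(trials):
        xs = [rng.gauss(m, sigma) for m in means]
        radii.append(enclosing_radius_1d(xs))
    return sorted(radii)

def empirical_cdf(sorted_values, t):
    """Fraction of sampled answers that are <= t."""
    return bisect.bisect_right(sorted_values, t) / len(sorted_values)

# Hypothetical uncertain points: four mean locations on a line.
means = [0.0, 1.0, 2.5, 4.0]
radii = sample_radii(means, sigma=0.1, trials=5000)

# The answer is a distribution; query it instead of reporting one number.
print(empirical_cdf(radii, 2.0))
```

The sorted list of sampled radii acts as a simple approximate representation of the answer distribution, which can then be queried at any threshold.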
Finally, we examine a spatial anomaly detection problem: computing a spatial scan statistic. The input is a point set P and measurements on the point set. The spatial scan statistic finds the range (e.g., an axis-aligned bounding box) where the measurements inside the range are the most different from measurements outside of the range. We show how to compute this statistic efficiently while allowing for a bounded amount of approximation error. This result generalizes to several statistical models and types of input point sets.
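A brute-force version of the scan can be sketched as follows. It tries every axis-aligned rectangle whose sides pass through input coordinates and scores each one by how much the measured fraction inside differs from the baseline fraction inside. The absolute-difference score is an assumed, simplified stand-in for the statistical models the thesis covers, and the exhaustive search is far slower than the approximate algorithms it describes. All names and data are hypothetical.

```python
from itertools import combinations_with_replacement

def scan_statistic(points):
    """points: list of (x, y, measured, baseline).

    Returns (best_score, best_rect) with best_rect = (x0, y0, x1, y1).
    """
    total_m = sum(p[2] for p in points)
    total_b = sum(p[3] for p in points)
    xs = sorted({p[0] for p in points})
    ys = sorted({p[1] for p in points})
    best_score, best_rect = -1.0, None
    # Sorted inputs guarantee x0 <= x1 and y0 <= y1 for every pair.
    for x0, x1 in combinations_with_replacement(xs, 2):
        for y0, y1 in combinations_with_replacement(ys, 2):
            inside = [p for p in points
                      if x0 <= p[0] <= x1 and y0 <= p[1] <= y1]
            m_in = sum(p[2] for p in inside) / total_m
            b_in = sum(p[3] for p in inside) / total_b
            score = abs(m_in - b_in)  # assumed simple discrepancy measure
            if score > best_score:
                best_score, best_rect = score, (x0, y0, x1, y1)
    return best_score, best_rect

# Hypothetical 4x4 grid with an elevated 2x2 cluster in the middle.
grid = [(x, y, 5.0 if x in (1, 2) and y in (1, 2) else 1.0, 10.0)
        for x in range(4) for y in range(4)]
best_score, best_rect = scan_statistic(grid)
print(best_score, best_rect)
```

On this toy grid the search recovers the rectangle around the elevated cluster.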
Item Open Access Topics in Bayesian Spatiotemporal Prediction of Environmental Exposure (2019) White, Philip Andrew
We address predictive modeling for spatial and spatiotemporal data in a variety of settings. First, we discuss spatial and spatiotemporal data and corresponding model types used in later chapters. Specifically, we discuss Markov random fields, Gaussian processes, and Bayesian inference. Then, we outline the dissertation.
In Chapter 2, we consider the setting where areal unit data are only partially observed. First, we consider the setting where a portion of the areal units have been observed, and we seek prediction of the remainder. Second, we leverage these ideas for model comparison, where we fit models of interest to a portion of the data and hold out the rest for model comparison.
In Chapters 3 and 4, we consider pollution data from Mexico City in 2017. In Chapter 3 we forecast pollution emergencies. Mexico City defines pollution emergencies using thresholds that rely on regional maxima for ozone and for particulate matter with diameter less than 10 micrometers (PM10). To predict local pollution emergencies and to assess compliance with Mexican ambient air quality standards, we analyze hourly ozone and PM10 measurements from 24 stations across Mexico City from 2017 using a bivariate spatiotemporal model. With this model, we predict future pollutant levels using current weather conditions and recent pollutant concentrations. Employing hourly pollutant projections, we predict regional maxima needed to estimate the probability of future pollution emergencies. We discuss how predicted compliance with legislated pollution limits varies across regions within Mexico City in 2017.
In Chapter 4, we propose a continuous spatiotemporal model for Mexico City ozone levels that accounts for distinct daily seasonality, as well as variation across the city and over the peak ozone season (April and May) of 2017. To account for these patterns, we use covariance models over space, circles, and time. We review relevant existing covariance models and develop new classes of nonseparable covariance models appropriate for seasonal data collected at many locations. We compare the predictive performance of a variety of models that utilize various nonseparable covariance functions. We use the best model to predict hourly ozone levels at unmonitored locations in April and May to infer compliance with Mexican air quality standards and to estimate respiratory health risk associated with ozone exposure.
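For orientation, a covariance over space, circles, and time can be sketched as a product of three factors: one in spatial distance, one in angular distance around the daily cycle, and one in elapsed days. The sketch below is a separable baseline only; the dissertation develops nonseparable classes, which this does not reproduce. The exponential form of each factor, the parameter names, and the default ranges are all assumptions made for illustration.

```python
import math

def circle_distance(h1, h2, period=24.0):
    """Geodesic distance between two hours of the day, in radians in [0, pi]."""
    d = abs(h1 - h2) % period
    d = min(d, period - d)  # go the short way around the circle
    return 2.0 * math.pi * d / period

def separable_cov(s1, s2, h1, h2, t1, t2,
                  sigma2=1.0, range_s=5.0, range_c=1.0, range_t=3.0):
    """Product of exponential kernels in space, hour of day, and day index.

    Assumed form; every parameter here is a hypothetical placeholder.
    """
    ds = math.hypot(s1[0] - s2[0], s1[1] - s2[1])  # spatial distance
    dc = circle_distance(h1, h2)                   # distance on the circle
    dt = abs(t1 - t2)                              # lag in days
    return (sigma2 * math.exp(-ds / range_s)
            * math.exp(-dc / range_c)
            * math.exp(-dt / range_t))

# Same site and day, 23:00 versus 01:00: the hours are close on the circle.
near = separable_cov((0.0, 0.0), (0.0, 0.0), 23.0, 1.0, 10, 10)
# Same site and day, 06:00 versus 18:00: opposite points of the daily cycle.
far = separable_cov((0.0, 0.0), (0.0, 0.0), 6.0, 18.0, 10, 10)
print(near, far)
```

Treating hour of day as a point on a circle keeps 23:00 and 01:00 highly correlated, which a plain linear time lag would not.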