Bayesian Models for Combining Information from Multiple Sources

dc.contributor.advisor

Reiter, Jerome P

dc.contributor.author

Tang, Jiurui

dc.date.accessioned

2022-06-15T18:42:30Z

dc.date.available

2022-06-15T18:42:30Z

dc.date.issued

2022

dc.department

Statistical Science

dc.description.abstract

This dissertation develops Bayesian methods for combining information from multiple sources. I focus on developing Bayesian bipartite modeling for simultaneous regression and record linkage, as well as leveraging auxiliary information on marginal distributions for handling item and unit nonresponse and accounting for survey weights.

The first contribution is a Bayesian hierarchical model that allows analysts to perform simultaneous linear regression and probabilistic record linkage. This model allows analysts to leverage relationships among the variables to improve linkage quality. It also potentially offers more accurate estimates of regression parameters compared to approaches that use a two-step process, i.e., link the records first, then estimate the linear regression on the linked data. I propose and evaluate three Markov chain Monte Carlo algorithms for implementing the Bayesian model.
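For orientation, the two-step baseline mentioned above (link the records first, then fit the regression on the linked data) can be sketched as follows. This is only an illustrative sketch, not the dissertation's method: it uses deterministic exact-match linkage rather than probabilistic linkage, and the files, field names (`name`, `zip`, `x`, `y`), and helper functions are hypothetical.

```python
# Hypothetical toy files; all names and values are illustrative only.
file_a = [
    {"name": "ann", "zip": "27701", "x": 1.0},
    {"name": "bob", "zip": "27705", "x": 2.0},
    {"name": "cy", "zip": "27708", "x": 3.0},
    {"name": "dee", "zip": "27710", "x": 4.0},
]
file_b = [
    {"name": "bob", "zip": "27705", "y": 5.0},
    {"name": "ann", "zip": "27701", "y": 3.0},
    {"name": "dee", "zip": "27710", "y": 9.0},
    {"name": "eve", "zip": "27712", "y": 7.0},
]


def link_exact(a, b, keys):
    """Step 1: deterministic linkage on exact agreement of the key fields."""
    index = {tuple(r[k] for k in keys): r for r in b}
    pairs = []
    for r in a:
        match = index.get(tuple(r[k] for k in keys))
        if match is not None:
            pairs.append((r["x"], match["y"]))
    return pairs


def ols_simple(pairs):
    """Step 2: least-squares fit of y = intercept + slope * x on linked pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


linked = link_exact(file_a, file_b, keys=("name", "zip"))
intercept, slope = ols_simple(linked)
```

In this baseline the regression step treats the links as fixed, so any linkage error carries into the regression estimates without being accounted for. The joint model described in the abstract is motivated by that limitation.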

The second contribution is an examination of the performance of an approach for generating multiple imputation data sets for item nonresponse. The method allows analysts to use auxiliary information. I examine the approach via simulation studies with Poisson sampling. I also give suggestions on parameter tuning.
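As background for the simulation design, Poisson sampling includes each population unit independently with its own inclusion probability. The sketch below is a minimal illustration under assumed settings; the population, the inclusion probabilities, and the function names are hypothetical and are not taken from the dissertation. It pairs the sampling step with the Horvitz-Thompson estimator of a population total.

```python
import random


def poisson_sample(inclusion_probs, rng):
    """Include unit i independently with probability inclusion_probs[i]."""
    return [i for i, p in enumerate(inclusion_probs) if rng.random() < p]


def horvitz_thompson_total(y, inclusion_probs, sample):
    """Estimate the population total by weighting each sampled y by 1 / pi_i."""
    return sum(y[i] / inclusion_probs[i] for i in sample)


rng = random.Random(0)
N = 1000  # hypothetical population size
y = [rng.gauss(50.0, 10.0) for _ in range(N)]
# Hypothetical unequal inclusion probabilities between 0.05 and 0.30.
probs = [0.05 + 0.25 * rng.random() for _ in range(N)]
true_total = sum(y)

# Repeat the sampling to see that the estimator is centered on the truth.
estimates = []
for _ in range(500):
    sample = poisson_sample(probs, rng)
    estimates.append(horvitz_thompson_total(y, probs, sample))
mean_estimate = sum(estimates) / len(estimates)
```

Because inclusion decisions are independent, the realized sample size varies from draw to draw, which is a feature of Poisson sampling that a simulation study has to accommodate.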

The third contribution is a model-based imputation approach that can handle both item and unit nonresponse while accounting for auxiliary margins and survey weights. This approach combines a pattern mixture model for unit nonresponse with a selection model for item nonresponse. Both unit and item nonresponse can be nonignorable. I demonstrate the model's performance with simulation studies in settings where the design weights for unit respondents are known and where they are not. I show that the model can generate multiple imputation data sets that both retain the relationships among survey variables and yield design-based estimates that agree with auxiliary margins. I use the model to analyze voter turnout overall and across subgroups in North Carolina, with data from the 2018 Current Population Survey.
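One way to picture the check that design-based estimates agree with an auxiliary margin is to compute a survey-weighted proportion in each completed data set, average the results across data sets (the standard multiple imputation point estimate), and compare that average with the known margin. The sketch below assumes made-up data: the turnout indicators, weights, and the margin value are hypothetical and do not come from the Current Population Survey analysis.

```python
def weighted_proportion(values, weights):
    """Survey-weighted share of units with a truthy value."""
    total_weight = sum(weights)
    return sum(w for v, w in zip(values, weights) if v) / total_weight


# Hypothetical survey weights shared by all completed data sets.
weights = [100.0, 150.0, 120.0, 80.0, 50.0]

# Hypothetical completed data sets: 1 = voted, 0 = did not vote.
completed_sets = [
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
]

# Point estimate per completed data set, then the average across them.
per_set = [weighted_proportion(values, weights) for values in completed_sets]
mi_estimate = sum(per_set) / len(per_set)

auxiliary_margin = 0.60  # hypothetical known turnout rate
gap = abs(mi_estimate - auxiliary_margin)
```

A small `gap` indicates agreement on the point estimate only. Judging agreement in practice would also involve the multiple imputation variance, which this sketch leaves out.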

dc.identifier.uri

https://hdl.handle.net/10161/25143

dc.subject

Statistics

dc.subject

Bayesian modeling

dc.subject

MCMC

dc.subject

Missing data

dc.subject

Multiple imputation

dc.subject

Record Linkage

dc.subject

Survey Method

dc.title

Bayesian Models for Combining Information from Multiple Sources

dc.type

Dissertation

Files

Original bundle

Name: Tang_duke_0066D_16554.pdf
Size: 738.73 KB
Format: Adobe Portable Document Format
