A Comparison of Strategies for Generating Synthetic Data for Complex Survey

dc.contributor.advisor

Reiter, Jerome

dc.contributor.author

Chen, Min

dc.date.accessioned

2024-06-06T13:50:14Z

dc.date.available

2024-06-06T13:50:14Z

dc.date.issued

2024

dc.department

Statistical Science

dc.description.abstract

Synthetic data generation is a method for protecting data privacy. To disseminate confidential data for public use, some statistical agencies release fully synthetic datasets, and this practice has been applied to census and administrative records. However, many research datasets come from surveys with complex sampling designs, and the design cannot be ignored when constructing synthetic data. This thesis compares three strategies for generating synthetic data, each with its own generation procedure. Two are based on bootstrap methods: one uses the Bayesian bootstrap and the other the standard bootstrap. The third is based on posterior inference with a pseudo-likelihood. Using simulation studies with probability proportional to size (PPS) sampling, we show that all three methods can produce accurate estimates of a finite population mean. However, when estimating the variance of the sampling statistic, only the method based on the Bayesian bootstrap provides an approximately unbiased estimate in these simulations.
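To make the abstract's terms concrete, here is a minimal Python sketch of one bootstrap-style pipeline. It is not the thesis's procedure. Several pieces are assumptions chosen for illustration: the toy population, the use of Poisson PPS sampling (independent inclusion with probability proportional to a size measure), and a simplified weighted Bayesian bootstrap that perturbs the design weights with Dirichlet(1, ..., 1) draws and then resamples a full synthetic population. The function names `pps_poisson_sample`, `weighted_bb_population`, and `combine_synthetic_populations` are hypothetical. The sketch estimates the mean in two ways: with a design-weighted (Hájek) estimator, and by averaging over m synthetic populations. Because each synthetic dataset is an entire population, its within-population variance is zero, so the variance estimate reduces to (1 + 1/m) times the between-population variance.

```python
import numpy as np


def pps_poisson_sample(size_measure, expected_n, rng):
    """Hypothetical helper: Poisson PPS sampling. Unit i is included
    independently with probability pi_i proportional to its size measure,
    capped at 1."""
    pi = np.minimum(expected_n * size_measure / size_measure.sum(), 1.0)
    selected = rng.random(size_measure.shape[0]) < pi
    return selected, pi


def weighted_bb_population(y, weights, pop_size, rng):
    """Hypothetical, simplified weighted Bayesian bootstrap (an assumption,
    not the thesis's exact method): multiply Dirichlet(1, ..., 1) draws by
    the design weights, normalize, and resample pop_size records."""
    g = rng.dirichlet(np.ones(y.shape[0]))
    probs = g * weights
    probs /= probs.sum()
    idx = rng.choice(y.shape[0], size=pop_size, replace=True, p=probs)
    return y[idx]


def combine_synthetic_populations(estimates):
    """Combine m fully synthetic populations: the point estimate is the
    average, and the variance is (1 + 1/m) * between-population variance
    (within-population variance is zero for a full population)."""
    m = len(estimates)
    q_bar = float(np.mean(estimates))
    b_m = float(np.var(estimates, ddof=1))
    return q_bar, (1.0 + 1.0 / m) * b_m


rng = np.random.default_rng(2024)

# Toy finite population; every value here is an illustrative assumption.
N = 10_000
x = rng.uniform(1.0, 10.0, size=N)            # size measure used for PPS
y = 10.0 + 2.0 * x + rng.normal(0.0, 2.0, N)  # outcome correlated with x
true_mean = y.mean()

selected, pi = pps_poisson_sample(x, expected_n=500, rng=rng)
y_s, w_s = y[selected], 1.0 / pi[selected]    # design weights are 1 / pi

# Design-weighted (Hajek) estimate of the finite population mean.
hajek_mean = np.sum(w_s * y_s) / np.sum(w_s)

# Generate m synthetic populations and combine their means.
m = 20
synth_means = [weighted_bb_population(y_s, w_s, N, rng).mean() for _ in range(m)]
q_bar, t_var = combine_synthetic_populations(synth_means)

print(f"true mean {true_mean:.2f}, Hajek {hajek_mean:.2f}, "
      f"synthetic {q_bar:.2f}, variance estimate {t_var:.4f}")
```

Poisson sampling is used here only because its inclusion probabilities are exact and easy to compute. A fixed-size PPS design would need a different selection routine. Whether the resulting variance estimate is approximately unbiased depends on the design and the synthesis method, which is the question the thesis's simulations address. This sketch does not answer it.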

dc.identifier.uri

https://hdl.handle.net/10161/31056

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

Statistics

dc.title

A Comparison of Strategies for Generating Synthetic Data for Complex Survey

dc.type

Master's thesis

Files

Original bundle

Name: Chen_duke_0066N_17971.pdf
Size: 449.66 KB
Format: Adobe Portable Document Format
