dc.description.abstract |
<p>This thesis develops Bayesian latent class models for nested categorical data,
e.g., people nested in households. The applications focus on generating synthetic
microdata for public release and imputing missing data for household surveys, such
as the 2010 U.S. Decennial Census.</p><p>The first contribution is a set of methods for evaluating
disclosure risks in fully synthetic categorical data. I quantify disclosure risks
by computing Bayesian posterior probabilities that intruders can learn confidential
values given the released data and assumptions about their prior knowledge. I demonstrate
the methodology on a subset of data from the American Community Survey (ACS). The
methods can be adapted to synthesizers for nested data, as demonstrated in later chapters
of the thesis.</p><p>The second contribution is a novel two-level latent class model
for nested categorical data. Here, I assume that all configurations of groups and
units are theoretically possible. I use a nested Dirichlet process prior distribution
for the class membership probabilities. The nested structure facilitates simultaneous
modeling of variables at both group and unit levels. I illustrate the modeling by
generating synthetic data and imputing missing data for a subset of the
2012 ACS household data. I show that the model can capture within-group relationships
more effectively than standard one-level latent class models.</p><p>The third contribution
is a version of the nested latent class model adapted to handle theoretically impossible
combinations, e.g., a household with two household heads or a child older than her
biological father. This version assigns zero probability to those impossible groups
and units. I present a proof that the Markov chain Monte Carlo (MCMC) sampling strategy
estimates the desired target distribution. I illustrate this model by generating synthetic
data and imputing missing data for a subset of the 2011 ACS household data.
The results indicate that this version can estimate the joint distribution more effectively
than the previous version.</p>
|
|