Browsing by Subject "Census"
Item Open Access Bayesian Models for Imputing Missing Data and Editing Erroneous Responses in Surveys (2019) Akande, Olanrewaju Michael
This thesis develops Bayesian methods for handling unit nonresponse, item nonresponse, and erroneous responses in large-scale surveys and censuses containing categorical data. I focus on applications to nested household data where individuals are nested within households and certain combinations of the variables are not allowed, such as the U.S. Decennial Census, as well as surveys subject to both unit and item nonresponse, such as the Current Population Survey.
The first contribution is a Bayesian model for imputing plausible values for item nonresponse in data nested within households, in the presence of impossible combinations. The imputation is done using a nested data Dirichlet process mixture of products of multinomial distributions model, truncated so that impossible household configurations have zero probability in the model. I show how to generate imputations from the Markov chain Monte Carlo sampler, and describe strategies for improving the computational efficiency of the model estimation. I illustrate the performance of the approach with data that mimic the variables collected in the U.S. Decennial Census. The results indicate that my approach can generate high-quality imputations in such nested data.
The second contribution extends the imputation engine in the first contribution to allow for the editing and imputation of household data containing faulty values. The approach relies on a Bayesian hierarchical model that uses the nested data Dirichlet process mixture of products of multinomial distributions as a model for the true unobserved data, but also includes a model for the location of errors, and a reporting model for the observed responses in error. I illustrate the performance of the edit and imputation engine using data from the 2012 American Community Survey. I show that my approach can simultaneously estimate multivariate relationships in the data accurately, adjust for measurement errors, and respect impossible combinations in estimation and imputation.
The third contribution is a framework for using auxiliary information to specify nonignorable models that can handle both item and unit nonresponse simultaneously. My approach focuses on how to leverage auxiliary information from external data sources in nonresponse adjustments. This method is developed for specifying imputation models so that users can posit distinct specifications of missingness mechanisms for different blocks of variables, for example, a nonignorable model for variables with auxiliary marginal information and an ignorable model for the variables exclusive to the survey. I illustrate the framework using data on voter turnout in the Current Population Survey.
The final contribution extends the framework in the third contribution to handle nonresponse in complex surveys, so that we can still leverage auxiliary data while respecting the survey design through survey weights. Using several simulations, I illustrate the performance of my approach when the sample is generated primarily through stratified sampling.
Item Open Access Can census data alone signal heterogeneity in the estimation of poverty maps? (2011-07) Tarozzi, Alessandro
Methodologies now commonly used for the construction of poverty maps assume a substantial degree of homogeneity within geographical areas in the relationship between income and its predictors. However, local labor and rental markets and other local environmental differences are likely to generate heterogeneity in such relationships, at least to some extent. The purpose of this paper is to argue that useful, if only indirect and suggestive, evidence on the extent of area heterogeneity is readily available in virtually any census. Such indirect evidence is provided by non-monetary indicators (such as literacy, asset ownership or access to sanitation), which are routinely included in censuses. These indicators can be used to perform validation exercises to gauge the extent of heterogeneity in their distribution conditional on predictors analogous to those commonly used in poverty mapping. We argue that the same factors which are likely to generate area heterogeneity in poverty mapping are also likely to generate heterogeneity in such validation exercises. We construct a very simple model to illustrate this point formally. Finally, we evaluate the argument empirically using data from Mexico. In our empirical illustrations, the performance of imputation methodologies to construct maps of indicators typically feasible with census data alone is indeed informative about how effectively such methodologies can produce correct inference in poverty mapping.
Item Open Access Creating linked datasets for SME energy-assessment evidence-building: Results from the U.S. Industrial Assessment Center Program (Energy Policy, 2017-12-01) Dalzell, NM; Boyd, GA; Reiter, JP
© 2017 Elsevier Ltd. Lack of information is commonly cited as a market failure resulting in an energy-efficiency gap.
Government information policies to fill this gap may enable improvements in energy efficiency and social welfare because of the externalities of energy use. The U.S. Department of Energy Industrial Assessment Center (IAC) program is one such policy intervention, providing no-cost assessments to small and medium enterprises (SMEs). The IAC program has assembled a wealth of data on these assessments, but the database does not include information about participants after the assessment or on non-participants. This study addresses that lack by creating a new linked dataset using the public IAC data and non-public data at the Census Bureau. The IAC database excludes detail needed for an exact match, so the study developed a linking methodology to account for uncertainty in the matching process. Based on the linking approach, a difference-in-differences analysis was conducted for SMEs that received an assessment; plants that received an assessment improve their performance over time, relative to industry peers that did not. This new linked dataset is likely to shed even more light on the impact of the IAC and similar programs in advancing energy efficiency.
Item Open Access Simultaneous Edit and Imputation for Household Data with Structural Zeros (Journal of Survey Statistics and Methodology) Akande, Olanrewaju; Barrientos, Andres; Reiter, Jerome
Multivariate categorical data nested within households often include reported values that fail edit constraints (for example, a participating household reports a child's age as older than that of the child's biological parent) as well as missing values. Generally, agencies prefer datasets to be free from erroneous or missing values before analyzing them or disseminating them to secondary data users.
We present a model-based engine for editing and imputation of household data based on a Bayesian hierarchical model that includes (i) a nested data Dirichlet process mixture of products of multinomial distributions as the model for the true latent values of the data, truncated to allow only households that satisfy all edit constraints, (ii) a model for the location of errors, and (iii) a reporting model for the observed responses in error. The approach propagates uncertainty due to unknown locations of errors and missing values, generates plausible datasets that satisfy all edit constraints, and can preserve multivariate relationships within and across individuals in the same household. We illustrate the approach using data from the 2012 American Community Survey.
Item Open Access The Decision to Work by Married Immigrant Women (1993-07) Duleep, Harriet Orcutt; Sanders, Seth
Using 1980 Census data, the authors analyze the labor force participation of married immigrant Asian women by country of origin, compared with that of married immigrant women from Europe and Canada. The results suggest the existence of a family investment strategy: evidence from both across groups and within groups indicates that a woman's decision to work is affected by whether she has a husband who invests in skills specific to the U.S. labor market, and also by the extent of that investment. Such a family response may help offset the low earnings of immigrant men who initially lack skills for which there is a demand in the American labor market.
Item Open Access What Drives Forest Fragmentation in the Brazilian Amazon? Examining Spatial Patterns (2009-04-24T13:27:58Z) Hurwit, Nicholas
Understanding forest fragmentation and deforestation patterns with respect to human presence and development is important for governing bodies to provide adequate protection for vulnerable tropical ecosystems.
In recent decades, Brazil has seen increasing pressures to clear rainforest in the interior of the Brazilian Amazon through increased agricultural production and government infrastructure initiatives. As a result, the area of cleared forest in Brazil is currently larger than France and continues to increase annually. This project looks at the statistical relationships between forest fragmentation, deforestation rates, and census variables such as agricultural investment and population trends. Agricultural production and income can be linked to forest fragmentation and deforestation as more contemporary drivers.