Browsing by Subject "Matching"
Item Open Access An Investigation into the Bias and Variance of Almost Matching Exactly Methods (2021) Morucci, Marco
The development of interpretable causal estimation methods is a fundamental problem for high-stakes decision settings in which results must be explainable. Matching methods are highly explainable, but often lack the accuracy of black-box nonparametric models for causal effects. In this work, we investigate theoretically the statistical bias and variance of Almost Matching Exactly (AME) methods for causal effect estimation. These methods aim to overcome the inaccuracy of matching by learning, on a separate training dataset, an optimal metric on which to match units. While these methods are both powerful and interpretable, we currently lack an understanding of their statistical properties. We present a theoretical characterization of the finite-sample and asymptotic properties of AME. We show that AME with discrete data has bounded bias in finite samples, and is asymptotically normal and consistent at a root-n rate. Additionally, we show that AME methods for matching on networked data also have bounded bias and variance in finite samples, and achieve asymptotic consistency in sufficiently sparse graphs. Our results can be used to motivate the construction of approximate confidence intervals around AME causal estimates, providing a way to quantify their uncertainty.
Item Open Access Interpretable Almost-Matching Exactly with Instrumental Variables (2019) Liu, Yameng
We aim to create the highest possible quality of treatment-control matches for categorical data in the potential outcomes framework.
The method proposed in this work aims to match units on a weighted Hamming distance, taking into account the relative importance of the covariates. To match units on as many relevant variables as possible, the algorithm creates a hierarchy of covariate combinations on which to match (similar to downward closure), in the process solving an optimization problem for each unit in order to construct the optimal matches. The algorithm uses a single dynamic program to solve all of the units' optimization problems simultaneously. Notable advantages of our method over existing matching procedures are its high-quality interpretable matches, versatility in handling different data distributions that may have irrelevant variables, and ability to handle missing data by matching on as many available covariates as possible. We also adapt the matching framework, using instrumental variables (IV), to the presence of observed categorical confounding that breaks the randomness assumptions, and we propose an approximate algorithm that quickly generates high-quality interpretable solutions. We show that our algorithms construct better matches than other existing methods on simulated datasets and produce interesting results in applications to crime intervention and political canvassing.
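As a rough illustration of the distance this abstract refers to, the sketch below computes a weighted Hamming distance between two units with categorical covariates and picks the closest control by brute force. It is not the dynamic program described in the abstract; the function names, the example covariates, and the weights are hypothetical and chosen only for illustration.

```python
from typing import Hashable, Sequence


def weighted_hamming(a: Sequence[Hashable],
                     b: Sequence[Hashable],
                     weights: Sequence[float]) -> float:
    """Sum the weights of the covariates on which units a and b disagree."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("units and weights must have the same length")
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def nearest_control(treated: Sequence[Hashable],
                    controls: Sequence[Sequence[Hashable]],
                    weights: Sequence[float]) -> int:
    """Return the index of the control unit closest to the treated unit.

    Brute-force search, for illustration only (an assumption of this
    sketch, not the algorithm from the abstract).
    """
    return min(range(len(controls)),
               key=lambda i: weighted_hamming(treated, controls[i], weights))


# Hypothetical covariates: (region, education, employed); larger weight
# means the covariate is treated as more important to match on.
weights = (3.0, 1.0, 0.5)
treated = ("a", "x", 1)
controls = [("b", "x", 1), ("a", "y", 1)]
best = nearest_control(treated, controls, weights)
```

Here the second control is preferred because it disagrees only on the lower-weight covariate.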
Item Open Access Protected Area Impacts on Land Cover in Mexico (2013-04-15) Santiago-Ávila, Francisco J.
Although national and international efforts to mitigate deforestation during the last few decades have had some limited impact, they have failed to substantially slow the loss of tropical forests. This MP applies an approach for providing more evidence on what has worked or not worked in terms of conservation policies intended to reduce tropical natural land cover. Specifically, the work and approaches used in my analysis should help to illuminate the tradeoffs currently facing Mexico, a country which is seriously considering pursuing REDD policies, but also knows it would not be without economic costs. My main objective is to answer the question: "have conservation parks affected change in land cover in Mexico?" while a related objective is to assess if some types of parks have had reliably more impact. Due to the nonrandom establishment of protected areas (PAs), I employ a matching approach (propensity score) in order to construct a plausible counterfactual by controlling explicitly for land characteristics that proved to be significant drivers of both land cover change and protection status. My results indicate not only that my approach improved impact estimates, but also, in particular, that PAs lower land cover change pressure by 3.1%, and that strict protection seems to avoid more land cover change (5.3%) than loose (multi-use) protection (2.7%). While these results are suggestive, I would recommend also trying to get better and more data to test their robustness.
Item Open Access Record Linkage Methods with Applications to Causal Inference and Election Voting Data (2019) Wortman, Joan Pearson Heck
Probabilistic record linkage enables researchers and analysts to combine data from multiple data sources to conduct statistical analysis. This analysis may be to answer causal questions, to predict future outcomes, or to provide descriptive statistics. In this dissertation, I develop methodology for probabilistic record linkage for two scenarios: general causal inference applications with linked data, and identifying previously removed voters in North Carolina who cast provisional ballots in 2016.
In Chapter 2, we develop methodology for causal inference in observational studies when using propensity score subclassification on data constructed with probabilistic record linkage techniques. We focus on scenarios where covariates and binary treatment assignments are in one file and outcomes are in another file, and the goal is to estimate an additive treatment effect by merging the files. We assume that the files can be linked using variables common to both files, e.g., names or birth dates, but that links are subject to errors, e.g., due to reporting errors in the linking variables. We develop methodology for cases where such reporting errors are independent of the other variables on the files. We describe conceptually how linkage errors can affect causal estimates in subclassification contexts. We also present and evaluate several algorithms for deciding which record pairs to use in estimation of causal effects. Using simulation studies, we demonstrate that case selection procedures can result in improved accuracy in estimates of treatment effects from linked data compared to using only cases known to be true links.
In Chapter 3, we introduce a model for Bayesian record linkage and clustered sub-models, which we call BRACS. The model is designed for combining two sets of data in which there are differences in the comparison distributions for links and non-links, conditional on attributes observed in one of the files. We use simulation studies to demonstrate that the proposed approach can yield improvements in classifying record pairs as links versus non-links.
In Chapter 4, we apply BRACS to 2016 voting data from North Carolina. We describe the process of provisional voting and the list of provisional voters provided by the North Carolina Board of Elections. We provide background on the North Carolina voter file, of which we use a snapshot from November 2016. We outline the limitations of exact-matching the two files using only the state-provided identifiers. Finally, we use BRACS to link the two files, with and without the state-provided identifiers, in order to estimate the number of removed voters who cast provisional ballots in the November 2016 election in North Carolina.
In Chapter 5, we modify BRACS to relax the assumption of conditionally independent field comparisons, motivated by the correlation between party registration and race in North Carolina. We outline a method for accounting for this correlation, in which we combine two dependent comparison fields into one joint comparison field. We use simulation studies to demonstrate that this can yield improvements in linkage quality, and we also outline when it may not be appropriate to use. Finally, we apply the results to the data in Chapter 4 and re-estimate the number of removed voters with the joint-comparison BRACS.
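One simple way to picture a joint comparison field is sketched below. It assumes each of the two dependent fields is compared as a binary agree/disagree indicator and encodes the pair as a single four-level value, so a linkage model can describe the joint distribution rather than two independent margins. The encoding and the function names are assumptions of this sketch, not the dissertation's implementation.

```python
from collections import Counter
from typing import Iterable, Tuple


def joint_comparison(party_agrees: bool, race_agrees: bool) -> int:
    """Encode two binary comparisons as one categorical level in {0, 1, 2, 3}.

    0: neither agrees, 1: only race agrees,
    2: only party agrees, 3: both agree.
    """
    return 2 * int(party_agrees) + int(race_agrees)


def joint_level_counts(pairs: Iterable[Tuple[bool, bool]]) -> Counter:
    """Count how often each joint level occurs across record pairs."""
    return Counter(joint_comparison(p, r) for p, r in pairs)


# Hypothetical comparison outcomes for four record pairs.
counts = joint_level_counts([
    (True, True),
    (True, False),
    (True, True),
    (False, False),
])
```

Counts like these could then be tabulated separately for links and non-links, replacing the two per-field agreement rates that a conditional independence assumption would use.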
Item Open Access Stochastic Optimization in Market Design and Incentive Management Problems (2020) Chen, Mingliu
This dissertation considers practical operational settings, in which a decision maker needs to either coordinate preferences or to align incentives among different parties. We formulate these issues into stochastic optimization problems and use a variety of techniques from the theories of applied probability, queueing and dynamic programming.
First, we study a stochastic matching problem. We consider matching over time with short- and long-lived players who are very sensitive to mismatch, and propose a novel method to characterize the mismatch. In particular, players' preferences are uniformly distributed on a circle, so the mismatch between two players is characterized by the one-dimensional circular angle between them. This framework allows us to capture matching processes in applications ranging from ride sharing to job hunting. Our analytical framework relies on threshold matching policies, and is focused on a limiting regime where players demonstrate low tolerance towards mismatch. This framework yields closed-form optimal matching thresholds. If the matching process is controlled by a centralized social planner (e.g. an online matching platform), the matching threshold reflects the trade-off between matching rate and matching quality. The corresponding optimal matching threshold is smaller than the myopic matching threshold, which helps build market thickness. We further compare the centralized system with decentralized systems, where players decide their matching partners. We find that matching controlled by either side of the market may achieve optimal social welfare.
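The circular mismatch measure and a threshold policy can be sketched as follows. This assumes preferences are represented as angles in radians and that two players are matched when their circular angle is at most some threshold; the function names and the threshold values are hypothetical, and the closed-form optimal thresholds from the dissertation are not reproduced here.

```python
import math


def circular_mismatch(theta_a: float, theta_b: float) -> float:
    """Return the shorter arc (in radians) between two positions on a circle."""
    diff = abs(theta_a - theta_b) % (2 * math.pi)
    return min(diff, 2 * math.pi - diff)


def should_match(theta_a: float, theta_b: float, threshold: float) -> bool:
    """Threshold policy: match only if the mismatch does not exceed the threshold."""
    return circular_mismatch(theta_a, theta_b) <= threshold


# Two players on either side of the zero angle are close on the circle,
# even though their raw angles differ by almost a full turn.
a, b = 0.1, 2 * math.pi - 0.1
close_enough = should_match(a, b, threshold=0.25)
too_strict = should_match(a, b, threshold=0.1)
```

A smaller threshold rejects more candidate pairs, which lowers the matching rate but raises match quality; this is the trade-off the abstract describes.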
Second, we consider a dynamic incentive management problem in which a principal induces effort from an agent to reduce the arrival rate of a Poisson process of adverse events. The effort is costly to the agent, and unobservable to the principal, unless the principal is monitoring the agent. Monitoring ensures effort but is costly to the principal. The optimal contract involves monetary payments and monitoring sessions that depend on past arrival times. We formulate the problem as a stochastic optimal control model and solve the problem analytically. The optimal schedules of payment and monitoring demonstrate different structures depending on model parameters. Overall, the optimal dynamic contracts are simple to describe, easy to compute and implement, and intuitive to explain.
Item Open Access Syntactic rules predict song type matching in a songbird (Behavioral Ecology and Sociobiology, 2023-01-01) Searcy, WA; Chronister, LM; Nowicki, S
Abstract: Song type matching has been hypothesized to be a graded signal of aggression; however, it is often the case that variation in matching behavior is unrelated to variation in aggressiveness. An alternative view is that whether an individual matches a song is determined mainly by syntactic rules governing how songs are sequenced. In song sparrows (Melospiza melodia), two such rules are the cycling rule, which directs that a bird cycles through its song types in close to the minimum number of bouts, and the bout length rule, which directs that a long bout of a song type is followed by a long interval before that song type is sung again. The effect of these rules on matching is confirmed here for a population of eastern song sparrows. Territorial males were challenged at the end of a recording session with playback of one of their own song types. Logistic regression showed that the probability of matching the playback song type increased with the length of the interval since the subject had last sung that song type, as predicted by the cycling rule. The probability of matching decreased as prior bout length increased, as predicted by the bout length rule. In a multivariate logistic regression, interval length and prior bout length were both associated with matching and together correctly predicted matching in 81.3% of cases. The results support the syntactic constraints hypothesis, which proposes that matching is a non-signaling by-product of internal rules governing the ordering of song type sequences.
Significance statement: Vocal matching has attracted widespread interest in large part because it seems an effective method of directing an aggressive message at a particular recipient. Here, we show that in an eastern population of song sparrows, decisions on whether to match another bird are largely determined by internal rules of syntax governing how a singer sequences its song types, rather than by variation in aggressiveness or other individual traits. These results support the view that vocal matching is an incidental byproduct of internal mechanisms controlling the ordering of vocalization types and so is not a signal at all. This hypothesis may be broadly applicable to vocal matching in other species.