Statistical Advances in Data Linkage and Model Evaluation
Abstract
This dissertation presents statistical contributions to data linkage and model evaluation. The two subjects sit at opposite ends of the traditional model development process: data linkage enriches the data fed into downstream models and analyses, and evaluation maximizes the utility of deployed models. We report on five research projects in which we developed generalizable statistical methodologies to solve important practical problems in these areas. These include the evaluation of statistical models for the quantification of modern slavery, methods to estimate and monitor the generalization performance of entity resolution systems, a novel F-score optimization algorithm for bipartite record linkage, and the introduction of an estimands framework to improve the validity and practical usefulness of AI/ML evaluations.
Citation
Binette, Olivier (2024). Statistical Advances in Data Linkage and Model Evaluation. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/31936.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.