Methods, Models and Metrics for Annotation Error Assessment in Image Segmentation
Abstract
Image segmentation systems touch many facets of society, with deployments in healthcare, transportation, security, manufacturing, entertainment, and other domains. These systems must be trained on large numbers of finely labeled images, and obtaining enough high-quality training data remains a key bottleneck. Image segmentation workflows suffer from three major limitations related to the annotation methods, models, and metrics they use, which constrain the quality of the training data produced and hinder the performance of models trained on that data. First, there are no principled guidelines on how to structure crowdsourced annotation tasks to maximize the quality of the data produced, and task configuration decisions continue to be driven primarily by cost and convenience. Second, there are insufficient models describing the geometric characteristics of annotation errors, so prior studies of how annotation error geometry affects training are incomplete and may fail to fully characterize the most harmful classes of errors. Third, prevailing annotation quality metrics are based on pixel counts and do not convey the spatial properties of annotation errors, so if harmful error types exist, they cannot easily be detected by existing quality control practices.

Overall, developers of image segmentation systems would benefit from improved annotation workflows, but achieving this requires (1) a better understanding of the errors produced by different annotation task configurations, (2) insight into which types of errors are most harmful to training, and (3) tools and methods that can detect and eliminate the most harmful error types. To address these gaps, I present research covering three core areas of segmentation annotation: methods, models, and metrics. First, I empirically characterize the relationship between annotation task configuration and annotation quality, while also providing new tools and the first dataset of naturally derived annotation errors to promote future study. Second, I identify a new class of segmentation annotation error geometry characterized by the removal of concave regions from object annotations, define a set of reusable geometric transforms for simulating these errors, and present a comprehensive study describing the relationship between error geometry, error size, error frequency, dataset properties, model architecture, and resulting segmentation performance. Third, I present a new classifier system that can detect the costliest annotation error geometries, even when deployed to new image distributions for which no curated reference annotations are available.

This research immediately benefits the computer vision community by providing insight into how segmentation annotation methods, models, and metrics can be refined to produce higher quality training data. In addition to the research findings, this work also yields (1) the Crowdsource Segmentation Experiment Tool (CSET), a unique web platform for running crowdsourced segmentation research, (2) the VACES dataset, the first dataset of systematically generated noisy crowdsource annotations, and (3) the Concave Error Model geometric transforms, which can be used to generate realistic synthetic noisy annotations. All of these contributions support the community in pursuing further study of segmentation annotation, ensuring continued progress toward improving annotation workflows and the segmentation systems that depend on them.
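The dissertation defines the Concave Error Model transforms themselves; as a purely illustrative aid, the sketch below shows one plausible way to simulate the removal of concave regions from a binary object annotation using standard morphology routines. The function name `remove_concave_regions`, the `severity` parameter, and the pocket-growing heuristic are hypothetical and are not drawn from the dissertation's actual transforms.

```python
# Hypothetical sketch of simulating "concave-region removal" errors in a binary
# segmentation mask. This is NOT the dissertation's Concave Error Model; it only
# illustrates the general idea under simple assumptions.
import numpy as np
from scipy import ndimage
from skimage.morphology import convex_hull_image

def remove_concave_regions(mask: np.ndarray, severity: int = 3) -> np.ndarray:
    """Return a corrupted copy of `mask` with object pixels near concavities removed.

    mask:     2-D boolean array (True = object).
    severity: dilation radius (in pixels) controlling how much of the object
              bordering each concavity is dropped (hypothetical parameter).
    """
    hull = convex_hull_image(mask)        # smallest convex region covering the object
    pockets = hull & ~mask                # concave "pockets" between hull and object
    # Grow the pockets so they eat into the adjacent object boundary.
    grown = ndimage.binary_dilation(pockets, iterations=severity)
    return mask & ~grown                  # drop object pixels inside the grown pockets

# Toy example: a U-shaped object whose inner notch forms a concavity.
obj = np.zeros((12, 12), dtype=bool)
obj[2:10, 2:10] = True
obj[2:7, 5:7] = False                     # carve a notch -> concave boundary
noisy = remove_concave_regions(obj, severity=2)
print("clean pixels:", obj.sum(), "noisy pixels:", noisy.sum())
```

Note that a pixel-count metric such as accuracy or IoU would register only the number of pixels lost here, not where along the boundary they were lost, which is the spatial blind spot the abstract describes.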
Citation
Bauchwitz, Benjamin Rogers (2025). Methods, Models and Metrics for Annotation Error Assessment in Image Segmentation. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/32738.