Understanding Feature Learning and Calibration of Machine Learning Models Through Simple Data Distributions
Date
2024
Authors
Chidambaram, Muthuraman
Abstract
Deep learning applications have now permeated nearly every facet of daily life. Much of the success of these models over their shallower counterparts has been attributed to their ability to automatically learn useful features from data. However, training deep learning models to perform well requires a variety of tricks and heuristics, most of which are introduced with a non-rigorous explanation of why they work.
In the first part of this thesis, we peer more deeply into when and why a particular class of such heuristics, namely data augmentation, can improve model performance, from a feature learning perspective. We focus on understanding the highly influential Mixup and label smoothing techniques in the context of simple data distributions, and show that these techniques can lead to models that learn low-variance features in the data, which in turn can improve generalization (but can also introduce pitfalls!).
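For reference, here is a minimal sketch of the two techniques named above, assuming NumPy batches and one-hot labels; the function names and the alpha/eps defaults are illustrative and are not taken from the thesis:

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2):
    """Mixup: replace each (input, label) pair with a convex combination of
    itself and a randomly paired example from the same batch."""
    rng = np.random.default_rng()
    lam = rng.beta(alpha, alpha)              # mixing weight lambda ~ Beta(alpha, alpha)
    perm = rng.permutation(len(x))            # random pairing within the batch
    x_mix = lam * x + (1.0 - lam) * x[perm]   # mixed inputs
    y_mix = lam * y + (1.0 - lam) * y[perm]   # matching mix of one-hot labels
    return x_mix, y_mix

def smooth_labels(y, eps=0.1):
    """Label smoothing: shrink one-hot targets toward the uniform distribution."""
    num_classes = y.shape[-1]
    return (1.0 - eps) * y + eps / num_classes
```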
The main benefits of the aforementioned techniques are typically viewed as improving the generalization and robustness of models. However, there is an additional axis along which they help: calibration. We say that a (classification) model is calibrated if its predicted probability for an outcome matches the observed frequency for that outcome conditional on the model prediction.
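In symbols (notation ours, included only as a reference point): for a classifier with class probabilities f_k(X) and predicted class y_hat(X) = argmax_k f_k(X), the commonly used confidence-calibration version of this condition is P(Y = y_hat(X) | max_k f_k(X) = p) = p for all p in [0, 1].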
In the second part of this thesis, we explore different notions of model calibration and how they are measured in practice. In particular, we characterize and fix certain deficiencies of Expected Calibration Error (ECE), which is the most popular way to measure calibration in applications. We additionally make recommendations for how calibration should be reported in practice, based on an analysis of the relationship between calibration and generalization metrics. Finally, we tie our analysis of calibration back to our initial foray into understanding data augmentation by showing rigorously how Mixup can be used to obtain better calibration.
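As a point of reference for the ECE mentioned above, a minimal sketch of the standard binned estimator follows; the bin count and variable names are illustrative, and the thesis's specific critique and fixes are not reproduced here:

```python
import numpy as np

def expected_calibration_error(confidences, correct, num_bins=15):
    """Binned ECE: partition predictions into equal-width confidence bins and
    average |accuracy - confidence| per bin, weighted by the bin's sample share."""
    confidences = np.asarray(confidences, dtype=float)  # top-class predicted probabilities
    correct = np.asarray(correct, dtype=float)          # 1.0 if the prediction was right, else 0.0
    edges = np.linspace(0.0, 1.0, num_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap                  # weight by fraction of samples in the bin
    return ece
```

Given a matrix of predicted probabilities probs and true labels, the usual top-label estimate would be expected_calibration_error(probs.max(axis=1), probs.argmax(axis=1) == labels).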
Citation
Chidambaram, Muthuraman (2024). Understanding Feature Learning and Calibration of Machine Learning Models Through Simple Data Distributions. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/32596.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.