EFFICIENT LOW-RESOURCE TRAINING WITH PRE-TRAINED DEEP NEURAL NETWORKS
Date
2023
Authors
Wang, Rui
Abstract
The performance of machine learning systems has improved dramatically in recent years thanks to the advent of pre-trained deep neural networks. Such models are generally adopted as the foundation for feature extraction, yielding state-of-the-art results when further trained (or fine-tuned) on downstream tasks. Nonetheless, pre-trained deep neural networks are generally huge, with millions or billions of parameters, and thus demand abundant data and computation resources during fine-tuning. It is therefore of practical merit to investigate efficient training approaches with such pre-trained deep neural networks for low-resource scenarios, i.e., when there is a limited computation budget or limited annotated data for the downstream tasks. In this dissertation, we consider low-resource scenarios for efficient training on the following tasks: i) natural language understanding, ii) fair text generation, and iii) compositional image retrieval.
We first study natural language understanding through its sub-tasks of sequence classification and sequence labeling. We propose an attention-based architecture and a data-free distillation framework, respectively, both designed for the scenario where data annotations are limited. These approaches improve data efficiency when fine-tuning pre-trained deep neural networks for better understanding of natural language. We then explore strategies for fine-tuning pre-trained language models for demographic fairness in text generation. Specifically, we propose to minimize the mutual information between the semantics of the generated sentences and their demographic polarity, i.e., the demographic group to which a sentence refers. We develop a computationally efficient approach to estimating an upper bound on this mutual information via importance sampling, reducing the number of model inferences required during training. Finally, for compositional image retrieval, we propose a visual prompt tuning mechanism and a self-supervised auxiliary task that adapt a pre-trained vision-language model to image retrieval with only a few annotations, while improving computation efficiency by keeping the pre-trained parameters frozen during training.
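To illustrate the frozen-backbone idea behind the last contribution, below is a minimal sketch of visual prompt tuning in PyTorch. The encoder interface, embedding dimension, and prompt length are assumptions made for this example only; they do not reflect the dissertation's actual architecture or the specific pre-trained vision-language model it builds on.

# Minimal sketch of visual prompt tuning with a frozen backbone (PyTorch).
# The backbone is assumed to accept token sequences of shape (batch, seq_len, embed_dim).
import torch
import torch.nn as nn

class PromptedImageEncoder(nn.Module):
    def __init__(self, backbone: nn.Module, embed_dim: int = 512, num_prompts: int = 8):
        super().__init__()
        self.backbone = backbone
        # Freeze every pre-trained parameter; only the prompt tokens receive gradients.
        for p in self.backbone.parameters():
            p.requires_grad = False
        # Learnable prompt tokens that will be prepended to the patch embeddings.
        self.prompts = nn.Parameter(torch.randn(num_prompts, embed_dim) * 0.02)

    def forward(self, patch_embeddings: torch.Tensor) -> torch.Tensor:
        # patch_embeddings: (batch, seq_len, embed_dim) from a frozen patch projector.
        batch = patch_embeddings.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        tokens = torch.cat([prompts, patch_embeddings], dim=1)
        # The frozen transformer blocks consume the prompted token sequence.
        return self.backbone(tokens)

# Only the prompt parameters are handed to the optimizer, e.g.:
# encoder = PromptedImageEncoder(frozen_transformer, embed_dim=512, num_prompts=8)
# optimizer = torch.optim.AdamW([encoder.prompts], lr=1e-3)

In this setup only the prompt tokens are updated during fine-tuning, so optimizer state and gradient computation scale with a handful of prompt parameters rather than with the full pre-trained backbone.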
Overall, my research draws attention to the data and computational efficiency of current large pre-trained deep neural networks, improving the flexibility of deploying such models in the considered low-resource scenarios.
Type
Dissertation
Permalink
https://hdl.handle.net/10161/29180
Citation
Wang, Rui (2023). EFFICIENT LOW-RESOURCE TRAINING WITH PRE-TRAINED DEEP NEURAL NETWORKS. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/29180.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.