On the Knowledge Transfer via Pretraining, Distillation and Federated Learning

Date

2022

Repository Usage Stats

576 views
936 downloads

Abstract

Modern machine learning, built on the revival of deep neural networks, has been successfully applied in many practical domains such as computer vision (CV) and natural language processing (NLP). The now-standard paradigm is pre-training: a large model with billions of parameters is trained on a surrogate task and then adapted to the downstream task of interest via fine-tuning. Knowledge transfer is what makes pre-training possible, but scale is what makes it powerful, and scale in turn requires far more training data and computing resources.
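
As a concrete illustration of this pretrain-then-fine-tune paradigm, the short Python sketch below loads an ImageNet-pretrained backbone and replaces only its classification head for a downstream task. It is a minimal sketch, not the setup used in the dissertation; the choice of ResNet-50, the number of classes, and the learning rate are illustrative assumptions.

import torch
import torchvision

# Pre-training on the surrogate task (ImageNet classification) has already happened;
# here the model is only adapted to a hypothetical downstream task.
NUM_DOWNSTREAM_CLASSES = 10  # assumed for illustration

model = torchvision.models.resnet50(weights="IMAGENET1K_V2")               # load pre-trained weights
model.fc = torch.nn.Linear(model.fc.in_features, NUM_DOWNSTREAM_CLASSES)   # new task-specific head

# Fine-tuning: a small learning rate over all parameters; alternatively,
# freeze the backbone and train only the new head.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

def finetune_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

Whether to freeze the backbone or update every parameter is the usual trade-off between fine-tuning cost and downstream accuracy.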

Alongside the great success of deep learning, fueled by larger datasets and greater computing capability, come a series of interesting research questions. First, most pre-trained models are trained on single-modality (vision or text) datasets and are designed for single-step downstream tasks such as classification. Does pre-training still work for more complex tasks such as reinforcement learning? Second, pre-trained models achieve impressive empirical performance at the price of deployment challenges on platforms with limited memory and computation. How can large models be compressed into smaller ones efficiently? Third, collecting sufficient training data is often expensive, time-consuming, or even unrealistic due to privacy constraints. Is there a training paradigm that requires no data exchange?

To address the under-explored questions above, I conducted several projects, including: (i) large-scale pre-training on multi-modal input for vision-and-language navigation, demonstrating the effectiveness of knowledge transfer to complex tasks via pre-training; (ii) data augmentation for compressing large-scale language models, improving the efficiency of knowledge transfer in the teacher-student distillation framework; and (iii) weight factorization for sharing model weights in federated learning, achieving a trade-off between model performance and data privacy.
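
To make (ii) concrete, the following is a minimal sketch of a standard teacher-student distillation loss (Hinton-style soft targets blended with ordinary cross-entropy). It is not the dissertation's exact objective; the temperature and mixing weight are illustrative defaults, and the data-augmentation contribution described above would supply additional training examples for the student under such an objective.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soft targets: the teacher's temperature-scaled output distribution.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The KL term is scaled by T^2 so its gradient magnitude stays comparable across temperatures.
    kd_term = F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
    # Hard-label term: ordinary cross-entropy on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, labels)
    # alpha balances imitating the teacher against fitting the labels (both values are assumed).
    return alpha * kd_term + (1.0 - alpha) * ce_term

# Example usage with random tensors (shapes only; real inputs come from the teacher and student models):
student_logits = torch.randn(8, 100)
teacher_logits = torch.randn(8, 100)
labels = torch.randint(0, 100, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)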

Subjects

Computer engineering, Electrical engineering, Deep learning, Federated learning, Machine learning, Representation learning

Citation

Hao, Weituo (2022). On the Knowledge Transfer via Pretraining, Distillation and Federated Learning. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/25279.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.