On the Knowledge Transfer via Pretraining, Distillation and Federated Learning
dc.contributor.advisor | Carin, Lawrence | |
dc.contributor.author | Hao, Weituo | |
dc.date.accessioned | 2022-06-15T18:44:25Z | |
dc.date.available | 2022-06-15T18:44:25Z | |
dc.date.issued | 2022 | |
dc.department | Electrical and Computer Engineering | |
dc.description.abstract | Modern machine learning technology, driven by a revival of deep neural networks, has been successfully applied in many practical domains such as computer vision (CV) and natural language processing (NLP). The now-standard paradigm is pre-training: a large model with billions of parameters is trained on a surrogate task and then adapted to the downstream task of interest via fine-tuning. Knowledge transfer is what makes pre-training possible, but scale is what makes it powerful, and scale requires far more training data and computing resources. Alongside the great success of deep learning, fueled by larger datasets and greater computational capability, comes a series of interesting research questions. First, most pre-trained models are trained on single-modality (vision or text) datasets and are designed for single-step downstream tasks such as classification. Does pre-training still work for more complex tasks such as reinforcement learning? Second, pre-trained models obtain impressive empirical performance at the price of deployment challenges on low-resource (both memory and computation) platforms. How can large models be compressed into smaller ones efficiently? Third, collecting sufficient training data is often expensive, time-consuming, or even unrealistic in many scenarios due to privacy constraints. Is there a training paradigm that requires no data exchange? To address these less-explored questions, I conducted several projects, including: I) large-scale pre-training on multi-modal input for vision-and-language navigation, demonstrating the effectiveness of knowledge transfer across complex tasks via pre-training; II) data augmentation for compressing large-scale language models, improving the efficiency of knowledge transfer in the teacher-student distillation framework; III) weight factorization for model weight sharing in federated learning, achieving a trade-off between model performance and data privacy. | |
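The abstract's second project builds on the generic teacher-student distillation framework. As a rough illustrative sketch only (not the dissertation's implementation, and all function and variable names here are hypothetical), the standard temperature-scaled distillation objective combines a soft-target KL term against the teacher's logits with the usual hard-label cross-entropy:

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soften both distributions with the temperature and match them via KL divergence
    # (knowledge transferred from teacher to student).
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradients are comparable to the hard-label term
    # Ordinary cross-entropy on the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Toy usage with random logits for a batch of 8 examples and 4 classes.
student_logits = torch.randn(8, 4, requires_grad=True)
teacher_logits = torch.randn(8, 4)
labels = torch.randint(0, 4, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()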
dc.identifier.uri | ||
dc.subject | Computer engineering | |
dc.subject | Electrical engineering | |
dc.subject | Deep learning | |
dc.subject | Federated learning | |
dc.subject | Machine learning | |
dc.subject | Representation learning | |
dc.title | On the Knowledge Transfer via Pretraining, Distillation and Federated Learning | |
dc.type | Dissertation |