On the Knowledge Transfer via Pretraining, Distillation and Federated Learning

dc.contributor.advisor

Carin, Lawrence

dc.contributor.author

Hao, Weituo

dc.date.accessioned

2022-06-15T18:44:25Z

dc.date.available

2022-06-15T18:44:25Z

dc.date.issued

2022

dc.department

Electrical and Computer Engineering

dc.description.abstract

Modern machine learning, driven by the revival of deep neural networks, has been successfully applied in many practical domains such as computer vision (CV) and natural language processing (NLP). The now-standard paradigm is pre-training: a large model with billions of parameters is trained on a surrogate task and then adapted to the downstream task of interest via fine-tuning. Knowledge transfer is what makes pre-training possible, but scale is what makes it powerful, and scale in turn demands far more training data and computing resources.
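
A minimal sketch of the pre-train-then-fine-tune paradigm described above, written in PyTorch. The surrogate task, layer sizes, and label spaces here are hypothetical placeholders for illustration only, not the dissertation's actual setup.

# Pre-train a shared encoder on a surrogate task, then transfer it to a
# downstream task by attaching and training a new task-specific head.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
surrogate_head = nn.Linear(128, 1000)   # hypothetical large proxy-label space
downstream_head = nn.Linear(128, 10)    # hypothetical 10-way downstream task

def pretrain_step(x, y, opt):
    """One optimization step on the surrogate (pre-training) task."""
    loss = nn.functional.cross_entropy(surrogate_head(encoder(x)), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def finetune_step(x, y, opt):
    """One step on the downstream task, reusing the pre-trained encoder."""
    loss = nn.functional.cross_entropy(downstream_head(encoder(x)), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Hypothetical usage with random tensors standing in for real data.
pre_opt = torch.optim.Adam(list(encoder.parameters()) + list(surrogate_head.parameters()))
pretrain_step(torch.randn(32, 512), torch.randint(0, 1000, (32,)), pre_opt)

ft_opt = torch.optim.Adam(list(encoder.parameters()) + list(downstream_head.parameters()))
finetune_step(torch.randn(32, 512), torch.randint(0, 10, (32,)), ft_opt)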

Along with the great success of deep learning, fueled by larger datasets and greater computational capacity, come a series of interesting research questions. First, most pre-trained models learn from a single-modality dataset (vision or text) and are designed for single-step downstream tasks such as classification. Does pre-training still work for more complex tasks such as reinforcement learning? Second, pre-trained models achieve impressive empirical performance at the price of deployment challenges on low-resource platforms (limited memory and computation). How can such large models be compressed into smaller ones efficiently? Third, collecting sufficient training data is often expensive, time-consuming, or even unrealistic due to privacy constraints. Does a training paradigm exist that requires no data exchange?

To address these less-explored questions, I conducted several projects, including: i) large-scale pre-training on multi-modal input for vision-and-language navigation, demonstrating the effectiveness of knowledge transfer to complex tasks via pre-training; ii) data augmentation for compressing large-scale language models, improving the efficiency of knowledge transfer in the teacher-student distillation framework; and iii) weight factorization for model-weight sharing in federated learning, balancing the trade-off between model performance and data privacy.
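
A minimal sketch of the generic teacher-student distillation framework the second project builds on. The toy models, temperature, and loss weighting are illustrative assumptions, not the thesis's specific method or data-augmentation scheme.

# Distill a large "teacher" into a small "student" by matching the
# teacher's softened output distribution alongside the hard-label loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(512, 10)   # stands in for a large pre-trained model
student = nn.Linear(512, 10)   # smaller model to be distilled

def distillation_loss(x, labels, T=2.0, alpha=0.5):
    """Combine soft targets from the teacher with the hard-label loss."""
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Hypothetical usage: augmented inputs (e.g. synthetic examples) would be
# passed through the same loss to enlarge the transfer set.
loss = distillation_loss(torch.randn(32, 512), torch.randint(0, 10, (32,)))
loss.backward()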

dc.identifier.uri

https://hdl.handle.net/10161/25279

dc.subject

Computer engineering

dc.subject

Electrical engineering

dc.subject

Deep learning

dc.subject

Federated learning

dc.subject

Machine learning

dc.subject

Representation learning

dc.title

On the Knowledge Transfer via Pretraining, Distillation and Federated Learning

dc.type

Dissertation

Files

Original bundle

Name:
Hao_duke_0066D_16765.pdf
Size:
17.23 MB
Format:
Adobe Portable Document Format