Exploring Deep Representation Learning on Vision and Language Intelligence

Deep neural networks have achieved tremendous success in recent years, with applications spanning computer vision and natural language processing. Representation learning is often adopted to extract useful latent features for these tasks. In this dissertation, I will discuss my contributions in applying representation learning methodologies to deep generative models, as well as to unsupervised domain adaptation.

The first part of the dissertation will mainly focus on deep generative models for vision and language intelligence. I will present the Symmetric Variational Autoencoder, which unifies the variational Bayesian and adversarial training frameworks. Then, I will show the application of such generative models in the natural language domain, and present a VAE framework with a hyperbolic latent space.
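To make the variational Bayesian side of this framework concrete, the sketch below computes the standard VAE evidence lower bound (ELBO) with the closed-form KL divergence between a diagonal-Gaussian posterior and a standard-normal prior. This is the generic objective, not the dissertation's symmetric or hyperbolic variants, and the function names are illustrative.

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def elbo(recon_log_lik, mu, log_var):
    """ELBO = reconstruction log-likelihood minus the KL regularizer."""
    return recon_log_lik - gaussian_kl(mu, log_var)

# A posterior that matches the prior (mu = 0, log_var = 0) pays no KL penalty,
# so the ELBO reduces to the reconstruction term alone.
bound = elbo(recon_log_lik=-1.0, mu=np.zeros(4), log_var=np.zeros(4))
```

The symmetric variant described in the dissertation additionally trains this objective adversarially; that machinery is omitted here.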

For the second part, I will mainly focus on representation learning for unsupervised domain adaptation (UDA). In this problem setup, we want to extract representative features that contain mostly task-oriented information but little domain-related information. I will first present an approach that learns such features in a contrastive manner: pulling data of the same class together while pushing data of different classes away from each other. Next, I will focus on UDA where large domain gaps exist. To tackle such a UDA problem, I propose to use unlabeled domain bridges, transforming the original problem into several intermediate ones with smaller gaps.
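The pull/push idea above can be sketched with the classic pairwise contrastive loss (a margin-based formulation in the style of Hadsell et al.); this is a generic illustration under assumed embeddings, not the exact objective used in the dissertation.

```python
import numpy as np

def contrastive_loss(z1, z2, same_class, margin=1.0):
    """Pairwise contrastive loss on two embedding vectors:
    same-class pairs are pulled together (squared distance),
    different-class pairs are pushed at least `margin` apart."""
    d = np.linalg.norm(z1 - z2)
    if same_class:
        return 0.5 * d**2                      # pull: penalize any distance
    return 0.5 * max(0.0, margin - d)**2       # push: penalize only within margin

# Different-class embeddings already separated by more than the margin
# contribute zero loss; identical ones pay the full margin penalty.
loss_far = contrastive_loss(np.array([0.0, 0.0]), np.array([3.0, 4.0]), same_class=False)
```

In the UDA setting described above, such pairs would mix source and target samples so the learned features discard domain identity while preserving class structure.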





Dai, Shuyang (2021). Exploring Deep Representation Learning on Vision and Language Intelligence. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/23764.


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.