Deep Generative Models for Vision, Languages and Graphs




Deep generative models have achieved remarkable success in modeling many types of data, including images, natural language, and graphs. They offer flexible and complementary representations for both labeled and unlabeled data, and they can naturally generate realistic samples. This thesis proposes novel variants of generative models for a range of learning tasks, organized into three parts.

In the first part, generative models are designed to learn generalizable image representations in the Zero-Shot Learning (ZSL) setting. An attribute-conditioned variational autoencoder is introduced that represents each class as a distribution in the latent space, enabling the model to learn highly discriminative and robust feature representations. The generative model gains discriminative power by assigning an input to the class that maximizes the variational lower bound. I further show that the model generalizes naturally to the transductive and few-shot settings.
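The class-selection rule can be sketched as follows, under assumptions the abstract does not state: the encoder outputs a diagonal Gaussian q(z|x), each class prior p(z|a_c) is a diagonal Gaussian whose mean is a hypothetical linear map `weight` of the class attribute vector a_c (with unit variance), and the decoder depends on z only. Under that last assumption the reconstruction term of the lower bound is identical for every class, so maximizing the bound over classes reduces to minimizing KL(q(z|x) || p(z|a_c)). The names `class_prior` and `predict_class` are illustrative, not taken from the thesis.

```python
import math
from typing import Dict, List, Sequence, Tuple


def kl_diag_gaussians(mu_q: Sequence[float], logvar_q: Sequence[float],
                      mu_p: Sequence[float], logvar_p: Sequence[float]) -> float:
    """Closed-form KL(q || p) between two diagonal Gaussians."""
    kl = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        var_q, var_p = math.exp(lq), math.exp(lp)
        kl += 0.5 * (lp - lq + (var_q + (mq - mp) ** 2) / var_p - 1.0)
    return kl


def class_prior(attributes: Sequence[float],
                weight: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Hypothetical attribute-to-latent map: mean = W a, unit variance."""
    mean = [sum(w * a for w, a in zip(row, attributes)) for row in weight]
    return mean, [0.0] * len(mean)  # log-variance 0 means variance 1


def predict_class(mu_q: Sequence[float], logvar_q: Sequence[float],
                  class_attributes: Dict[str, Sequence[float]],
                  weight: List[List[float]]) -> str:
    """Pick the class whose prior p(z | a_c) maximizes the lower bound.

    ELBO(x, c) = E_q[log p(x | z)] - KL(q(z | x) || p(z | a_c)).
    Assuming the decoder depends on z only, the first term is the same
    for every class, so the arg-max ELBO class is the arg-min KL class.
    """
    def kl_for(label: str) -> float:
        mu_p, logvar_p = class_prior(class_attributes[label], weight)
        return kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p)

    return min(class_attributes, key=kl_for)


# Toy example: two classes described by 2-d attribute vectors. Unseen classes
# are handled the same way, since their priors come only from attributes.
classes = {"zebra": [1.0, 0.0], "horse": [0.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
print(predict_class([0.9, 0.1], [0.0, 0.0], classes, identity))  # zebra
```

Because every class prior is built from attributes alone, adding a new class only requires its attribute vector; no retraining of the encoder is implied by this rule.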

In the second part, generative models are proposed for controllable language generation. Specifically, two topic-guided language generation models are proposed. The first is a topic compositional neural language model that achieves controllable and interpretable generation through a mixture-of-experts design. The second addresses the problem within a VAE framework using a topic-conditioned Gaussian mixture model (GMM). Both models improve on the performance of existing language generation systems while making generation controllable.
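A mixture-of-experts design for topic-controlled generation can be sketched as below. This is one common way to realize such a design and is not necessarily how the thesis composes its experts: here each topic k has a hypothetical output matrix W_k, and the next-word distribution is the topic-weighted average p(w | h, t) = sum_k t_k softmax(W_k h), where h is the language model's hidden state and t the topic proportions. Shifting t toward one topic shifts generation toward that expert's vocabulary, which is what makes the output controllable.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_next_word(hidden: Sequence[float], topic_props: Sequence[float],
                  expert_weights: List[List[List[float]]]) -> List[float]:
    """p(w | h, t) = sum_k t_k * softmax(W_k h), one expert per topic k."""
    vocab_size = len(expert_weights[0])
    probs = [0.0] * vocab_size
    for t_k, w_k in zip(topic_props, expert_weights):
        # Expert k's logits: one row of W_k per vocabulary word.
        logits = [sum(w * h for w, h in zip(row, hidden)) for row in w_k]
        for i, p in enumerate(softmax(logits)):
            probs[i] += t_k * p  # gate the expert by its topic proportion
    return probs


# Two hypothetical topic experts over a 3-word vocabulary, 2-d hidden state.
experts = [
    [[2.0, 0.0], [0.0, 0.0], [0.0, 0.0]],  # expert 0 favors word 0
    [[0.0, 0.0], [0.0, 0.0], [2.0, 0.0]],  # expert 1 favors word 2
]
h = [1.0, 0.0]
print(moe_next_word(h, [1.0, 0.0], experts))  # dominated by expert 0
print(moe_next_word(h, [0.5, 0.5], experts))  # blend of both experts
```

Since the topic proportions sum to one and each expert emits a valid distribution, the mixture is itself a valid distribution over the vocabulary.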

In the third part, generative models are developed for graph data. First, a variational homophilic embedding (VHE) model is proposed. It is a fully generative model that learns network embeddings by modeling textual semantic information with a variational autoencoder, while capturing graph structure through a homophilic prior. Second, for heterogeneous multi-task learning, a novel graph-driven generative model is developed that unifies the tasks within a single framework. It combines a graph convolutional network (GCN) with multiple VAEs, embedding the graph's nodes in a uniform manner while specializing their organization and usage to each task.
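The idea of a homophilic prior can be illustrated with a minimal sketch; the abstract does not give the exact form, so the following is an assumption. For a pair of nodes, each latent coordinate pair (z_i[d], z_j[d]) is given a bivariate standard normal prior whose correlation is a hypothetical constant `rho` when the nodes share an edge and 0 otherwise. Linked nodes are therefore favored to have similar embeddings, while unlinked nodes are treated as independent.

```python
import math
from typing import Sequence


def pair_log_prior(z_i: Sequence[float], z_j: Sequence[float],
                   linked: bool, rho: float = 0.9) -> float:
    """log p(z_i, z_j) under a per-dimension bivariate standard normal.

    Each coordinate pair (z_i[d], z_j[d]) has correlation rho if the nodes
    share an edge and 0 otherwise (rho is a hypothetical hyperparameter).
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    r = rho if linked else 0.0
    one_minus_r2 = 1.0 - r * r
    total = 0.0
    for x, y in zip(z_i, z_j):
        # Mahalanobis term of N(0, [[1, r], [r, 1]]) for this coordinate pair.
        quad = (x * x - 2.0 * r * x * y + y * y) / one_minus_r2
        total += -math.log(2.0 * math.pi) - 0.5 * math.log(one_minus_r2) - 0.5 * quad
    return total


similar = ([1.0, -0.5], [0.9, -0.4])
opposite = ([1.0, -1.0], [-1.0, 1.0])
# Similar embeddings are more probable if the nodes are linked ...
print(pair_log_prior(*similar, linked=True) > pair_log_prior(*similar, linked=False))  # True
# ... while opposite embeddings are penalized if the nodes are linked.
print(pair_log_prior(*opposite, linked=True) < pair_log_prior(*opposite, linked=False))  # True
```

In a full model this term would replace the standard-normal prior inside the KL part of each node pair's lower bound; with `rho = 0` it reduces to two independent standard normals.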





Wang, Wenlin (2019). Deep Generative Models for Vision, Languages and Graphs. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.