Deep Generative Models for Vision and Language Intelligence
Date
2018
Authors
Advisors
Journal Title
Journal ISSN
Volume Title
Repository Usage Stats
views
downloads
Abstract
Deep generative models have achieved tremendous success in recent years, with applications in various tasks involving vision and language intelligence. In this dissertation, I will mainly discuss the contributions that I have made in this field during my Ph.D. study. Specifically, the dissertation is divided into two parts.
In the first part, I will mainly focus on one specific kind of deep directed generative model, called Sigmoid Belief Network (SBN). First, I will present a fully Bayesian algorithm for efficient learning and inference of SBN. Second, since the original SBN can be only used for binary image modeling, I will also discuss the generalization of it to model spare count-valued data for topic modeling, and sequential data for motion capture synthesis, music generation and dynamic topic modeling.
In the second part, I will mainly focus on visual captioning (i.e., image-to-text generation), and conditional image synthesis. Specifically, I will first present Semantic Compositional Network for visual captioning, and emphasize interpretability and controllability revealed in the learning algorithm, via a mixture-of-experts design, and the usage of detected semantic concepts. I will then present Triangle Generative Adversarial Network, which is a general framework that can be used for joint distribution matching and learning the bidirectional mappings between two different domains. We consider the joint modeling of image-label, image-image and image-attribute pairs, with applications in semi-supervised image classification, image-to-image translation and attribute-based image editing.
Type
Department
Description
Provenance
Citation
Permalink
Citation
Gan, Zhe (2018). Deep Generative Models for Vision and Language Intelligence. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/16810.
Collections
Dukes student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.