Deep Generative Models for Vision and Language Intelligence

Gan, Zhe

dc.contributor.advisor	Carin, Lawrence
dc.contributor.author	Gan, Zhe
dc.date.accessioned	2018-05-31T21:12:25Z
dc.date.available	2018-05-31T21:12:25Z
dc.date.issued	2018
dc.department	Electrical and Computer Engineering
dc.description.abstract	Deep generative models have achieved tremendous success in recent years, with applications in various tasks involving vision and language intelligence. In this dissertation, I will mainly discuss the contributions that I have made in this field during my Ph.D. study. Specifically, the dissertation is divided into two parts. In the first part, I will mainly focus on one specific kind of deep directed generative model, called the Sigmoid Belief Network (SBN). First, I will present a fully Bayesian algorithm for efficient learning and inference of the SBN. Second, since the original SBN can only be used for binary image modeling, I will also discuss its generalization to model sparse count-valued data for topic modeling, and sequential data for motion capture synthesis, music generation and dynamic topic modeling. In the second part, I will mainly focus on visual captioning (i.e., image-to-text generation) and conditional image synthesis. Specifically, I will first present the Semantic Compositional Network for visual captioning, and emphasize the interpretability and controllability revealed in the learning algorithm, via a mixture-of-experts design and the use of detected semantic concepts. I will then present the Triangle Generative Adversarial Network, a general framework that can be used for joint distribution matching and for learning the bidirectional mappings between two different domains. We consider the joint modeling of image-label, image-image and image-attribute pairs, with applications in semi-supervised image classification, image-to-image translation and attribute-based image editing.
dc.identifier.uri	https://hdl.handle.net/10161/16810
dc.subject	Artificial intelligence
dc.subject	Electrical engineering
dc.subject	Computer science
dc.subject	Deep generative models
dc.subject	Deep learning
dc.subject	generative adversarial networks
dc.subject	Machine learning
dc.subject	sigmoid belief networks
dc.subject	visual captioning
dc.title	Deep Generative Models for Vision and Language Intelligence
dc.type	Dissertation

Files

Original bundle

Name: Gan_duke_0066D_14414.pdf
Size: 11.8 MB
Format: Adobe Portable Document Format

Collections

Dissertations