Deep Generative Models for Vision and Language Intelligence

dc.contributor.advisor

Carin, Lawrence

dc.contributor.author

Gan, Zhe

dc.date.accessioned

2018-05-31T21:12:25Z

dc.date.available

2018-05-31T21:12:25Z

dc.date.issued

2018

dc.department

Electrical and Computer Engineering

dc.description.abstract

Deep generative models have achieved tremendous success in recent years, with applications in various tasks involving vision and language intelligence. In this dissertation, I will mainly discuss the contributions that I have made in this field during my Ph.D. study. Specifically, the dissertation is divided into two parts.

In the first part, I will mainly focus on one specific kind of deep directed generative model, called the Sigmoid Belief Network (SBN). First, I will present a fully Bayesian algorithm for efficient learning and inference of the SBN. Second, since the original SBN can only be used for binary image modeling, I will also discuss its generalization to model sparse count-valued data for topic modeling, and sequential data for motion capture synthesis, music generation and dynamic topic modeling.

In the second part, I will mainly focus on visual captioning (i.e., image-to-text generation) and conditional image synthesis. Specifically, I will first present the Semantic Compositional Network for visual captioning, and emphasize the interpretability and controllability revealed in the learning algorithm via a mixture-of-experts design and the use of detected semantic concepts. I will then present the Triangle Generative Adversarial Network, a general framework that can be used for joint distribution matching and for learning the bidirectional mappings between two different domains. We consider the joint modeling of image-label, image-image and image-attribute pairs, with applications in semi-supervised image classification, image-to-image translation and attribute-based image editing.

dc.identifier.uri

https://hdl.handle.net/10161/16810

dc.subject

Artificial intelligence

dc.subject

Electrical engineering

dc.subject

Computer science

dc.subject

Deep generative models

dc.subject

Deep learning

dc.subject

generative adversarial networks

dc.subject

Machine learning

dc.subject

sigmoid belief networks

dc.subject

visual captioning

dc.title

Deep Generative Models for Vision and Language Intelligence

dc.type

Dissertation

Files

Original bundle

Name:
Gan_duke_0066D_14414.pdf
Size:
11.8 MB
Format:
Adobe Portable Document Format
