Deep Generative Models for Vision and Language Intelligence
dc.contributor.advisor | Carin, Lawrence | |
dc.contributor.author | Gan, Zhe | |
dc.date.accessioned | 2018-05-31T21:12:25Z | |
dc.date.available | 2018-05-31T21:12:25Z | |
dc.date.issued | 2018 | |
dc.department | Electrical and Computer Engineering | |
dc.description.abstract | Deep generative models have achieved tremendous success in recent years, with applications in various tasks involving vision and language intelligence. In this dissertation, I will mainly discuss the contributions that I have made in this field during my Ph.D. study. Specifically, the dissertation is divided into two parts. In the first part, I will mainly focus on one specific kind of deep directed generative model, called the Sigmoid Belief Network (SBN). First, I will present a fully Bayesian algorithm for efficient learning and inference of SBN. Second, since the original SBN can only be used for binary image modeling, I will also discuss its generalization to model sparse count-valued data for topic modeling, and sequential data for motion capture synthesis, music generation and dynamic topic modeling. In the second part, I will mainly focus on visual captioning (i.e., image-to-text generation), and conditional image synthesis. Specifically, I will first present the Semantic Compositional Network for visual captioning, and emphasize interpretability and controllability revealed in the learning algorithm, via a mixture-of-experts design, and the usage of detected semantic concepts. I will then present the Triangle Generative Adversarial Network, which is a general framework that can be used for joint distribution matching and learning the bidirectional mappings between two different domains. We consider the joint modeling of image-label, image-image and image-attribute pairs, with applications in semi-supervised image classification, image-to-image translation and attribute-based image editing. | |
dc.identifier.uri | ||
dc.subject | Artificial intelligence | |
dc.subject | Electrical engineering | |
dc.subject | Computer science | |
dc.subject | Deep generative models | |
dc.subject | Deep learning | |
dc.subject | generative adversarial networks | |
dc.subject | Machine learning | |
dc.subject | sigmoid belief networks | |
dc.subject | visual captioning | |
dc.title | Deep Generative Models for Vision and Language Intelligence | |
dc.type | Dissertation |