Deep Latent-Variable Models for Natural Language Understanding and Generation
Deep latent-variable models have been widely adopted to model various types of data because of their ability to (1) infer rich high-level information from the input data, especially in low-resource settings, and (2) yield a generative network that can synthesize samples unseen during training. In this dissertation, I will present my contributions to applying the general latent-variable framework to various natural language processing problems, a task made especially challenging by the discrete nature of text sequences. Specifically, the dissertation is divided into two parts.
In the first part, I will present two of my recent explorations on leveraging deep latent-variable models for natural language understanding. The goal here is to learn meaningful text representations that are useful for tasks such as sentence classification, natural language inference, and question answering. First, I will propose a variational autoencoder for text that exploits unlabeled data. To alleviate the observed posterior collapse issue, a specially designed deconvolutional decoder is employed as the generative network. The resulting sentence embeddings greatly improve performance on downstream tasks. Then I will present a model that learns compressed (binary) sentence embeddings, which are storage-efficient and well suited to on-device applications.
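To make the storage argument for binary sentence embeddings concrete, here is a minimal sketch in Python. It assumes the simplest possible binarization, a hard threshold at zero, which is an illustrative choice and not necessarily the method developed in the dissertation. The function names `binarize` and `hamming_distance` and the toy 4-dimensional vectors are hypothetical.

```python
from typing import List


def binarize(embedding: List[float], threshold: float = 0.0) -> int:
    """Pack a real-valued embedding into an integer bit code, one bit per dimension.

    Assumption: a hard threshold is used; a learned binarization could differ.
    """
    code = 0
    for value in embedding:
        # Shift in a 1 when the dimension exceeds the threshold, else a 0.
        code = (code << 1) | (1 if value > threshold else 0)
    return code


def hamming_distance(code_a: int, code_b: int) -> int:
    """Count the bit positions at which two codes differ."""
    return bin(code_a ^ code_b).count("1")


# Toy 4-dimensional "sentence embeddings" (made-up values for illustration).
emb_a = [0.8, -0.3, 0.1, -1.2]
emb_b = [0.5, 0.4, 0.2, -0.7]

code_a = binarize(emb_a)  # bits 1010
code_b = binarize(emb_b)  # bits 1110
distance = hamming_distance(code_a, code_b)  # the codes differ in one position
```

Each dimension costs one bit in this scheme, compared with 32 bits for a single-precision float, and similarity between two sentences reduces to an XOR followed by a bit count.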
In the second part, I will introduce a multi-level variational autoencoder (VAE) to model long-form text sequences (with as many as 60 words). A multi-level generative network is used to capture coherence at both the word level and the sentence level. Moreover, with a hierarchical design of the latent space, long-form and coherent texts can be produced more reliably than with baseline text VAE models. Semantically rich latent representations are also obtained in this unsupervised manner. Human evaluation further demonstrates the superiority of the proposed method.
Natural Language Processing
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations