Browsing by Subject "Text generation"
Item Open Access Deep Latent-Variable Models for Natural Language Understanding and Generation (2020) Shen, Dinghan
Deep latent-variable models have been widely adopted to model various types of data, due to their ability to: 1) infer rich high-level information from the input data (especially in low-resource settings); and 2) yield a generative network that can synthesize samples unseen during training. In this dissertation, I will present my contributions in applying the general framework of latent-variable models to a variety of natural language processing problems, which are especially challenging given the discrete nature of text sequences. The dissertation is divided into two parts.
In the first part, I will present two of my recent explorations of deep latent-variable models for natural language understanding. The goal is to learn meaningful text representations that are helpful for tasks such as sentence classification, natural language inference, and question answering. First, I will propose a variational autoencoder for textual data that exploits unlabeled information. To alleviate the observed posterior-collapse issue, a specially designed deconvolutional decoder is employed as the generative network. The resulting sentence embeddings greatly boost performance on downstream tasks. I will then present a model that learns compressed (binary) sentence embeddings, which are storage-efficient and well suited to on-device applications.
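The abstract above mentions three standard ingredients: the VAE reparameterization trick, the KL term whose collapse toward zero for every input signals posterior collapse, and binarization of latent codes for compressed sentence embeddings. A minimal NumPy sketch of these ingredients (the function names and the sign-based binarization scheme are illustrative assumptions, not the dissertation's actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, logvar):
    # z = mu + sigma * eps: the standard VAE reparameterization trick
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_divergence(mu, logvar):
    # Closed-form KL(q(z|x) || N(0, I)) per example; if this term is
    # near zero for every input, the decoder is ignoring z
    # (posterior collapse)
    return -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=-1)

def binarize(z):
    # Hypothetical sign-based binarization for compressed embeddings
    return (z > 0).astype(np.int8)

mu = rng.standard_normal((2, 16))      # encoder means for 2 sentences
logvar = np.zeros((2, 16))             # unit variance
z = reparameterize(mu, logvar)
print(kl_divergence(mu, logvar).shape)  # one KL value per sentence: (2,)
print(binarize(z).shape)                # binary codes: (2, 16)
```

In a real text VAE the decoder (here, the deconvolutional network the abstract describes) would reconstruct the sentence from `z`, and the training loss would combine reconstruction error with the KL term.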
In the second part, I will introduce a multi-level Variational Autoencoder (VAE) to model long-form text sequences (with as many as 60 words). A multi-level generative network is leveraged to capture word-level and sentence-level coherence, respectively. Moreover, with a hierarchical design of the latent space, long and coherent texts can be produced more reliably than with baseline text-VAE models. Semantically rich latent representations are also obtained in a fully unsupervised manner. Human evaluation further demonstrates the superiority of the proposed method.
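The hierarchical latent design described above can be sketched as a top-level latent that conditions lower-level latents, which would in turn drive a word-level decoder. The dimensions, conditioning map, and number of sentences below are hypothetical, chosen only to make the shapes concrete:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample(mu, logvar):
    # reparameterized Gaussian sample
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

# Hypothetical two-level hierarchy: a top-level latent z_top captures
# global (sentence-level) structure and conditions the means of the
# lower-level latents z_low, which model word-level detail.
z_top = sample(np.zeros(32), np.zeros(32))       # top-level latent
W = rng.standard_normal((32, 16)) * 0.1          # illustrative conditioning map
mu_low = np.stack([z_top @ W for _ in range(4)])  # 4 lower-level means
z_low = sample(mu_low, np.zeros_like(mu_low))    # lower-level latents
print(z_low.shape)  # (4, 16)
```

Because each lower-level latent is conditioned on the shared `z_top`, samples inherit global coherence, which is the intuition behind producing long-form text more reliably than a flat latent space.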
Item Open Access LEARNING BOTH EXPERT AND UNIVERSAL KNOWLEDGE USING TRANSFORMERS (2022) Li, Yuan
The Transformer has demonstrated superior performance on various natural language processing (NLP) tasks, including machine translation, language understanding, and text generation. Its multi-head attention mechanism provides strong flexibility for fusing contextual information and therefore facilitates long-range relation modeling. Further, Transformers have proved effective for learning universal knowledge at scale; representative models include BERT, GPT, and their subsequent variants. It has been observed that the Transformer is more tolerant of convergence plateaus and is capable of scaling to more than one hundred billion parameters. Despite these advances, we believe that the Transformer can be pushed further toward the two extremes of knowledge learning: expert knowledge and universal knowledge. On the one hand, professional knowledge, such as the medical knowledge humans accumulate through extensive education and practice, plays a vital role in professional disciplines. However, because expert knowledge takes various forms (e.g., knowledge graphs, textual templates, and tables of statistics) and different Transformer models must be developed to handle each form, there is an urgent need for a unified framework to efficiently encode and decode different types of knowledge. On the other hand, learning universal knowledge requires substantial training data and a large model to absorb information from unlabeled data in a self-supervised manner. However, existing self-supervised language models lack a structured encoding of the input and therefore fail to generate plausible text in a controllable way. Moreover, learning from high-dimensional input, such as image pixels, is challenging for the Transformer due to heavy computational cost and the sparse semantic information carried by individual pixels.
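The multi-head attention mechanism referred to above projects the input into queries, keys, and values, splits each into several heads, applies scaled dot-product attention per head, and concatenates the results. A minimal single-batch NumPy sketch (the dimensions and weight scaling are illustrative assumptions, not any particular model's configuration):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multihead_attention(x, Wq, Wk, Wv, Wo, n_heads):
    # x: (seq_len, d_model); scaled dot-product attention split
    # across n_heads heads, then concatenated and projected by Wo
    seq, d = x.shape
    dh = d // n_heads
    q = (x @ Wq).reshape(seq, n_heads, dh).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq, n_heads, dh).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq, n_heads, dh).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # (heads, seq, seq)
    out = softmax(scores) @ v                        # (heads, seq, dh)
    return out.transpose(1, 0, 2).reshape(seq, d) @ Wo

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))                      # 5 tokens, d_model = 8
Ws = [rng.standard_normal((8, 8)) * 0.1 for _ in range(4)]
y = multihead_attention(x, *Ws, n_heads=2)
print(y.shape)  # (5, 8)
```

Because every position attends to every other position, the attention scores directly model long-range relations, which is the property the abstract highlights.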
In this proposal, we address these challenges by first defining a unified formulation for acquiring both expert and universal knowledge, and then developing several novel Transformer models and variants, including Graph Transformers, Variational Autoencoders (VAEs) implemented with the Transformer architecture, and Visual-Linguistic Masked Autoencoders (VL-MAEs) for learning visual representations with additional language supervision. The techniques developed in this proposal will alleviate the burden and lower the barrier to entry of learning with universal knowledge and expertise for ML researchers and practitioners, and will also reduce the cost of research.