Browsing by Author "Sundararaman, Dhanasekar"
Item (Open Access): Structure and Feedback-based Natural Language Processing (2022), Sundararaman, Dhanasekar

The development of deep learning models has revolutionized the way computers process information and has driven significant advances in fields such as speech, vision, and language. In language, giant strides have been made, ranging from seq2seq models that process one word at a time to more sophisticated Transformer networks that can consume paragraphs of text. Natural language processing (NLP) models have become prevalent because of their ability to generate coherent and meaningful sentences at scale. Despite their effectiveness, these models often leave room for improvement when additional linguistic information is available. In this dissertation, I discuss my contributions to the use of (a) structural information, namely syntactic and numeric structures, which is often overlooked and underutilized by language models, and (b) feedback-based models, in which the objectives of the main model are guided by feedback from a supporting model.
In the first part, I present three contributions that explore the use of structural information to develop NLP models that outperform their baselines on tasks requiring encoders and decoders, such as machine translation, as well as on downstream tasks such as text classification, question answering, and fill-in-the-blank prediction. The first contribution proposes techniques for consuming syntactic information, such as part of speech, word position, and case, to improve the machine translation performance of data-heavy Transformer models. An accompanying case study compares and contrasts a seq2seq model with a Transformer in their ability to absorb syntax across many language pairs. The second and third contributions utilize the numeric structure that is prevalent in language as a means of incorporating numerical reasoning into language models. Collectively, these contributions improve translation performance, numerical question-answering, and other downstream tasks.
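To make the first contribution concrete, the following is a minimal sketch (in PyTorch) of one way to inject syntactic features such as part-of-speech tags and grammatical case into a Transformer's input embeddings. The class name, feature set, and dimension choices are illustrative assumptions, not the dissertation's actual implementation.

```python
# Minimal sketch: augmenting token embeddings with syntactic feature embeddings
# (part of speech, case) before a Transformer encoder. All names and sizes are
# illustrative assumptions, not the dissertation's code.
import torch
import torch.nn as nn


class SyntaxAwareEmbedding(nn.Module):
    def __init__(self, vocab_size, num_pos_tags, num_cases, d_model=512, d_feat=32):
        super().__init__()
        # Token embedding sized so the concatenated vector matches d_model.
        self.tok_emb = nn.Embedding(vocab_size, d_model - 2 * d_feat)
        self.pos_tag_emb = nn.Embedding(num_pos_tags, d_feat)  # part-of-speech feature
        self.case_emb = nn.Embedding(num_cases, d_feat)        # grammatical case feature

    def forward(self, token_ids, pos_tag_ids, case_ids):
        # Concatenate word and syntactic feature embeddings at each position.
        return torch.cat(
            [self.tok_emb(token_ids), self.pos_tag_emb(pos_tag_ids), self.case_emb(case_ids)],
            dim=-1,
        )  # shape: (batch, seq_len, d_model), ready for a Transformer encoder


# Usage: feed the combined embeddings into a standard Transformer encoder.
emb = SyntaxAwareEmbedding(vocab_size=32000, num_pos_tags=18, num_cases=8)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True), num_layers=2
)
tokens = torch.randint(0, 32000, (2, 10))
pos_tags = torch.randint(0, 18, (2, 10))
cases = torch.randint(0, 8, (2, 10))
out = encoder(emb(tokens, pos_tags, cases))
```

Concatenation is only one way to fuse such features; summing the feature embeddings into the token embedding is an equally plausible design under the same assumptions.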
In the second part, I present two contributions that utilize feedback signals from a supporting model to achieve an optimization objective that enhances the performance of the main model. In the first contribution, the main model is a multi-task model that performs natural language inference across multiple task languages, while the supporting model uses reinforcement learning to learn the importance of each task, which is not known a priori. The resulting mix of tasks led to significant improvements on the target-language task. In the second contribution, a supporting model selects tokens that are most likely to be out-of-distribution (OOD) using the Mahalanobis distance, applying a form of self-supervision to the language model. A novel regularization loss maximizes the distance between in-domain tokens and pseudo-OOD tokens, which yields significant improvements in OOD detection.
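As an illustration of the second contribution, below is a minimal sketch (in PyTorch) of Mahalanobis-distance-based OOD scoring combined with a margin-style regularizer that pushes pseudo-OOD embeddings away from in-domain statistics. The function names, the margin formulation, and the covariance smoothing term are assumptions for illustration and may differ from the dissertation's actual loss.

```python
# Minimal sketch: score tokens as out-of-distribution (OOD) via the squared
# Mahalanobis distance to in-domain statistics, and penalize pseudo-OOD tokens
# that look too in-domain. Names and the margin form are illustrative assumptions.
import torch


def mahalanobis_sq(x, mean, cov_inv):
    # x: (n, d) embeddings; mean: (d,); cov_inv: (d, d) inverse covariance.
    diff = x - mean
    return torch.einsum("nd,dk,nk->n", diff, cov_inv, diff)


def ood_regularizer(in_domain_emb, pseudo_ood_emb, margin=10.0):
    # Estimate in-domain mean and covariance, then penalize pseudo-OOD tokens
    # whose squared Mahalanobis distance falls below the margin.
    mean = in_domain_emb.mean(dim=0)
    cov = torch.cov(in_domain_emb.T) + 1e-4 * torch.eye(in_domain_emb.shape[1])
    cov_inv = torch.linalg.inv(cov)
    d_ood = mahalanobis_sq(pseudo_ood_emb, mean, cov_inv)
    return torch.clamp(margin - d_ood, min=0.0).mean()


# Usage with random embeddings standing in for token representations.
in_domain = torch.randn(128, 16)
pseudo_ood = torch.randn(32, 16) + 2.0
loss = ood_regularizer(in_domain, pseudo_ood)
```

In a full training setup, this regularizer would presumably be added to the main task loss so that in-domain and pseudo-OOD representations are pulled apart during self-supervised training.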