Theoretical Understanding of Neural Network Optimization Landscape and Self-Supervised Representation Learning

Wu, Chenwei


Date

2023

Authors

Wu, Chenwei

Advisors

Ge, Rong


Abstract

Neural networks have achieved remarkable empirical success in various areas. One key factor in their success is their ability to automatically learn useful representations from data. Self-supervised representation learning, which learns representations during pre-training and applies them to downstream tasks, has become the dominant approach to representation learning in recent years. However, theoretical understanding of self-supervised representation learning is scarce. Two main bottlenecks are the large differences between pre-training and downstream tasks and the difficulty of neural network optimization. In this thesis, we present an initial exploration into the benefit of pre-training in self-supervised representation learning and into two heuristics in neural network optimization.

The first part of this thesis presents our attempts to understand why the representations produced by pre-trained models are useful in downstream tasks. In this part, we assume that the training objective can be optimized well. For the over-realized sparse coding model with noise, we show that the masking objective used in pre-training ensures the recovery of the ground-truth model parameters. For a more complicated log-linear word model, we characterize which downstream tasks can benefit from the representations learned in pre-training. Our experiments validate these theoretical results.

The second part of this thesis explains two important phenomena in the neural network optimization landscape. We first propose and rigorously prove a novel conjecture that explains the low-rank structure of the layer-wise neural network Hessian. Our conjecture is verified experimentally and can be used to tighten generalization bounds for neural networks. We also study the training stability and generalization problem in the learning-to-learn framework, where machine learning algorithms are used to learn parameters for training neural networks. We rigorously prove our conjectures in simple models and empirically verify our theoretical results in experiments with practical neural networks and real data.

Our results provide a theoretical understanding of the benefits of pre-training for downstream tasks and of two important heuristics of the neural network optimization landscape. We hope these insights can further improve the performance of self-supervised representation learning approaches and inspire the design of new algorithms.

Type

Dissertation

Department

Computer Science

Subjects

Computer science, Neural networks, Optimization, Representation learning, Self-supervised learning

Permalink

https://hdl.handle.net/10161/27696

Citation

Wu, Chenwei (2023). Theoretical Understanding of Neural Network Optimization Landscape and Self-Supervised Representation Learning. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/27696.

Collections

Dissertations


Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.
