Theoretical Understanding of Neural Network Optimization Landscape and Self-Supervised Representation Learning

dc.contributor.advisor: Ge, Rong
dc.contributor.author: Wu, Chenwei
dc.date.accessioned: 2023-06-08T18:22:42Z
dc.date.available: 2023-06-08T18:22:42Z
dc.date.issued: 2023
dc.department: Computer Science
dc.description.abstract:

Neural networks have achieved remarkable empirical success in many areas. One key factor in their success is their ability to automatically learn useful representations from data. Self-supervised representation learning, which learns representations during pre-training and applies them to downstream tasks, has become the dominant approach to representation learning in recent years. However, theoretical understanding of self-supervised representation learning remains scarce. Two main obstacles to such understanding are the large differences between pre-training and downstream tasks and the difficulty of neural network optimization. In this thesis, we present an initial exploration of the benefits of pre-training in self-supervised representation learning and of two heuristics in neural network optimization.

The first part of this thesis presents our attempts to understand why the representations produced by pre-trained models are useful in downstream tasks. In this part, we assume the training objective can be optimized well. For an over-realized sparse coding model with noise, we show that the masking objective used in pre-training ensures recovery of the ground-truth model parameters. For a more complicated log-linear word model, we characterize which downstream tasks can benefit from the representations learned during pre-training. Our experiments validate these theoretical results.
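To make the masking objective concrete, the sketch below is a minimal illustration (not code from the thesis) of masked-reconstruction pre-training for a noisy sparse coding model: data are generated as x = Az + noise with sparse codes z, a random subset of coordinates of each x is hidden, and an over-realized dictionary is trained to reconstruct the hidden coordinates from the visible ones. The dimensions, sparsity level, one-step encoder, and variable names are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the thesis's code): masked-reconstruction
# pre-training for a noisy sparse coding model x = A z + noise with an over-realized dictionary.
import torch

d, k_true, k_over = 32, 16, 64                      # data dim, true and over-realized dictionary sizes
A_true = torch.randn(d, k_true) / d ** 0.5          # assumed ground-truth dictionary

def sample_batch(n):
    z = torch.randn(n, k_true) * (torch.rand(n, k_true) < 0.1)   # sparse codes
    return z @ A_true.T + 0.01 * torch.randn(n, d)               # noisy observations

A = torch.randn(d, k_over, requires_grad=True)      # over-realized dictionary to be learned
opt = torch.optim.Adam([A], lr=1e-2)

for step in range(2000):
    x = sample_batch(256)
    mask = torch.rand_like(x) < 0.3                 # hide 30% of the coordinates
    code = torch.relu((x * ~mask) @ A)              # encode using only the visible coordinates
    x_hat = code @ A.T                              # reconstruct all coordinates
    loss = ((x_hat - x)[mask] ** 2).mean()          # penalize errors on the masked part only
    opt.zero_grad(); loss.backward(); opt.step()
```

Whether and when such an objective recovers the ground-truth dictionary is exactly the kind of question analyzed in the thesis; the snippet only shows the shape of the objective.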

The second part of this thesis explains two important phenomena in the neural network optimization landscape. We first propose a novel conjecture that explains the low-rank structure of the layer-wise neural network Hessian; our conjecture is verified experimentally and can be used to tighten generalization bounds for neural networks. We also study training stability and generalization in the learning-to-learn framework, where machine learning algorithms are used to learn parameters for training neural networks. For both phenomena, we rigorously prove our conjectures in simple models and empirically verify the theoretical results in experiments with practical neural networks and real data.
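For the low-rank Hessian phenomenon, the snippet below is a hedged sketch (again not the thesis's code) of the kind of empirical check one can run: compute the Hessian of the loss with respect to a single layer's weights of a tiny network and inspect the eigenvalue decay. The toy model, random data, and the choice of the first layer are assumptions made purely for illustration.

```python
# Minimal sketch (illustrative assumptions): eigenvalue decay of a layer-wise Hessian
# for a tiny two-layer network on random data.
import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)
X = torch.randn(128, 10)                    # toy inputs
y = torch.randint(0, 3, (128,))             # toy labels for a 3-class problem
W2 = 0.1 * torch.randn(3, 20)               # second-layer weights, held fixed

def layer_loss(w1_flat):
    W1 = w1_flat.view(20, 10)               # first-layer weights: the variable differentiated twice
    hidden = torch.tanh(X @ W1.T)
    logits = hidden @ W2.T
    return torch.nn.functional.cross_entropy(logits, y)

w1_flat = 0.1 * torch.randn(200)
H = hessian(layer_loss, w1_flat)            # 200 x 200 layer-wise Hessian
eigvals = torch.linalg.eigvalsh(H).flip(0)  # eigenvalues in decreasing order
print(eigvals[:10], eigvals[50:60])         # a handful of large eigenvalues, then a rapid decay
```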

Our results provide theoretical understanding of the benefits of pre-training for downstream tasks and of two important heuristics in the neural network optimization landscape. We hope these insights can further improve the performance of self-supervised representation learning approaches and inspire the design of new algorithms.

dc.identifier.uri: https://hdl.handle.net/10161/27696
dc.subject: Computer science
dc.subject: Neural networks
dc.subject: Optimization
dc.subject: Representation learning
dc.subject: Self-supervised learning
dc.title: Theoretical Understanding of Neural Network Optimization Landscape and Self-Supervised Representation Learning
dc.type: Dissertation

Files

Name: Wu_duke_0066D_17297.pdf
Size: 3.64 MB
Format: Adobe Portable Document Format
