Theoretical Understanding of Neural Network Optimization Landscape and Self-Supervised Representation Learning

dc.contributor.advisor: Ge, Rong
dc.contributor.author: Wu, Chenwei
dc.date.accessioned: 2023-06-08T18:22:42Z
dc.date.available: 2023-06-08T18:22:42Z
dc.date.issued: 2023
dc.department: Computer Science
dc.description.abstract:

Neural networks have achieved remarkable empirical success in many areas. One key factor in their success is their ability to automatically learn useful representations from data. Self-supervised representation learning, which learns representations during pre-training and applies them to downstream tasks, has become the dominant approach to representation learning in recent years. However, theoretical understanding of self-supervised representation learning remains scarce. Two main obstacles to such understanding are the large differences between pre-training and downstream tasks and the difficulty of neural network optimization. In this thesis, we present an initial exploration of the benefits of pre-training in self-supervised representation learning and of two heuristics in neural network optimization.

The first part of this thesis presents our attempts to understand why the representations produced by pre-trained models are useful in downstream tasks. In this part, we assume the training objective can be optimized well. For an over-realized sparse coding model with noise, we show that the masking objective used in pre-training ensures recovery of the ground-truth model parameters. For a more complicated log-linear word model, we characterize which downstream tasks can benefit from the representations learned during pre-training. Our experiments validate these theoretical results.
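To make the masking objective concrete, the sketch below is a minimal illustration (not code from the thesis) of masked-reconstruction pre-training for a noisy sparse coding model: data are generated as x = Az + noise with sparse codes z, a random subset of coordinates of each x is hidden, and an over-realized dictionary is trained to reconstruct the hidden coordinates from the visible ones. The dimensions, sparsity level, one-step encoder, and variable names are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the thesis's code): masked-reconstruction
# pre-training for a noisy sparse coding model x = A z + noise with an over-realized dictionary.
import torch

d, k_true, k_over = 32, 16, 64                      # data dim, true and over-realized dictionary sizes
A_true = torch.randn(d, k_true) / d ** 0.5          # assumed ground-truth dictionary

def sample_batch(n):
    z = torch.randn(n, k_true) * (torch.rand(n, k_true) < 0.1)   # sparse codes
    return z @ A_true.T + 0.01 * torch.randn(n, d)               # noisy observations

A = torch.randn(d, k_over, requires_grad=True)      # over-realized dictionary to be learned
opt = torch.optim.Adam([A], lr=1e-2)

for step in range(2000):
    x = sample_batch(256)
    mask = torch.rand_like(x) < 0.3                 # hide 30% of the coordinates
    code = torch.relu((x * ~mask) @ A)              # encode using only the visible coordinates
    x_hat = code @ A.T                              # reconstruct all coordinates
    loss = ((x_hat - x)[mask] ** 2).mean()          # penalize errors on the masked part only
    opt.zero_grad(); loss.backward(); opt.step()
```

Whether and when such an objective recovers the ground-truth dictionary is exactly the kind of question analyzed in the thesis; the snippet only shows the shape of the objective.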

The second part of this thesis explains two important phenomena in the neural network optimization landscape. We first propose a novel conjecture that explains the low-rank structure of the layer-wise neural network Hessian; our conjecture is verified experimentally and can be used to tighten generalization bounds for neural networks. We also study training stability and generalization in the learning-to-learn framework, where machine learning algorithms are used to learn parameters for training neural networks. For both phenomena, we rigorously prove our conjectures in simple models and empirically verify the theoretical results in experiments with practical neural networks and real data.
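For the low-rank Hessian phenomenon, the snippet below is a hedged sketch (again not the thesis's code) of the kind of empirical check one can run: compute the Hessian of the loss with respect to a single layer's weights of a tiny network and inspect the eigenvalue decay. The toy model, random data, and the choice of the first layer are assumptions made purely for illustration.

```python
# Minimal sketch (illustrative assumptions): eigenvalue decay of a layer-wise Hessian
# for a tiny two-layer network on random data.
import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)
X = torch.randn(128, 10)                    # toy inputs
y = torch.randint(0, 3, (128,))             # toy labels for a 3-class problem
W2 = 0.1 * torch.randn(3, 20)               # second-layer weights, held fixed

def layer_loss(w1_flat):
    W1 = w1_flat.view(20, 10)               # first-layer weights: the variable differentiated twice
    hidden = torch.tanh(X @ W1.T)
    logits = hidden @ W2.T
    return torch.nn.functional.cross_entropy(logits, y)

w1_flat = 0.1 * torch.randn(200)
H = hessian(layer_loss, w1_flat)            # 200 x 200 layer-wise Hessian
eigvals = torch.linalg.eigvalsh(H).flip(0)  # eigenvalues in decreasing order
print(eigvals[:10], eigvals[50:60])         # a handful of large eigenvalues, then a rapid decay
```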

Our results provide theoretical understanding of the benefits of pre-training for downstream tasks and of two important heuristics in the neural network optimization landscape. We hope these insights can further improve the performance of self-supervised representation learning approaches and inspire the design of new algorithms.

dc.identifier.uri: https://hdl.handle.net/10161/27696
dc.subject: Computer science
dc.subject: Neural networks
dc.subject: Optimization
dc.subject: Representation learning
dc.subject: Self-supervised learning
dc.title: Theoretical Understanding of Neural Network Optimization Landscape and Self-Supervised Representation Learning
dc.type: Dissertation

Files

Name: Wu_duke_0066D_17297.pdf
Size: 3.64 MB
Format: Adobe Portable Document Format
