Theoretical Understanding of Neural Network Optimization Landscape and Self-Supervised Representation Learning


Neural networks have achieved remarkable empirical success in many areas. A key factor in this success is their ability to learn useful representations from data automatically. Self-supervised representation learning, which learns representations during pre-training and then applies them to downstream tasks, has become the dominant approach to representation learning in recent years. However, theoretical understanding of self-supervised representation learning remains scarce. Two main bottlenecks to such understanding are the large differences between the pre-training and downstream tasks and the difficulty of neural network optimization. In this thesis, we present an initial exploration into analyzing the benefit of pre-training in self-supervised representation learning and two heuristics in neural network optimization.

The first part of this thesis presents our attempts to understand why the representations produced by pre-trained models are useful in downstream tasks. Throughout this part, we assume that the training objective can be optimized well. For the over-realized sparse coding model with noise, we show that the masking objective used in pre-training ensures recovery of the ground-truth model parameters. For a more complicated log-linear word model, we characterize which downstream tasks can benefit from the representations learned in pre-training. Our experiments validate these theoretical results.
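To make the phrase "masking objective" concrete, the following minimal Python/NumPy sketch draws samples from a hypothetical noisy sparse coding model x = A z + noise, hides a random half of the coordinates, and scores a linear predictor of the hidden coordinates from the visible ones. Everything here (the helper names sample_sparse_coding and masked_loss, the dimensions, the Gaussian dictionary, the linear predictor) is an illustrative assumption; it is not the over-realized model, the predictor, or the recovery argument analyzed in the thesis.

```python
import numpy as np


def sample_sparse_coding(n, dim_x, dim_z, k, noise_std, rng):
    """Hypothetical generator: x = A z + noise, where each z has exactly k nonzero entries."""
    A = rng.standard_normal((dim_x, dim_z)) / np.sqrt(dim_x)  # ground-truth dictionary
    Z = np.zeros((n, dim_z))
    for i in range(n):
        support = rng.choice(dim_z, size=k, replace=False)  # random sparse support
        Z[i, support] = rng.standard_normal(k)
    X = Z @ A.T + noise_std * rng.standard_normal((n, dim_x))
    return X, A


def masked_loss(X, W, visible, masked):
    """Mean squared error when predicting the masked coordinates from the visible ones via X_visible @ W."""
    pred = X[:, visible] @ W
    return float(np.mean(np.sum((X[:, masked] - pred) ** 2, axis=1)))


rng = np.random.default_rng(0)
X, A = sample_sparse_coding(n=2000, dim_x=20, dim_z=30, k=3, noise_std=0.1, rng=rng)

# Randomly split the 20 coordinates into 10 visible and 10 masked ones.
perm = rng.permutation(20)
visible, masked = perm[:10], perm[10:]

# Baseline: predict zero for every masked coordinate.
loss_zero = masked_loss(X, np.zeros((10, 10)), visible, masked)

# Least-squares linear predictor fitted on the same samples.
W_fit, *_ = np.linalg.lstsq(X[:, visible], X[:, masked], rcond=None)
loss_fit = masked_loss(X, W_fit, visible, masked)

print(f"zero predictor: {loss_zero:.4f}, fitted linear predictor: {loss_fit:.4f}")
```

Because the least-squares fit minimizes this same squared error over all linear maps, including the all-zero one, the fitted loss can never exceed the zero-predictor loss; the shared sparse code z is what makes the visible coordinates informative about the masked ones.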

The second part of this thesis explains two important phenomena in the neural network optimization landscape. We first propose a novel conjecture that explains the low-rank structure of the layer-wise neural network Hessian. The conjecture is verified experimentally and can be used to tighten generalization bounds for neural networks. We also study training stability and generalization in the learning-to-learn framework, in which machine learning algorithms are used to learn the parameters used for training neural networks. We rigorously prove our conjectures in simple models and empirically verify our theoretical results in experiments with practical neural networks and real data.
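The low-rank structure mentioned above can be seen in its simplest form for a single linear layer trained with softmax cross-entropy. For one sample x with predicted class probabilities p = softmax(W x), the Hessian of the loss with respect to the flattened weight matrix W (c classes, d inputs) is the Kronecker product (diag(p) - p p^T) ⊗ (x x^T); the first factor has rank c - 1 and the second has rank 1, so this (c·d) by (c·d) matrix has rank only c - 1. The NumPy sketch below, with hypothetical helper names, checks the formula against finite differences and prints the rank. It is a standard one-sample, one-layer calculation, not the conjecture or proof from the thesis, which concerns the layer-wise Hessian of neural networks more generally.

```python
import numpy as np


def softmax(v):
    e = np.exp(v - v.max())  # shift for numerical stability
    return e / e.sum()


def ce_grad(W, x, y):
    """Gradient of the cross-entropy loss of softmax(W @ x) w.r.t. W, flattened row-major."""
    p = softmax(W @ x)
    return np.outer(p - y, x).ravel()


def hessian_kron(W, x):
    """Analytic single-sample Hessian: (diag(p) - p p^T) kron (x x^T)."""
    p = softmax(W @ x)
    A = np.diag(p) - np.outer(p, p)
    return np.kron(A, np.outer(x, x))


def hessian_fd(W, x, y, eps=1e-5):
    """Central finite differences of the gradient, used only to check the formula."""
    w = W.ravel()
    H = np.zeros((w.size, w.size))
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = eps
        g_plus = ce_grad((w + step).reshape(W.shape), x, y)
        g_minus = ce_grad((w - step).reshape(W.shape), x, y)
        H[:, j] = (g_plus - g_minus) / (2 * eps)
    return H


rng = np.random.default_rng(0)
c, d = 5, 8                               # number of classes, input dimension
W = 0.5 * rng.standard_normal((c, d))     # small weights keep the logits moderate
x = rng.standard_normal(d)
y = np.eye(c)[2]                          # one-hot label for class 2

H = hessian_kron(W, x)
H_fd = hessian_fd(W, x, y)

print("max |analytic - finite difference|:", np.abs(H - H_fd).max())
print("rank:", np.linalg.matrix_rank(H), "out of", c * d)
```

Averaging such matrices over many samples generally increases the exact rank, which is one reason statements about the dataset-averaged layer-wise Hessian concern approximate rather than exact low-rank structure.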

Our results provide a theoretical understanding of the benefits of pre-training for downstream tasks and of two important heuristics of the neural network optimization landscape. We hope these insights can further improve the performance of self-supervised representation learning approaches and inspire the design of new algorithms.





Wu, Chenwei (2023). Theoretical Understanding of Neural Network Optimization Landscape and Self-Supervised Representation Learning. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public under a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.