GRADIENT DESCENT METHODS IN MODERN MACHINE LEARNING PROBLEMS: PROVABLE GUARANTEES

dc.contributor.advisor

Xu, Jiaming

dc.contributor.author

Zhu, Hanjing

dc.date.accessioned

2024-03-07T18:39:08Z

dc.date.available

2024-03-07T18:39:08Z

dc.date.issued

2023

dc.department

Business Administration

dc.description.abstract

Modern machine learning methods have demonstrated remarkable success in many industries. For instance, the famous ChatGPT relies on a machine learning model trained with a substantial volume of text and conversation data. To achieve optimal model performance, an efficient optimization algorithm is essential for learning the model parameters. Among optimization methods, gradient descent (GD) methods are the simplest ones. Traditionally, GD methods have shown excellent performance in conventional machine learning problems with well-behaved objective functions and simple training paradigms where computation occurs on a single server storing all the data. However, the understanding of how GD methods perform in modern machine learning problems with non-convex and non-smooth objective functions or more complex training paradigms remains limited. This thesis is dedicated to providing a theoretical understanding of why gradient descent methods excel in modern machine learning problems. In the first half of the thesis, we study stochastic gradient descent (SGD) in training multi-layer fully connected feedforward neural networks with Rectified Linear Unit (ReLU) activation. Since the loss function in training deep neural networks is non-convex and non-smooth, the standard convergence guarantees of GD for convex loss functions cannot be applied. Instead, through a kernel perspective, we demonstrate that when fresh data arrives in a stream, SGD ensures the exponential convergence of the average prediction error. In the second half, we investigate the utilization of GD methods in a new training paradigm, featuring a central parameter server (PS) and numerous clients storing data locally. Privacy constraints prevent the local data from being revealed to the PS, making this distributed learning setting particularly relevant in the current big-data era, where data is often sensitive and too large to be stored on a single device.
In practical applications, this distributed setting presents two major challenges: data heterogeneity and adversarial attacks. To overcome these challenges and achieve accurate estimates of model parameters, we propose a GD-based algorithm and provide convergence guarantees for both strongly convex and non-convex loss functions.
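The abstract contrasts the classical convex setting, where SGD convergence guarantees are standard, with the non-convex regimes the thesis studies. As a point of reference, the following is a minimal, generic SGD sketch on a one-dimensional least-squares problem (a convex loss where the classical guarantees apply); it is an illustrative example, not the algorithm proposed in the dissertation, and all names and hyperparameters here are invented for the sketch.

```python
import random

# Toy streaming-style data: y = 3*x + noise. The squared loss is
# convex in w, the classical setting where SGD provably converges.
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [3.0 * x + random.gauss(0, 0.1) for x in xs]

def sgd(xs, ys, lr=0.1, epochs=50, seed=1):
    """Plain SGD: at each step, take a gradient step on one sample."""
    rng = random.Random(seed)
    w = 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            # Gradient of the per-sample loss (w*x_i - y_i)^2 / 2 w.r.t. w
            g = (w * xs[i] - ys[i]) * xs[i]
            w -= lr * g
    return w

w_hat = sgd(xs, ys)  # converges to a value near the true slope 3.0
```

In the non-convex, non-smooth case treated in the thesis (ReLU networks), this per-sample update rule is unchanged; what differs is the analysis, which the abstract says proceeds through a kernel perspective rather than convexity.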

dc.identifier.uri

https://hdl.handle.net/10161/30273

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

Business administration

dc.subject

Operations research

dc.subject

Computer science

dc.subject

Byzantine Model

dc.subject

Deep learning

dc.subject

Federated learning

dc.subject

Gradient Descent

dc.subject

Machine learning

dc.subject

Neural network

dc.title

GRADIENT DESCENT METHODS IN MODERN MACHINE LEARNING PROBLEMS: PROVABLE GUARANTEES

dc.type

Dissertation

Files

Original bundle

Name:
Zhu_duke_0066D_17625.pdf
Size:
2.7 MB
Format:
Adobe Portable Document Format
Name:
Zhu_duke_0066D_17/emb_dis.pdf
Size:
176.52 KB
Format:
Adobe Portable Document Format