GRADIENT DESCENT METHODS IN MODERN MACHINE LEARNING PROBLEMS: PROVABLE GUARANTEES

dc.contributor.advisor

Xu, Jiaming

dc.contributor.author

Zhu, Hanjing

dc.date.accessioned

2024-03-07T18:39:08Z

dc.date.available

2024-03-07T18:39:08Z

dc.date.issued

2023

dc.department

Business Administration

dc.description.abstract

Modern machine learning methods have demonstrated remarkable success in many industries. For instance, the famous ChatGPT relies on a machine learning model trained with a substantial volume of text and conversation data. To achieve optimal model performance, an efficient optimization algorithm is essential for learning the model parameters. Among optimization methods, gradient descent (GD) methods are the simplest ones. Traditionally, GD methods have shown excellent performance in conventional machine learning problems with well-behaved objective functions and simple training paradigms where computation occurs on a single server storing all the data. However, the understanding of how GD methods perform in modern machine learning problems with non-convex and non-smooth objective functions or more complex training paradigms remains limited. This thesis is dedicated to providing a theoretical understanding of why gradient descent methods excel in modern machine learning problems. In the first half of the thesis, we study stochastic gradient descent (SGD) in training multi-layer fully connected feedforward neural networks with Rectified Linear Unit (ReLU) activation. Since the loss function in training deep neural networks is non-convex and non-smooth, the standard convergence guarantees of GD for convex loss functions cannot be applied. Instead, through a kernel perspective, we demonstrate that when fresh data arrives in a stream, SGD ensures the exponential convergence of the average prediction error. In the second half, we investigate the utilization of GD methods in a new training paradigm, featuring a central parameter server (PS) and numerous clients storing data locally. Privacy constraints prevent the local data from being revealed to the PS, making this distributed learning setting particularly relevant in the current big-data era, where data is often sensitive and too large to be stored on a single device.
In practical applications, this distributed setting presents two major challenges: data heterogeneity and adversarial attacks. To overcome these challenges and achieve accurate estimates of model parameters, we propose a GD-based algorithm and provide convergence guarantees for both strongly convex and non-convex loss functions.
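The abstract contrasts the classical convex setting, where SGD convergence guarantees are standard, with the non-convex regimes the thesis studies. As a point of reference, the following is a minimal, generic SGD sketch on a one-dimensional least-squares problem (a convex loss where the classical guarantees apply); it is an illustrative example, not the algorithm proposed in the dissertation, and all names and hyperparameters here are invented for the sketch.

```python
import random

# Toy streaming-style data: y = 3*x + noise. The squared loss is
# convex in w, the classical setting where SGD provably converges.
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [3.0 * x + random.gauss(0, 0.1) for x in xs]

def sgd(xs, ys, lr=0.1, epochs=50, seed=1):
    """Plain SGD: at each step, take a gradient step on one sample."""
    rng = random.Random(seed)
    w = 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            # Gradient of the per-sample loss (w*x_i - y_i)^2 / 2 w.r.t. w
            g = (w * xs[i] - ys[i]) * xs[i]
            w -= lr * g
    return w

w_hat = sgd(xs, ys)  # converges to a value near the true slope 3.0
```

In the non-convex, non-smooth case treated in the thesis (ReLU networks), this per-sample update rule is unchanged; what differs is the analysis, which the abstract says proceeds through a kernel perspective rather than convexity.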

dc.identifier.uri

https://hdl.handle.net/10161/30273

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

Business administration

dc.subject

Operations research

dc.subject

Computer science

dc.subject

Byzantine Model

dc.subject

Deep learning

dc.subject

Federated learning

dc.subject

Gradient Descent

dc.subject

Machine learning

dc.subject

Neural network

dc.title

GRADIENT DESCENT METHODS IN MODERN MACHINE LEARNING PROBLEMS: PROVABLE GUARANTEES

dc.type

Dissertation

Files

Original bundle

Name:
Zhu_duke_0066D_17625.pdf
Size:
2.7 MB
Format:
Adobe Portable Document Format
Name:
Zhu_duke_0066D_17/emb_dis.pdf
Size:
176.52 KB
Format:
Adobe Portable Document Format