GRADIENT DESCENT METHODS IN MODERN MACHINE LEARNING PROBLEMS: PROVABLE GUARANTEES


Date

2023


Repository Usage Stats

6 views, 33 downloads

Abstract

Modern machine learning methods have demonstrated remarkable success across many industries. For instance, ChatGPT relies on a machine learning model trained on a substantial volume of text and conversation data. Achieving optimal model performance requires an efficient optimization algorithm for learning the model parameters. Among optimization methods, gradient descent (GD) methods are the simplest. Traditionally, GD methods have performed well on conventional machine learning problems with well-behaved objective functions and simple training paradigms in which computation occurs on a single server storing all the data. However, the understanding of how GD methods perform on modern machine learning problems, with non-convex and non-smooth objective functions or more complex training paradigms, remains limited.

This thesis is dedicated to providing a theoretical understanding of why gradient descent methods excel in modern machine learning problems. In the first half of the thesis, we study stochastic gradient descent (SGD) for training multi-layer fully connected feedforward neural networks with Rectified Linear Unit (ReLU) activation. Because the loss function in training deep neural networks is non-convex and non-smooth, the standard convergence guarantees of GD for convex loss functions do not apply. Instead, through a kernel perspective, we show that when fresh data arrives in a stream, SGD ensures exponential convergence of the average prediction error.

In the second half, we investigate GD methods in a new training paradigm featuring a central parameter server (PS) and numerous clients storing data locally. Privacy constraints prevent the local data from being revealed to the PS, making this distributed learning setting particularly relevant in the current big-data era, where data is often sensitive and too large to be stored on a single device. In practical applications, this distributed setting presents two major challenges: data heterogeneity and adversarial attacks. To overcome these challenges and obtain accurate estimates of the model parameters, we propose a GD-based algorithm and provide convergence guarantees for both strongly convex and non-convex loss functions.
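To make the streaming setting from the first half concrete, below is a minimal numpy sketch of SGD on a small two-layer ReLU network where a fresh sample arrives at every step. The network width, step size, data model, and the teacher vector w_star are illustrative assumptions rather than quantities from the thesis, and only the hidden layer is updated for simplicity; this is not the thesis's algorithm, just the general online setup it studies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not from the thesis): input dim d, hidden width m.
d, m, lr, steps = 10, 256, 0.05, 2000

# Two-layer ReLU network: f(x) = a^T relu(W x) / sqrt(m); only W is trained.
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m)

# Hypothetical teacher generating the streamed labels.
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

def predict(x):
    return a @ np.maximum(W @ x, 0.0) / np.sqrt(m)

for t in range(steps):
    # A fresh sample arrives at every step (online / streaming setting).
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    y = x @ w_star                      # noiseless target, for illustration only

    err = predict(x) - y                # prediction error on the fresh sample
    # Gradient of 0.5 * err^2 with respect to W (the outer layer a is kept fixed).
    grad_W = err * np.outer(a * (W @ x > 0), x) / np.sqrt(m)
    W -= lr * grad_W

    if t % 500 == 0:
        print(f"step {t:4d}  squared error {err**2:.4f}")
```

The parameter-server setting from the second half can be sketched in the same spirit. The toy loop below assumes heterogeneous least-squares clients, a few Byzantine clients sending arbitrary gradients, and a coordinate-wise median aggregator at the server; the median is a standard robust aggregation choice used purely for illustration and is not claimed to be the algorithm proposed in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: n clients, p parameters, a few Byzantine clients.
n_clients, p, n_byzantine, lr, rounds = 20, 5, 3, 0.1, 200

# Heterogeneous local data: each client holds its own (A_i, b_i) for a least-squares loss.
theta_true = rng.normal(size=p)
clients = []
for i in range(n_clients):
    A = rng.normal(size=(50, p)) + 0.3 * rng.normal(size=(1, p))  # client-specific shift
    b = A @ theta_true + 0.1 * rng.normal(size=50)
    clients.append((A, b))

def local_gradient(i, theta):
    A, b = clients[i]
    return A.T @ (A @ theta - b) / len(b)

theta = np.zeros(p)
for r in range(rounds):
    grads = np.array([local_gradient(i, theta) for i in range(n_clients)])
    # The first n_byzantine clients send arbitrary (adversarial) gradients.
    grads[:n_byzantine] = rng.normal(scale=100.0, size=(n_byzantine, p))
    # The parameter server aggregates with the coordinate-wise median, a standard
    # robust aggregator (illustrative; not necessarily the thesis's aggregation rule).
    robust_grad = np.median(grads, axis=0)
    theta -= lr * robust_grad

print("parameter error:", np.linalg.norm(theta - theta_true))
```

With fewer than half of the clients adversarial, the coordinate-wise median stays close to the honest clients' average gradient, which is the basic intuition behind robustness guarantees for distributed GD of this kind.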

Citation

Zhu, Hanjing (2023). GRADIENT DESCENT METHODS IN MODERN MACHINE LEARNING PROBLEMS: PROVABLE GUARANTEES. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/30273.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.