Browsing by Author "Xu, Jiaming"
Item Open Access GRADIENT DESCENT METHODS IN MODERN MACHINE LEARNING PROBLEMS: PROVABLE GUARANTEES (2023) Zhu, Hanjing

Modern machine learning methods have demonstrated remarkable success in many industries. For instance, the famous ChatGPT relies on a machine learning model trained on a substantial volume of text and conversation data. To achieve optimal model performance, an efficient optimization algorithm is essential for learning the model parameters. Among optimization methods, gradient descent (GD) methods are the simplest. Traditionally, GD methods have performed well on conventional machine learning problems with well-behaved objective functions and simple training paradigms, where computation occurs on a single server storing all the data. However, our understanding of how GD methods perform on modern machine learning problems, with non-convex and non-smooth objective functions or more complex training paradigms, remains limited.

This thesis is dedicated to providing a theoretical understanding of why gradient descent methods excel in modern machine learning problems. In the first half of the thesis, we study stochastic gradient descent (SGD) for training multi-layer fully connected feedforward neural networks with Rectified Linear Unit (ReLU) activation. Since the loss function in training deep neural networks is non-convex and non-smooth, the standard convergence guarantees of GD for convex loss functions do not apply. Instead, through a kernel perspective, we demonstrate that when fresh data arrives in a stream, SGD ensures exponential convergence of the average prediction error. In the second half, we investigate the use of GD methods in a new training paradigm featuring a central parameter server (PS) and numerous clients storing data locally.
Privacy constraints prevent the local data from being revealed to the PS, making this distributed learning setting particularly relevant in the current big-data era where data is often sensitive and too large to be stored on a single device. In practical applications, this distributed setting presents two major challenges: data heterogeneity and adversarial attacks. To overcome these challenges and achieve accurate estimates of model parameters, we propose a GD-based algorithm and provide convergence guarantees for both strongly convex and non-convex loss functions.
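The abstract does not specify the algorithm's details, but the PS-and-clients setting it describes can be illustrated with a minimal sketch of server-coordinated gradient descent. Everything here — the least-squares loss, the function names, the step size — is an illustrative assumption, not the thesis's method; the key point is that clients send only gradients, never raw data, to the PS.

```python
import numpy as np

def local_gradient(theta, X, y):
    """Gradient of the squared loss (1/2n)||X @ theta - y||^2 on one client's local data."""
    n = len(y)
    return X.T @ (X @ theta - y) / n

def ps_round(theta, clients, lr=0.1):
    """One synchronous round: the PS averages client gradients and takes a step.
    The PS only ever sees gradients, not the clients' data."""
    grads = [local_gradient(theta, X, y) for X, y in clients]
    return theta - lr * np.mean(grads, axis=0)

# Toy homogeneous data: three clients drawn from the same linear model theta* = [1, -2].
rng = np.random.default_rng(0)
theta_star = np.array([1.0, -2.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ theta_star))

theta = np.zeros(2)
for _ in range(200):
    theta = ps_round(theta, clients)
# theta now approximates theta_star
```

With heterogeneous clients or adversarial (Byzantine) gradients — the two challenges named above — plain averaging at the PS is no longer sufficient, which is what motivates robust GD-based aggregation schemes.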
Item Open Access Matching in networks: fundamental limits and efficient algorithms (2023) Yu, Sophie H.

In recent years, there has been an explosion of online platforms and e-commerce, which generate massive amounts of network data. For example, a friend link on Facebook represents a connection between users, and a rating on Amazon is a connection between a customer and a product. These network data contain extensive information on customers' behaviors and preferences. However, deriving meaningful insights from the data can be challenging. On the one hand, networks are often extremely large, with millions of nodes and billions of edges. On the other hand, networks can be sparse, since some nodes interact with only a few others, and there is significant noise from missing or faulty connections. Thus, extracting valuable information from such noisy and vast data requires highly efficient approaches that can process large amounts of data and detect tenuous statistical signatures. As such, my thesis has developed along the following two interrelated streams.
The first stream aims at establishing fundamental limits and designing simple, scalable algorithms for learning complex networks. We base our study on a decision-theoretic framework wherein an unknown structure underlies the network data, and we aim to detect and recover this hidden structure from partial or noisy observations. In particular, we focus on the problem of graph matching (network alignment), which refers to finding the bijection between the vertex sets of two graphs that maximizes the total number of common edges. We initiated a statistical analysis of detection and recovery problems for matching two randomly generated graphs with an underlying vertex correspondence and correlated edges. We characterize the sharp fundamental limits for both problems and develop new algorithms that are the first to achieve detection and recovery with an explicit constant correlation in both the sparse and dense regimes, settling important open problems in this field.
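The graph matching objective described above can be made concrete with a brute-force sketch on tiny graphs (an illustration of the objective only — exhaustive search over all n! permutations is exactly what the thesis's efficient algorithms avoid):

```python
import itertools
import numpy as np

def common_edges(A, B, pi):
    """Count edges (i, j) of graph A that map to edges (pi[i], pi[j]) of graph B."""
    n = len(pi)
    return sum(A[i, j] and B[pi[i], pi[j]]
               for i in range(n) for j in range(i + 1, n))

def match_brute_force(A, B):
    """Return the vertex bijection maximizing the number of common edges."""
    n = A.shape[0]
    return max(itertools.permutations(range(n)),
               key=lambda pi: common_edges(A, B, pi))

# B is A with its vertices relabelled by a hidden permutation sigma,
# i.e. edge (i, j) in A corresponds to edge (sigma[i], sigma[j]) in B.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]])
sigma = [2, 0, 3, 1]
inv = np.argsort(sigma)
B = A[np.ix_(inv, inv)]       # B[sigma[i], sigma[j]] == A[i, j]

pi_hat = match_brute_force(A, B)
# pi_hat aligns all 4 edges of A with edges of B
```

In the noiseless case above the hidden correspondence is perfectly recoverable; the statistical questions in the thesis concern correlated random graphs, where some edges disagree and recovery is only possible above a sharp correlation threshold.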
The second stream focuses on improving the decision-making efficiency of network systems under uncertainty and limited information. We study the dynamic matching problem, in which new agents enter the market at random times and wait to be matched, and the arrival rates of the different agent types are unknown. The goal is to maximize market efficiency while reducing the wait time for each participant in the system. This problem is inspired by car-pooling platforms, on which, after making a request, riders may wait for potential matches with other riders traveling in the same direction, based on their pick-up and drop-off locations. We develop hindsight-optimal primal-dual policies that achieve both constant regret and constant average wait time for each agent type at all times.
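The dynamic matching setting can be illustrated with a toy simulation under a naive greedy policy. Everything here is an illustrative assumption — two compatible agent types, Bernoulli arrivals, and a greedy match-on-arrival rule; this is only the problem setting, not the thesis's hindsight-optimal primal-dual policies, which are designed precisely to do better than naive rules:

```python
import random
from collections import deque

def simulate(horizon=10_000, p=0.5, seed=0):
    """Agents of type A (prob. p) or B arrive one per step; a greedy policy
    matches an arrival immediately whenever the opposite queue is nonempty.
    Returns the average wait of matched agents and the number left waiting."""
    rng = random.Random(seed)
    queues = {"A": deque(), "B": deque()}   # arrival times of waiting agents
    waits = []
    for t in range(horizon):
        typ = "A" if rng.random() < p else "B"
        other = "B" if typ == "A" else "A"
        if queues[other]:                    # greedy: match right away
            waits.append(t - queues[other].popleft())
        else:                                # otherwise join the queue and wait
            queues[typ].append(t)
    return sum(waits) / len(waits), len(queues["A"]) + len(queues["B"])

avg_wait, still_waiting = simulate()
```

Under the greedy rule the queue-length imbalance behaves like a random walk, so waits can grow with the horizon; the constant-regret, constant-wait guarantees highlighted in the abstract are what distinguish a carefully designed policy from this baseline.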