Cheng, Xiuyuan XCTan, Yixuan2025-07-022025-07-022024https://hdl.handle.net/10161/32578<p>This thesis focuses on graph-based methods, providing both theoretical analysis of graph Laplacians under the manifold setting and practical applications to time-series data analysis.</p><p>%The first part of the thesis presents an improved convergence rate for $k$-nearest neighbor ($k$NN) graph Laplacians. We consider a general class of $k$NN graph where the graph affinity is $W_{ij} = \epsilon^{-d/2} k_0 ( \| x_i - x_j \|^2 / \epsilon \phi( \hrho(x_i), \hrho(x_j) )^2 ) $, with $\hat{\rho}(x)$ being the (rescaled) $k$NN distance at the point $x$, $\phi$ a symmetric bi-variate function, and $k_0$ a non-negative kernel function. Under the manifold setting, we prove the point-wise convergence of the $k$NN graph Laplacian to the limiting manifold operator at a rate of $O(N^{-2/(d+6)})$ when $k_0$ and $\phi$ is $C^3$ differentiable. When $k_0$ and $\phi$ have lower regularity, including when $k_0$ is a compactly supported function as in the standard $k$NN graph, the convergence rate degenerates into $O(N^{-1/(d+4)})$. These findings are validated through numerical experiments on simulated data.</p><p>In the second part, we propose a method for constructing heat kernel estimators by iteratively multiplying the graph transition matrix $n$ times. % When data are i.i.d. sampled from one-dimensional compact manifolds, we establish the convergence of the estimators to the manifold’s heat kernel with time $t = n\sigma^2$. % By setting the bandwidth parameter $\sigma \sim N^{-{1}/{d+6}}$ and the diffusion time $n \sim \sigma^{-2}$, the convergence rate is $O(N^{-{2}/{d+6}})$ up to a log factor. % Based on the heat kernel estimator, we then construct an affinity estimator that approximates $\exp(- {d_\sM(x_i,x_j)^2}/{(4\alpha n\sigma^2)})$. % We further develop an embedding algorithm that uses the heat kernel or affinity estimator as the similarity matrix for the t-SNE algorithm. % The theoretical results are validated on the simulated data, and the embedding algorithm’s effectiveness is demonstrated on both the simulated and real-world datasets.</p><p>The last part of the thesis addresses time-series data analysis.First, we propose a stochastic dynamic model on a graph for the epidemic data, incorporating the transportation between regions. We infer the transmission parameters using graph Laplacian regularization and demonstrate that this method improves prediction accuracy on real-world COVID datasets. Then, we introduce an RNN-based model that leverages the expressiveness of neural networks, and we propose an algorithm that adaptively selects time steps based on the steepness of changes in the data over time. This algorithm enables more efficient training for time series with sharp spikes. We provide an approximation analysis showing the benefits of adaptive steps and demonstrate that the proposed model improves prediction accuracy while reducing computational costs in both simulated dynamic system data and real-world electrocardiography data. </p>https://creativecommons.org/licenses/by-nc-nd/4.0/MathematicsGraph-based Learning of Manifold Diffusion and Time-Series ModelingDissertation