Stochastic nested variance reduction for nonconvex optimization
dc.contributor.author | Zhou, D | |
dc.contributor.author | Xu, P | |
dc.contributor.author | Gu, Q | |
dc.date.accessioned | 2024-06-03T21:53:09Z | |
dc.date.available | 2024-06-03T21:53:09Z | |
dc.date.issued | 2020-05-01 | |
dc.description.abstract | We study nonconvex optimization problems, where the objective function is either an average of n nonconvex functions or the expectation of some stochastic function. We propose a new stochastic gradient descent algorithm based on nested variance reduction, namely, Stochastic Nested Variance-Reduced Gradient descent (SNVRG). Compared with the conventional stochastic variance-reduced gradient (SVRG) algorithm, which uses two reference points to construct a semi-stochastic gradient with diminishing variance in each iteration, our algorithm uses K + 1 nested reference points to build a semi-stochastic gradient whose variance is reduced even further in each iteration. For smooth nonconvex functions, SNVRG converges to an ε-approximate first-order stationary point within Õ(n ∧ ε^{-2} + ε^{-3} ∧ n^{1/2}ε^{-2}) stochastic gradient evaluations. This improves the best known gradient complexity of SVRG, O(n + n^{2/3}ε^{-2}), and that of SCSG, O(n ∧ ε^{-2} + ε^{-10/3} ∧ n^{2/3}ε^{-2}). For gradient-dominated functions, SNVRG also achieves better gradient complexity than the state-of-the-art algorithms. Based on SNVRG, we further propose two algorithms that can find local minima faster than state-of-the-art algorithms in both finite-sum and general stochastic (online) nonconvex optimization. In particular, for finite-sum optimization problems, the proposed SNVRG + Neon2^finite algorithm achieves Õ(n^{1/2}ε^{-2} + nε_H^{-3} + n^{3/4}ε_H^{-7/2}) gradient complexity to converge to an (ε, ε_H)-second-order stationary point, which outperforms SVRG + Neon2^finite (Allen-Zhu and Li, 2018), the best existing algorithm, in a wide regime. For general stochastic optimization problems, the proposed SNVRG + Neon2^online achieves Õ(ε^{-3} + ε_H^{-5} + ε^{-2}ε_H^{-3}) gradient complexity, which is better than both SVRG + Neon2^online (Allen-Zhu and Li, 2018) and Natasha2 (Allen-Zhu, 2018a) in certain regimes. Thorough experimental results on different nonconvex optimization problems back up our theory. | |
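To make the "K + 1 nested reference points" idea in the abstract concrete, the following is a minimal NumPy sketch of a nested semi-stochastic gradient estimator in the spirit of SNVRG, applied to a toy finite-sum least-squares problem. The quadratic objective, batch sizes, step size, and the geometric refresh schedule for the reference points are illustrative assumptions, not the paper's exact construction or tuned parameters.

# Nested variance-reduced gradient estimator (SNVRG-style sketch) on a toy
# finite-sum least-squares problem. Batch sizes, the reference-point refresh
# schedule, and the step size are assumed for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def batch_grad(x, idx):
    """Average gradient of 0.5*(a_i^T x - b_i)^2 over the index set idx."""
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

def nested_estimator(refs, batch_sizes):
    """Semi-stochastic gradient built from K + 1 nested reference points.

    refs[0], ..., refs[K] are the reference points, with refs[K] equal to the
    current iterate. The estimator is the batch gradient at refs[0] plus
    telescoping corrections grad(refs[l]) - grad(refs[l-1]) for l = 1, ..., K,
    each on its own (typically smaller) batch.
    """
    B0 = rng.choice(n, size=batch_sizes[0], replace=False)
    v = batch_grad(refs[0], B0)
    for l in range(1, len(refs)):
        Bl = rng.choice(n, size=batch_sizes[l], replace=False)
        v += batch_grad(refs[l], Bl) - batch_grad(refs[l - 1], Bl)
    return v

# Demo run with K = 2 nesting levels; level-l reference point is refreshed
# every 4^(K - l) iterations (an assumed schedule), so the innermost
# reference point is always the current iterate.
K, eta = 2, 0.05
batch_sizes = [256, 64, 16]   # assumed, shrinking with nesting level
x = np.zeros(d)
refs = [x.copy() for _ in range(K + 1)]
for t in range(200):
    for l in range(K + 1):
        if t % (4 ** (K - l)) == 0:
            refs[l] = x.copy()
    v = nested_estimator(refs, batch_sizes)
    x = x - eta * v
print("final full-gradient norm:", np.linalg.norm(batch_grad(x, np.arange(n))))

With K = 1 the estimator collapses to the familiar SVRG-style construction with a single snapshot reference point; the additional nesting levels are what further reduce the estimator's variance per iteration, as described in the abstract.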
dc.identifier.issn | 1532-4435 | |
dc.identifier.issn | 1533-7928 | |
dc.identifier.uri | ||
dc.relation.ispartof | Journal of Machine Learning Research | |
dc.rights.uri | ||
dc.subject | Nonconvex Optimization | |
dc.subject | Finding Local Minima | |
dc.subject | Variance Reduction | |
dc.title | Stochastic nested variance reduction for nonconvex optimization | |
dc.type | Journal article | |
duke.contributor.orcid | Xu, P|0000-0002-2559-8622 | |
pubs.organisational-group | Duke | |
pubs.organisational-group | Pratt School of Engineering | |
pubs.organisational-group | School of Medicine | |
pubs.organisational-group | Trinity College of Arts & Sciences | |
pubs.organisational-group | Basic Science Departments | |
pubs.organisational-group | Biostatistics & Bioinformatics | |
pubs.organisational-group | Electrical and Computer Engineering | |
pubs.organisational-group | Computer Science | |
pubs.organisational-group | Biostatistics & Bioinformatics, Division of Integrative Genomics | |
pubs.publication-status | Published | |
pubs.volume | 21 |
Files
Original bundle
- Name: jmlr20_SNVRG.pdf
- Size: 1.09 MB
- Format: Adobe Portable Document Format
- Description: Published version