Stochastic nested variance reduction for nonconvex optimization

dc.contributor.author

Zhou, D

dc.contributor.author

Xu, P

dc.contributor.author

Gu, Q

dc.date.accessioned

2024-06-03T21:53:09Z

dc.date.available

2024-06-03T21:53:09Z

dc.date.issued

2020-05-01

dc.description.abstract

We study nonconvex optimization problems, where the objective function is either an average of n nonconvex functions or the expectation of some stochastic function. We propose a new stochastic gradient descent algorithm based on nested variance reduction, namely, Stochastic Nested Variance-Reduced Gradient descent (SNVRG). Compared with the conventional stochastic variance reduced gradient (SVRG) algorithm, which uses two reference points to construct a semi-stochastic gradient with diminishing variance in each iteration, our algorithm uses K + 1 nested reference points to build a semi-stochastic gradient whose variance is reduced even further in each iteration. For smooth nonconvex functions, SNVRG converges to an ε-approximate first-order stationary point within Õ(n ∧ ε^{-2} + ε^{-3} ∧ n^{1/2}ε^{-2}) stochastic gradient evaluations. This improves the best known gradient complexity of SVRG, O(n + n^{2/3}ε^{-2}), and that of SCSG, O(n ∧ ε^{-2} + ε^{-10/3} ∧ n^{2/3}ε^{-2}). For gradient dominated functions, SNVRG also achieves better gradient complexity than state-of-the-art algorithms. Based on SNVRG, we further propose two algorithms that can find local minima faster than state-of-the-art algorithms in both finite-sum and general stochastic (online) nonconvex optimization. In particular, for finite-sum optimization problems, the proposed SNVRG + Neon2^finite algorithm achieves Õ(n^{1/2}ε^{-2} + nε_H^{-3} + n^{3/4}ε_H^{-7/2}) gradient complexity to converge to an (ε, ε_H)-second-order stationary point, which outperforms SVRG + Neon2^finite (Allen-Zhu and Li, 2018), the best existing algorithm, in a wide regime. For general stochastic optimization problems, the proposed SNVRG + Neon2^online algorithm achieves Õ(ε^{-3} + ε_H^{-5} + ε^{-2}ε_H^{-3}) gradient complexity, which is better than both SVRG + Neon2^online (Allen-Zhu and Li, 2018) and Natasha2 (Allen-Zhu, 2018a) in certain regimes. Thorough experimental results on different nonconvex optimization problems back up our theory.
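To make the nested variance reduction idea concrete, the following is a minimal Python sketch of a semi-stochastic gradient built from K + 1 nested reference points, in the spirit of SNVRG. It is an illustration only, not the paper's exact procedure: the toy least-squares objective, the reference-refresh schedule, and the batch sizes (100 and 10) are hypothetical placeholders.

```python
import numpy as np

# Illustrative sketch of a nested variance-reduced gradient estimator in the
# spirit of SNVRG. Not the paper's exact algorithm: the toy objective, loop
# lengths, and batch sizes below are hypothetical placeholders.

rng = np.random.default_rng(0)
n, d = 1000, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def batch_grad(x, idx):
    """Mini-batch gradient of the least-squares loss 0.5 * mean((Ax - b)^2)."""
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

def nested_vr_gradient(refs, batch_sizes):
    """Semi-stochastic gradient from K + 1 nested reference points.

    refs[0] is the outermost reference point (a full gradient is computed
    there); refs[-1] is the current iterate. Each correction term
    batch_grad(refs[k], S) - batch_grad(refs[k - 1], S) is estimated on a
    fresh batch S; because adjacent reference points stay close, these
    corrections have small variance.
    """
    g = batch_grad(refs[0], np.arange(n))  # full gradient at outer reference
    for k in range(1, len(refs)):
        S = rng.choice(n, size=batch_sizes[k - 1], replace=False)
        g += batch_grad(refs[k], S) - batch_grad(refs[k - 1], S)
    return g

# Two nested levels (K = 2): refresh the outer reference rarely and the
# middle reference more often, mirroring the nested loops of SNVRG.
x = np.zeros(d)
outer, mid = x.copy(), x.copy()
eta = 0.1
for t in range(200):
    if t % 50 == 0:
        outer = x.copy()
    if t % 10 == 0:
        mid = x.copy()
    x = x - eta * nested_vr_gradient([outer, mid, x], batch_sizes=[100, 10])

print("final loss:", 0.5 * np.mean((A @ x - b) ** 2))
```

With K = 0 nested levels this estimator reduces to plain SVRG (one full-gradient reference point plus a single correction); adding levels lets inner corrections use small batches while keeping the overall variance under control.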

dc.identifier.issn

1532-4435

dc.identifier.issn

1533-7928

dc.identifier.uri

https://hdl.handle.net/10161/30776

dc.relation.ispartof

Journal of Machine Learning Research

dc.rights.uri

https://creativecommons.org/licenses/by-nc/4.0

dc.subject

Nonconvex Optimization

dc.subject

Finding Local Minima

dc.subject

Variance Reduction

dc.title

Stochastic nested variance reduction for nonconvex optimization

dc.type

Journal article

duke.contributor.orcid

Xu, P|0000-0002-2559-8622

pubs.organisational-group

Duke

pubs.organisational-group

Pratt School of Engineering

pubs.organisational-group

School of Medicine

pubs.organisational-group

Trinity College of Arts & Sciences

pubs.organisational-group

Basic Science Departments

pubs.organisational-group

Biostatistics & Bioinformatics

pubs.organisational-group

Electrical and Computer Engineering

pubs.organisational-group

Computer Science

pubs.organisational-group

Biostatistics & Bioinformatics, Division of Integrative Genomics

pubs.publication-status

Published

pubs.volume

21

Files

Original bundle

Name: jmlr20_SNVRG.pdf
Size: 1.09 MB
Format: Adobe Portable Document Format
Description: Published version