Stochastic nested variance reduction for nonconvex optimization

dc.contributor.author

Zhou, D

dc.contributor.author

Xu, P

dc.contributor.author

Gu, Q

dc.date.accessioned

2024-06-03T21:53:09Z

dc.date.available

2024-06-03T21:53:09Z

dc.date.issued

2020-05-01

dc.description.abstract

We study nonconvex optimization problems, where the objective function is either an average of n nonconvex functions or the expectation of some stochastic function. We propose a new stochastic gradient descent algorithm based on nested variance reduction, namely, Stochastic Nested Variance-Reduced Gradient descent (SNVRG). Compared with the conventional stochastic variance reduced gradient (SVRG) algorithm, which uses two reference points to construct a semi-stochastic gradient with diminishing variance in each iteration, our algorithm uses K + 1 nested reference points to build a semi-stochastic gradient whose variance is reduced even further in each iteration. For smooth nonconvex functions, SNVRG converges to an ε-approximate first-order stationary point within Õ(n ∧ ε^{-2} + ε^{-3} ∧ n^{1/2}ε^{-2}) stochastic gradient evaluations. This improves the best known gradient complexity of SVRG, O(n + n^{2/3}ε^{-2}), and that of SCSG, O(n ∧ ε^{-2} + ε^{-10/3} ∧ n^{2/3}ε^{-2}). For gradient dominated functions, SNVRG also achieves better gradient complexity than state-of-the-art algorithms. Based on SNVRG, we further propose two algorithms that can find local minima faster than state-of-the-art algorithms in both finite-sum and general stochastic (online) nonconvex optimization. In particular, for finite-sum optimization problems, the proposed SNVRG + Neon2^finite algorithm achieves Õ(n^{1/2}ε^{-2} + nε_H^{-3} + n^{3/4}ε_H^{-7/2}) gradient complexity to converge to an (ε, ε_H)-second-order stationary point, which outperforms SVRG + Neon2^finite (Allen-Zhu and Li, 2018), the best existing algorithm, in a wide regime. For general stochastic optimization problems, the proposed SNVRG + Neon2^online algorithm achieves Õ(ε^{-3} + ε_H^{-5} + ε^{-2}ε_H^{-3}) gradient complexity, which is better than both SVRG + Neon2^online (Allen-Zhu and Li, 2018) and Natasha2 (Allen-Zhu, 2018a) in certain regimes. Thorough experimental results on different nonconvex optimization problems back up our theory.
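To make the nested variance reduction idea concrete, the following is a minimal Python sketch of a semi-stochastic gradient built from K + 1 nested reference points, in the spirit of SNVRG. It is an illustration only, not the paper's exact procedure: the toy least-squares objective, the reference-refresh schedule, and the batch sizes (100 and 10) are hypothetical placeholders.

```python
import numpy as np

# Illustrative sketch of a nested variance-reduced gradient estimator in the
# spirit of SNVRG. Not the paper's exact algorithm: the toy objective, loop
# lengths, and batch sizes below are hypothetical placeholders.

rng = np.random.default_rng(0)
n, d = 1000, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def batch_grad(x, idx):
    """Mini-batch gradient of the least-squares loss 0.5 * mean((Ax - b)^2)."""
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

def nested_vr_gradient(refs, batch_sizes):
    """Semi-stochastic gradient from K + 1 nested reference points.

    refs[0] is the outermost reference point (a full gradient is computed
    there); refs[-1] is the current iterate. Each correction term
    batch_grad(refs[k], S) - batch_grad(refs[k - 1], S) is estimated on a
    fresh batch S; because adjacent reference points stay close, these
    corrections have small variance.
    """
    g = batch_grad(refs[0], np.arange(n))  # full gradient at outer reference
    for k in range(1, len(refs)):
        S = rng.choice(n, size=batch_sizes[k - 1], replace=False)
        g += batch_grad(refs[k], S) - batch_grad(refs[k - 1], S)
    return g

# Two nested levels (K = 2): refresh the outer reference rarely and the
# middle reference more often, mirroring the nested loops of SNVRG.
x = np.zeros(d)
outer, mid = x.copy(), x.copy()
eta = 0.1
for t in range(200):
    if t % 50 == 0:
        outer = x.copy()
    if t % 10 == 0:
        mid = x.copy()
    x = x - eta * nested_vr_gradient([outer, mid, x], batch_sizes=[100, 10])

print("final loss:", 0.5 * np.mean((A @ x - b) ** 2))
```

With K = 0 nested levels this estimator reduces to plain SVRG (one full-gradient reference point plus a single correction); adding levels lets inner corrections use small batches while keeping the overall variance under control.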

dc.identifier.issn

1532-4435

dc.identifier.issn

1533-7928

dc.identifier.uri

https://hdl.handle.net/10161/30776

dc.relation.ispartof

Journal of Machine Learning Research

dc.rights.uri

https://creativecommons.org/licenses/by-nc/4.0

dc.subject

Nonconvex Optimization

dc.subject

Finding Local Minima

dc.subject

Variance Reduction

dc.title

Stochastic nested variance reduction for nonconvex optimization

dc.type

Journal article

duke.contributor.orcid

Xu, P|0000-0002-2559-8622

pubs.organisational-group

Duke

pubs.organisational-group

Pratt School of Engineering

pubs.organisational-group

School of Medicine

pubs.organisational-group

Trinity College of Arts & Sciences

pubs.organisational-group

Basic Science Departments

pubs.organisational-group

Biostatistics & Bioinformatics

pubs.organisational-group

Electrical and Computer Engineering

pubs.organisational-group

Computer Science

pubs.organisational-group

Biostatistics & Bioinformatics, Division of Integrative Genomics

pubs.publication-status

Published

pubs.volume

21

Files

Original bundle

Name: jmlr20_SNVRG.pdf
Size: 1.09 MB
Format: Adobe Portable Document Format
Description: Published version