
Proximal Gradient Descent (and Acceleration)
Hoàng Nam Dũng
Faculty of Mathematics, Mechanics and Informatics, University of Science, Vietnam National University, Hanoi

Last time: subgradient method
Consider the problem
$$\min_x f(x)$$
with $f$ convex and $\mathrm{dom}(f) = \mathbb{R}^n$.
Subgradient method: choose an initial $x^{(0)} \in \mathbb{R}^n$ and repeat
$$x^{(k)} = x^{(k-1)} - t_k \cdot g^{(k-1)}, \qquad k = 1, 2, 3, \dots$$
where $g^{(k-1)} \in \partial f(x^{(k-1)})$. The step sizes $t_k$ follow pre-set rules (e.g., a diminishing step size rule).
If $f$ is Lipschitz, the subgradient method has convergence rate $O(1/\varepsilon^2)$, i.e., it needs $O(1/\varepsilon^2)$ iterations to reach accuracy $\varepsilon$.
Upside: very generic. Downside: it can be slow; we address this today.
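For reference, a minimal Python sketch of the subgradient iteration above (the function names and the diminishing rule $t_k = 1/k$ are illustrative choices, not fixed by the slides):

    import numpy as np

    def subgradient_method(x0, subgrad_f, num_iters=1000):
        # x^(k) = x^(k-1) - t_k * g^(k-1), with g^(k-1) in the subdifferential of f.
        # Pre-set diminishing step sizes t_k = 1/k (one common rule).
        x = x0
        for k in range(1, num_iters + 1):
            x = x - (1.0 / k) * subgrad_f(x)
        return x

    # Example: f(x) = ||x||_1, for which np.sign(x) is a valid subgradient:
    # x_hat = subgradient_method(np.array([3.0, -2.0]), np.sign)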

Outline
Today
◮ Proximal gradient descent
◮ Convergence analysis
◮ ISTA, matrix completion
◮ Special cases
◮ Acceleration

Decomposable functions
Suppose
$$f(x) = g(x) + h(x)$$
where
◮ $g$ is convex, differentiable, $\mathrm{dom}(g) = \mathbb{R}^n$,
◮ $h$ is convex, not necessarily differentiable.
If $f$ were differentiable, then the gradient descent update would be
$$x^+ = x - t \cdot \nabla f(x).$$
Recall the motivation: minimize the quadratic approximation to $f$ around $x$, replacing $\nabla^2 f(x)$ by $\frac{1}{t} I$:
$$x^+ = \operatorname*{argmin}_z \; \underbrace{f(x) + \nabla f(x)^T (z - x) + \frac{1}{2t} \|z - x\|_2^2}_{\tilde f_t(z)}.$$
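To check that minimizing $\tilde f_t$ indeed reproduces the gradient step, set its gradient to zero:
$$\nabla \tilde f_t(z) = \nabla f(x) + \frac{1}{t}(z - x) = 0 \;\Longleftrightarrow\; z = x - t\,\nabla f(x).$$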

Decomposable functions
In our case $f$ is not differentiable, but $f = g + h$ with $g$ differentiable.
Why don't we make a quadratic approximation to $g$ and leave $h$ alone?
I.e., update
$$\begin{aligned}
x^+ &= \operatorname*{argmin}_z \; \tilde g_t(z) + h(z) \\
&= \operatorname*{argmin}_z \; g(x) + \nabla g(x)^T (z - x) + \frac{1}{2t}\|z - x\|_2^2 + h(z) \\
&= \operatorname*{argmin}_z \; \frac{1}{2t}\big\|z - \big(x - t\nabla g(x)\big)\big\|_2^2 + h(z),
\end{aligned}$$
where the last equality follows by completing the square and dropping terms that do not depend on $z$.
The first term $\frac{1}{2t}\|z - (x - t\nabla g(x))\|_2^2$ keeps $z$ close to the gradient update for $g$, while the second term $h(z)$ also keeps $h$ small.
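A minimal Python sketch of this update, assuming a routine prox_h(v, t) that returns $\operatorname{argmin}_z \frac{1}{2t}\|z - v\|_2^2 + h(z)$; for $h = \lambda\|\cdot\|_1$ this is the familiar soft-thresholding operator (names and signatures here are illustrative, not from the slides):

    import numpy as np

    def soft_threshold(v, tau):
        # prox of tau * ||.||_1: componentwise shrinkage toward zero
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def proximal_gradient(x0, grad_g, prox_h, t, num_iters=100):
        # Repeat x^+ = argmin_z (1/(2t)) ||z - (x - t grad_g(x))||^2 + h(z),
        # i.e. a gradient step on g followed by the prox of h.
        x = x0
        for _ in range(num_iters):
            x = prox_h(x - t * grad_g(x), t)
        return x

    # Example (lasso): g(x) = 0.5 ||Ax - b||^2, h(x) = lam * ||x||_1:
    # grad_g = lambda x: A.T @ (A @ x - b)
    # prox_h = lambda v, t: soft_threshold(v, lam * t)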

