Proximal Gradient Descent (and Acceleration)
Hoàng Nam Dũng
Khoa Toán - Tin học, Đại học Khoa học Tự nhiên, Đại học Quốc gia Hà Nội
Last time: subgradient method
Consider the problem
$$\min_x f(x)$$
with $f$ convex and $\mathrm{dom}(f) = \mathbb{R}^n$.
Subgradient method: choose an initial $x^{(0)} \in \mathbb{R}^n$ and repeat:
$$x^{(k)} = x^{(k-1)} - t_k \cdot g^{(k-1)}, \quad k = 1, 2, 3, \ldots$$
where $g^{(k-1)} \in \partial f(x^{(k-1)})$. We use pre-set rules for the step sizes
(e.g., a diminishing step size rule).
If $f$ is Lipschitz, then the subgradient method has a convergence rate of $O(1/\varepsilon^2)$, i.e., it needs $O(1/\varepsilon^2)$ iterations to reach accuracy $\varepsilon$.
Upside: very generic. Downside: can be slow (addressed today).
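As a concrete illustration (not part of the original slides), here is a minimal Python sketch of the subgradient method with the diminishing step size rule $t_k = 1/k$; the objective f and the subgradient oracle subgrad are placeholders to be supplied by the user.

import numpy as np

def subgradient_method(f, subgrad, x0, num_iters=1000):
    """Subgradient method sketch with diminishing steps t_k = 1/k."""
    x = x0.copy()
    x_best, f_best = x.copy(), f(x)
    for k in range(1, num_iters + 1):
        g = subgrad(x)                # any g^(k-1) in the subdifferential of f at x^(k-1)
        x = x - (1.0 / k) * g         # x^(k) = x^(k-1) - t_k * g^(k-1)
        if f(x) < f_best:             # not a descent method, so track the best iterate
            x_best, f_best = x.copy(), f(x)
    return x_best

# Example use: minimize f(x) = ||x||_1, whose subgradient at x is sign(x).
# x_star = subgradient_method(lambda x: np.abs(x).sum(), np.sign, 3.0 * np.ones(5))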
Outline
Today
Proximal gradient descent
Convergence analysis
ISTA, matrix completion
Special cases
Acceleration
Decomposable functions
Suppose
$$f(x) = g(x) + h(x)$$
where $g$ is convex and differentiable with $\mathrm{dom}(g) = \mathbb{R}^n$, and $h$ is convex but not necessarily differentiable.
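A standard example of this decomposition (added here for concreteness, not on the original slide) is the lasso objective, whose smooth least squares part plays the role of $g$ and whose $\ell_1$ penalty plays the role of $h$:
$$f(x) = \underbrace{\tfrac{1}{2}\,\|Ax - b\|_2^2}_{g(x)} + \underbrace{\lambda \|x\|_1}_{h(x)}.$$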
If $f$ were differentiable, then the gradient descent update would be
$$x^+ = x - t \cdot \nabla f(x).$$
Recall the motivation: minimize the quadratic approximation to $f$ around $x$, replacing $\nabla^2 f(x)$ by $\frac{1}{t} I$:
$$x^+ = \operatorname*{argmin}_z\; \underbrace{f(x) + \nabla f(x)^T (z - x) + \frac{1}{2t}\,\|z - x\|_2^2}_{\tilde{f}_t(z)}.$$
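As a quick check (added for clarity, not on the original slide), setting the gradient of $\tilde{f}_t$ with respect to $z$ to zero recovers exactly the gradient descent update:
$$\nabla_z \tilde{f}_t(z) = \nabla f(x) + \frac{1}{t}(z - x) = 0 \quad\Longrightarrow\quad z = x - t\,\nabla f(x).$$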
Decomposable functions
In our case $f$ is not differentiable, but $f = g + h$ with $g$ differentiable.
Why don't we make a quadratic approximation to $g$ and leave $h$ alone?
I.e., update
$$\begin{aligned}
x^+ &= \operatorname*{argmin}_z\; \tilde{g}_t(z) + h(z)\\
&= \operatorname*{argmin}_z\; g(x) + \nabla g(x)^T (z - x) + \frac{1}{2t}\,\|z - x\|_2^2 + h(z)\\
&= \operatorname*{argmin}_z\; \frac{1}{2t}\,\big\|z - \big(x - t\,\nabla g(x)\big)\big\|_2^2 + h(z),
\end{aligned}$$
where the last equality follows by completing the square and dropping terms independent of $z$ (see the expansion below).
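For completeness (added, not on the original slide), the completing-the-square step writes the linearization plus proximity term as a single shifted quadratic:
$$g(x) + \nabla g(x)^T (z - x) + \frac{1}{2t}\,\|z - x\|_2^2 = \frac{1}{2t}\,\|z - (x - t\,\nabla g(x))\|_2^2 + g(x) - \frac{t}{2}\,\|\nabla g(x)\|_2^2,$$
and the last two terms do not depend on $z$, so they can be dropped from the argmin.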
$\frac{1}{2t}\,\|z - (x - t\,\nabla g(x))\|_2^2$: stay close to the gradient update for $g$
$h(z)$: also make $h$ small
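For illustration (not from the original slides), here is a minimal Python sketch of this update, assuming the user supplies a function prox_h(v, t) solving $\operatorname*{argmin}_z \frac{1}{2t}\|z - v\|_2^2 + h(z)$; for $h = \lambda\|\cdot\|_1$ this prox is the well-known soft-thresholding operator.

import numpy as np

def prox_gradient_step(x, t, grad_g, prox_h):
    """One proximal gradient update: x+ = prox_h(x - t * grad_g(x), t)."""
    return prox_h(x - t * grad_g(x), t)

def soft_threshold(v, t, lam=1.0):
    """Prox of h(z) = lam * ||z||_1 with step t: componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)

# Hypothetical usage for the lasso, with g(x) = 0.5 * ||A x - b||_2^2:
# grad_g = lambda x: A.T @ (A @ x - b)
# x_next = prox_gradient_step(x, t, grad_g, soft_threshold)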