Advanced Optimization Lecture Notes: Chapter 7 - Hoàng Nam Dũng
Bài giảng "Tối ưu hóa nâng cao - Chương 7: Subgradient method" cung cấp cho người học các kiến thức: Last last time - gradient descent, subgradient method, step size choices, convergence analysis, lipschitz continuity, convergence analysis - Proof,... Mời các bạn cùng tham khảo.
Text content: Advanced Optimization Lecture Notes: Chapter 7 - Hoàng Nam Dũng
- Subgradient Method
Hoàng Nam Dũng
Faculty of Mathematics, Mechanics and Informatics, VNU University of Science, Vietnam National University, Hanoi
- Last last time: gradient descent
Consider the problem $\min_x f(x)$ for $f$ convex and differentiable, $\mathrm{dom}(f) = \mathbb{R}^n$. Gradient descent: choose an initial $x^{(0)} \in \mathbb{R}^n$ and repeat
$$x^{(k)} = x^{(k-1)} - t_k \cdot \nabla f(x^{(k-1)}), \quad k = 1, 2, 3, \ldots$$
Step sizes $t_k$ are chosen to be fixed and small, or by backtracking line search. If $\nabla f$ is Lipschitz, gradient descent has convergence rate $O(1/\varepsilon)$. Downsides:
* Requires $f$ differentiable (addressed this lecture).
* Can be slow to converge (addressed next lecture).
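To make the update rule concrete, here is a minimal Python sketch of gradient descent with a fixed step size; the quadratic objective and all parameter values are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def gradient_descent(grad_f, x0, t, num_iters):
    """Gradient descent with a fixed step size t:
    x^(k) = x^(k-1) - t * grad_f(x^(k-1))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - t * grad_f(x)
    return x

# Illustrative objective: f(x) = 0.5 * ||x||_2^2, whose gradient is x.
x_hat = gradient_descent(lambda x: x, x0=[3.0, -4.0], t=0.1, num_iters=100)
```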
- Subgradient method
Now consider $f$ convex with $\mathrm{dom}(f) = \mathbb{R}^n$, but not necessarily differentiable. Subgradient method: like gradient descent, but replacing gradients with subgradients, i.e., initialize $x^{(0)}$ and repeat
$$x^{(k)} = x^{(k-1)} - t_k \cdot g^{(k-1)}, \quad k = 1, 2, 3, \ldots$$
where $g^{(k-1)} \in \partial f(x^{(k-1)})$ is any subgradient of $f$ at $x^{(k-1)}$.
The subgradient method is not necessarily a descent method, so we keep track of the best iterate $x_{\mathrm{best}}^{(k)}$ among $x^{(0)}, \ldots, x^{(k)}$ so far, i.e.,
$$f(x_{\mathrm{best}}^{(k)}) = \min_{i=0,\ldots,k} f(x^{(i)}).$$
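A minimal sketch of the method, tracking the best iterate since individual steps need not decrease $f$; the test function $f(x) = \|x\|_1$ and its sign subgradient are illustrative assumptions, not from the original slides.

```python
import numpy as np

def subgradient_method(f, subgrad_f, x0, step_sizes):
    """Subgradient method: like gradient descent, but subgrad_f returns
    some g in the subdifferential of f at x. Returns the best iterate
    seen, since the method is not necessarily a descent method."""
    x = np.asarray(x0, dtype=float)
    x_best, f_best = x.copy(), f(x)
    for t in step_sizes:
        x = x - t * subgrad_f(x)
        if f(x) < f_best:              # track f(x_best^(k)) = min_i f(x^(i))
            x_best, f_best = x.copy(), f(x)
    return x_best, f_best

# Illustrative example: f(x) = ||x||_1, for which sign(x) is a subgradient.
f = lambda x: np.sum(np.abs(x))
x_best, f_best = subgradient_method(f, np.sign, [1.0, -2.0], [0.1] * 200)
```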
- Outline
Today:
* How to choose step sizes
* Convergence analysis
* Intersection of sets
* Projected subgradient method
- Step size choices
* Fixed step sizes: $t_k = t$ for all $k = 1, 2, 3, \ldots$
* Fixed step length, i.e., $t_k = s/\|g^{(k-1)}\|_2$, and hence $\|t_k g^{(k-1)}\|_2 = s$.
* Diminishing step sizes: chosen to satisfy
$$\sum_{k=1}^{\infty} t_k^2 < \infty, \quad \sum_{k=1}^{\infty} t_k = \infty,$$
i.e., square summable but not summable. It is important here that the step sizes go to zero, but not too fast.
There are several other options too, but the key difference from gradient descent is that step sizes are pre-specified, not adaptively computed. A sketch of these schedules follows below.
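The three rules can be written as simple schedules; this is a sketch under our own naming, with $t_k = 1/k$ as one common square-summable-but-not-summable choice.

```python
def fixed_step(t):
    """Fixed step sizes: t_k = t for all k."""
    while True:
        yield t

def diminishing_step():
    """t_k = 1/k: square summable (sum t_k^2 < inf) but not summable."""
    k = 1
    while True:
        yield 1.0 / k
        k += 1

def fixed_length(s, g_norm):
    """Fixed step length: t_k = s / ||g^(k-1)||_2, so ||t_k g^(k-1)||_2 = s.
    Unlike the other rules, t_k depends on the current subgradient, whose
    norm g_norm must be supplied by the caller at each iteration."""
    return s / g_norm
```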
- Convergence analysis
Assume that $f$ is convex, $\mathrm{dom}(f) = \mathbb{R}^n$, and also that $f$ is Lipschitz continuous with constant $L > 0$, i.e.,
$$|f(x) - f(y)| \le L\|x - y\|_2 \quad \text{for all } x, y.$$
Theorem. For a fixed step size $t$, the subgradient method satisfies
$$f(x_{\mathrm{best}}^{(k)}) - f^\star \le \frac{\|x^{(0)} - x^\star\|_2^2}{2kt} + \frac{L^2 t}{2}.$$
For fixed step length, i.e., $t_k = s/\|g^{(k-1)}\|_2$, we have
$$f(x_{\mathrm{best}}^{(k)}) - f^\star \le \frac{L\|x^{(0)} - x^\star\|_2^2}{2ks} + \frac{Ls}{2}.$$
For diminishing step sizes, the subgradient method satisfies
$$f(x_{\mathrm{best}}^{(k)}) - f^\star \le \frac{\|x^{(0)} - x^\star\|_2^2 + L^2 \sum_{i=1}^k t_i^2}{2\sum_{i=1}^k t_i},$$
i.e., $\lim_{k\to\infty} f(x_{\mathrm{best}}^{(k)}) = f^\star$.
- Lipschitz continuity
Before the proof, let us consider the Lipschitz continuity assumption.
Lemma. $f$ is Lipschitz continuous with constant $L > 0$, i.e., $|f(x) - f(y)| \le L\|x - y\|_2$ for all $x, y$, if and only if $\|g\|_2 \le L$ for all $x$ and $g \in \partial f(x)$.
Proof.
$\Leftarrow$: Choose subgradients $g_x$ and $g_y$ at $x$ and $y$. By the definition of a subgradient,
$$g_x^T(x - y) \ge f(x) - f(y) \ge g_y^T(x - y).$$
Applying the Cauchy-Schwarz inequality to both sides gives
$$L\|x - y\|_2 \ge f(x) - f(y) \ge -L\|x - y\|_2.$$
$\Rightarrow$: Assume $\|g\|_2 > L$ for some $g \in \partial f(x)$. Taking $y = x + g/\|g\|_2$ we have $\|y - x\|_2 = 1$ and
$$f(y) \ge f(x) + g^T(y - x) = f(x) + \|g\|_2 > f(x) + L,$$
a contradiction to Lipschitz continuity.
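As a quick illustration of the lemma (an example added here, not from the original slides): for $f(x) = \|x\|_1$ on $\mathbb{R}^n$, every subgradient $g \in \partial f(x)$ has entries in $[-1, 1]$, so $\|g\|_2 \le \sqrt{n}$; and indeed $|\|x\|_1 - \|y\|_1| \le \|x - y\|_1 \le \sqrt{n}\,\|x - y\|_2$, i.e., $f$ is Lipschitz with $L = \sqrt{n}$ in the Euclidean norm.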
- Convergence analysis - Proof
Both results can be proved from the same basic inequality. Key steps:
* Using the definition of a subgradient,
$$\begin{aligned} \|x^{(k)} - x^\star\|_2^2 &= \|x^{(k-1)} - t_k g^{(k-1)} - x^\star\|_2^2 \\ &= \|x^{(k-1)} - x^\star\|_2^2 - 2t_k (g^{(k-1)})^T (x^{(k-1)} - x^\star) + t_k^2 \|g^{(k-1)}\|_2^2 \\ &\le \|x^{(k-1)} - x^\star\|_2^2 - 2t_k (f(x^{(k-1)}) - f(x^\star)) + t_k^2 \|g^{(k-1)}\|_2^2. \end{aligned}$$
* Iterating the last inequality,
$$\|x^{(k)} - x^\star\|_2^2 \le \|x^{(0)} - x^\star\|_2^2 - 2\sum_{i=1}^k t_i (f(x^{(i-1)}) - f(x^\star)) + \sum_{i=1}^k t_i^2 \|g^{(i-1)}\|_2^2.$$
- Convergence analysis - Proof
* Using $\|x^{(k)} - x^\star\|_2 \ge 0$ and letting $R = \|x^{(0)} - x^\star\|_2$, we have
$$0 \le R^2 - 2\sum_{i=1}^k t_i (f(x^{(i-1)}) - f(x^\star)) + \sum_{i=1}^k t_i^2 \|g^{(i-1)}\|_2^2.$$
* Introducing $f(x_{\mathrm{best}}^{(k)}) = \min_{i=0,\ldots,k} f(x^{(i)})$ and rearranging, we have the basic inequality
$$f(x_{\mathrm{best}}^{(k)}) - f(x^\star) \le \frac{R^2 + \sum_{i=1}^k t_i^2 \|g^{(i-1)}\|_2^2}{2\sum_{i=1}^k t_i}.$$
For different step size choices, convergence results can be obtained directly from this bound; e.g., the theorems for fixed and diminishing step sizes follow.
- Convergence analysis - Proof
The basic inequality tells us that after $k$ steps, we have
$$f(x_{\mathrm{best}}^{(k)}) - f(x^\star) \le \frac{R^2 + \sum_{i=1}^k t_i^2 \|g^{(i-1)}\|_2^2}{2\sum_{i=1}^k t_i}.$$
With fixed step size $t$, this and the Lipschitz continuity assumption give
$$f(x_{\mathrm{best}}^{(k)}) - f^\star \le \frac{R^2}{2kt} + \frac{L^2 t}{2}.$$
* Does not guarantee convergence (as $k \to \infty$).
* For large $k$, $f(x_{\mathrm{best}}^{(k)})$ is approximately $\frac{L^2 t}{2}$-suboptimal.
* To make the gap $\le \varepsilon$, let us make each term $\le \varepsilon/2$. So we can choose $t = \varepsilon/L^2$ and $k = R^2/t \cdot 1/\varepsilon = R^2 L^2/\varepsilon^2$.
I.e., the subgradient method guarantees a gap of $\varepsilon$ in $k = O(1/\varepsilon^2)$ iterations; compare this to the $O(1/\varepsilon)$ rate of gradient descent. A worked instance follows below.
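To put numbers on this prescription (an illustrative calculation, not from the slides): with $R = L = 1$ and target accuracy $\varepsilon = 10^{-2}$, we get $t = \varepsilon/L^2 = 10^{-2}$ and $k = R^2 L^2/\varepsilon^2 = 10^4$ iterations, whereas an $O(1/\varepsilon)$ method would need on the order of $10^2$ iterations.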
- Convergence analysis - Proof
The basic inequality tells us that after $k$ steps, we have
$$f(x_{\mathrm{best}}^{(k)}) - f(x^\star) \le \frac{R^2 + \sum_{i=1}^k t_i^2 \|g^{(i-1)}\|_2^2}{2\sum_{i=1}^k t_i}.$$
With fixed step length, i.e., $t_i = s/\|g^{(i-1)}\|_2$, we have
$$f(x_{\mathrm{best}}^{(k)}) - f(x^\star) \le \frac{R^2 + ks^2}{2s\sum_{i=1}^k \|g^{(i-1)}\|_2^{-1}} \le \frac{R^2 + ks^2}{2s\sum_{i=1}^k L^{-1}} = \frac{LR^2}{2ks} + \frac{Ls}{2}.$$
* Does not guarantee convergence (as $k \to \infty$).
* For large $k$, $f(x_{\mathrm{best}}^{(k)})$ is approximately $\frac{Ls}{2}$-suboptimal.
* To make the gap $\le \varepsilon$, let us make each term $\le \varepsilon/2$. So we can choose $s = \varepsilon/L$ and $k = LR^2/s \cdot 1/\varepsilon = R^2 L^2/\varepsilon^2$.
- Convergence analysis - Proof
The basic inequality tells us that after $k$ steps, we have
$$f(x_{\mathrm{best}}^{(k)}) - f(x^\star) \le \frac{R^2 + \sum_{i=1}^k t_i^2 \|g^{(i-1)}\|_2^2}{2\sum_{i=1}^k t_i}.$$
From this and the Lipschitz continuity assumption, we have
$$f(x_{\mathrm{best}}^{(k)}) - f(x^\star) \le \frac{R^2 + L^2 \sum_{i=1}^k t_i^2}{2\sum_{i=1}^k t_i}.$$
With diminishing step sizes, $\sum_{i=1}^{\infty} t_i = \infty$ and $\sum_{i=1}^{\infty} t_i^2 < \infty$, the numerator stays bounded while the denominator diverges, so
$$\lim_{k\to\infty} f(x_{\mathrm{best}}^{(k)}) = f^\star.$$
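The contrast between fixed and diminishing step sizes can be seen in a small self-contained experiment; the test function $f(x) = |x|$ (so $f^\star = 0$ and $L = 1$) and all parameter values are illustrative assumptions, not from the slides.

```python
import numpy as np

def run(step_of_k, x0=5.0, n=5000):
    """Subgradient method on f(x) = |x|, where f* = 0 and L = 1."""
    x, f_best = x0, abs(x0)
    for k in range(1, n + 1):
        g = np.sign(x)                 # a subgradient of |x| at x
        x = x - step_of_k(k) * g
        f_best = min(f_best, abs(x))
    return f_best

# Fixed step t = 0.01: the theorem bounds the gap by R^2/(2kt) + t/2.
gap_fixed = run(lambda k: 0.01)
# Diminishing steps t_k = 1/k: f_best -> f* = 0 as k grows.
gap_dimin = run(lambda k: 1.0 / k)
```

With the diminishing rule the guaranteed gap keeps shrinking as $k$ grows, while with a fixed step the guarantee stalls at the $L^2 t/2$ term, matching the theorem.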