3 February 25, 2009
1 Armijo Wolfe Newton 2 Newton Lagrange Newton 2 SQP
2 1 2.1 ( ) S R n (n N) f (x) : R n x f R x S f (x ) = min x S R n f (x) (nonlinear programming) x 0 S k = 0, 1, 2, h k R n ɛ k > 0 ( ɛ k R ) x k+1 = x k + ɛ k h k
x S x S x k (k = 0, 1, 2, ) x k+1 x < r k x k x p p (convergence order) { r k} k 0 p
3 2.1 3.1 ( ) H k R n n H α > 0 : x H k x α x 2 x R n 2.1 x k S f ( x k) R n h k R n (gradient method) h k = ( H k) 1 ( ) f x k or h k H k y = f ( x k) y y R n (3.1)
3.1 ( ) 2.1 f (x) x k S f ( x k) R n H R n n (3.1) h k R n f (x) x k f ( x k + ɛh k) f ( x k) = ɛ f ( x k) h k + o(ɛ) = ɛh k Hh k + o(ɛ) αɛ h k 2 + o(ɛ)
3.2 ( ) 2.1 x k S f ( x k) R n θ [ π, π] cos θ = f ( x k) h k h k ( f x k ) (3.2) 3.3 ( ) 3.1 H R n n = I (I ) h k R n h k = f ( x k) or h k y = f ( x k) y y R n
Armijo Armijo 3.4 (Armijo ) 2.1 x k S f ( x k) R n h k R n α > 0 0 < ɛ < α ɛ R Armijo (Armijo criterion) ξ (0, 1) f ( x k + ɛh k) f ( x k) ξ f ( x k) (αh k) f(x k ) f(x k +αh k ) f(x k ) f(x k ) (αh k ) 0 α ²
f (x) 2 ξ = 1/2 ɛ α x = x k + αh k α f ( x k + αh k) = f ( x k) + T f ( x k) ( αh k) = 0 f ( x k + αh k) f ( x k) = f ( x k) (αh k) + 1 ( ) αh k T f ( x k) ( αh k) 2 = 1 2 f ( x k) (αh k) = ξ f ( x k) (αh k) f(x k ) 1 f(x k ) (αh k ) 2 0 α ²
Wolfe Wolfe 3.5 (Wolfe ) 2.1 x k S f ( x k) R n h k R n α > 0 α < ɛ ɛ R Wolfe (Wolfe criterion) Armijo ξ (0, 1) ξ (0, 1) 0 < ξ < µ < 1 µ f ( x k) (αh k) f ( x k + ɛh k) (αh k) (3.3) f(x k ) f(x k +αh k ) (αh k ) f(x k ) (αh k ) 0 α ²
3.2 ( ) 2.1 f (x) L = { x R n f (x) f (x 0 ) } U f (x) U Lipschitz h k ɛ Armijo Wolfe { x k} k θ k (3.2) f ( x k) 2 cos 2 θ k < (3.4) k=1 [1] 4.1
h k f ( x k) cos θ k > 0 and lim k f ( x k) = 0 f ( x k) 2 cos 2 θ k < k=1
Newton Newton f ( x k + h k) = f ( x k) + T f ( x k) h k = 0 3.6 (Newton ) 2.1 x k S f ( x k) R n Hesse H ( x k) = T f ( x k) R n n h k R n ɛ = 1 Newton (Newton method) h k = H 1 ( x k) f ( x k) or h k H ( x k) y = f ( x k) y y R n (3.5)
3.3 (Newton ) 2.1 f (x) 2 Hesse H(x) = T f (x) R n n a S a x Lipschiz ( C 0 : H(a) H(x) C a x ) a Newton { x k} k a 2 [1] 4.2 3.1 (Newton ) Newton Hesse T f (x) R n n (3.1) (3.5) Hesse Newton
Newton 3.7 ( Newton ) Newton Newton (quasi-newton method) BFGS (Broyden-Fletcher-Goldfalb-Shanno) DFP (Davidon-Fletcher-Powell) 3.8 ( ) A x, y x Ay = 0 x, y A = T f ( x k) h k (conjugate direction method) h k = f ( x k) + α k 1 h k 1 α 0 = 1, α k = f ( x k) ( f ( x k) f ( x k 1)) f ( x k 1) 2 (k 1)
4 1 4.1 ( ) f : R n R g (l) : R n R (l = 1, 2,, m), g = ( g (l)) l Rm x R n f (x ) = min { f (x) g(x) 0} x Rn
4.1 ( ) (barrier function method) (inner point method) P ( k x, ρ k) m = f (x) ρ k max 0, log ( g (l) (x) ) { ρ k} k (penalty function method) (outer point method) P k ( x, ρ k) = f (x) + ρ k l=1 m max { 0, g (l) (x) } { ρ k} k l=1
Lagrange Newton 4.2 (Lagrange Newton ) 4.1 x k S Lagrange L ( x k, λ ) g(x) = { g (l) (x) } l Rm, λ = { λ (l)} l Rm L ( x k, λ ) = f ( x k) + λ g ( x k) L(x, λ) x L(x, λ) R n Hesse H L (x, λ) = T L(x, λ) R n n g ( x k) + ( g ( T x k)) T h = 0 h k R n ɛ = 1 Lagrange Newton (Newton method) ( H L x k, λ ) g ( T x k) ( ) h ( ( k g T x k)) T 0 = λ f ( x k) g ( x k) (4.1)
4.1 (Lagrange Newton ) 4.1 f (x), g (x) 2 (a, µ) S R m Hesse H L (x, λ) ( 1 3.2) (x, λ) Lipschiz (a, µ) {( Lagrange Newton x k, λ )} (a, µ) k 2 (4.1) λ I A (x k+1 ) = { l {1, 2,, m} λ (l) 0 } (4.1) 4.1 h k 3.1
2 4.2 ( 2 ) 4.1 x k S f ( x k), g T ( x k) H k R n n ɛs R n Q (ɛs) = min {Q (ɛh) = 12 ( ) 1ɛ ɛh ɛh R Hk (ɛh) + f ( x k) } (ɛh) n s.t. g ( x k) + ( g T ( x k)) T (ɛh) 0 4.3 ( 2 ) 4.1 4.2 ɛs ɛh k R n { x k} k 2 (sequential quadratic programming) SQP ɛ Q (ɛh) Armijo Wolfe
4.2 ( 2 ) 4.3 g ( x k) + ( g ( T x k)) T (ɛh) = 0 ɛs = ɛh k (4.1) 1 ɛ Hk g ( T x k) ( ) ɛh ( ( k g T x k)) T 0 = λ f ( x k) g ( x k) (4.2) (4.1) H k Hesse ɛ Armijo Wolfe 4.1 f (x), g (x) x k S ( 1 3.2) (4.2) ( ɛh k, λ )
f (x) g (l) (x) (l = 1, 2,, m) H k h (0)k, h (l)k (l = 1, 2,, m) h k = h (0)k + m λ (l) h (l)k (4.3) l=1 h (0)k = ( H k) 1 f ( x k ), h (l)k = ( H k) 1 g (l) ( x k) (4.4). (4.2) 1 2 g ( (1) x k) (ɛh (1)k) g ( (1) x k)..... g ( (m) x k) (ɛh (1)k) g ( (m) x k) g ( (1) x k) + g ( (1) x k) (ɛh (0)k) =. g ( (m) x k) + g ( (m) x k) (ɛh (0)k) (ɛh (m)k) (ɛh (m)k) λ (1). λ (m) (4.5) λ R m 1
4.3 g ( x k) + ( g ( T x k)) T (ɛh) 0 KKT 1 3.4 (4.2) 2 g ( x k) + ( g ( (1)T x k)) T m ɛh(0)k + λ (l) ɛh (l)k 0 (4.6) l=1 λ g ( x k) + ( g ( (1)T x k)) T m ɛh(0)k + λ (1) ɛh (l)k = 0 (4.7) λ 0 l=1 (4.8) (4.5) λ (4.6) (4.8) I A (x k+1 ) (4.5) 4.1 ɛh k
2.... 4.1 ( 2 ) 1 k = 0 x 0 S, ɛ 2 f ( x k), g ( x k) f ( x k), g T ( x k) 3 H k R n n (3.1) h k = h (l)k h (l)k (l = 0, 1, 2,, m) 4 (4.5) λ (4.6) (4.8) (4.6) (4.8) (4.6) (4.8) I A (x k+1 ) (4.5) 5 (4.3) h k Armijo Wolfe (3.3) (3.4) f ( x k) L ( x k, λ ) ξ, µ (0, 1) (ξ < µ) Armijo ɛ ɛ/2 Wolfe Armijo ɛ 4 6 k + 1 k 2
5 Armijo Wolfe Newton 2 Newton Lagrange Newton 2 SQP
[1],,... [2]. :.. [3],.., 2002. [4] G.L. Nemhauser, A.H.G. Rinnooy Kan, and M.J. Todd, editors. [5]. :., 2003. [6] L. Armijo. Minimization of functions having lipschitz-continuous first partial derivatives. Pacific Journal of Mathematics, Vol. 16, pp. 1 3, 1966.