SLC Internal tutorial
Variational Bayesian methods for Natural Language Processing
Daichi Mochihashi  daichi.mochihashi@atr.jp
ATR SLC
2005.6.21 (Tue) 13:15–15:00 @ Meeting Room 1
p.1/30

Why variational Bayes? What lies beyond EM? (cf. 2004 / 2002)
Applications so far:
- von Mises-Fisher distributions (2004)
- HMM (MacKay 1997)
- LDA (Blei et al. 2001)
- PCFG (2004)
- ...

Outline
- EM
- VB-EM
- LDA
- VB-HMM

Preliminaries: Jensen's inequality
For a concave function f(x),
    f(E[x]) ≥ E[f(x)].   (1)
Taking log(x) as the concave function and f(x) as its argument,
    log ∫ p(x) f(x) dx ≥ ∫ p(x) log f(x) dx.   (2)
The Kullback-Leibler divergence satisfies
    D(p‖q) = ∫ p(x) log [ p(x) / q(x) ] dx ≥ 0,   (3)
with equality iff p = q.
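Inequalities (1)–(3) are easy to sanity-check numerically; a minimal sketch (the particular distributions are arbitrary choices of mine, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two arbitrary discrete distributions p and q over 5 outcomes.
p = rng.dirichlet(np.ones(5))
q = rng.dirichlet(np.ones(5))

# Jensen's inequality for the concave log: log E[x] >= E[log x].
x = rng.gamma(2.0, size=1000)
assert np.log(x.mean()) >= np.log(x).mean()

# KL divergence D(p||q) = sum p log(p/q) is non-negative,
# and exactly zero when p = q.
def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

assert kl(p, q) >= 0.0
assert abs(kl(p, p)) < 1e-12
```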

The Bayesian framework
For data D,
    p(D) = ∫ p(D, θ) dθ   (4)
         = ∫ p(D | θ) p(θ) dθ,   (5)
where p(D | θ) is the likelihood and p(θ) the prior over the parameter θ.
Inference: given the data D, find the posterior p(θ | D):
    p(θ | D) ∝ p(θ, D) = p(D | θ) p(θ).   (6)
[Graphical model: θ → D]

Prediction
Given the posterior p(θ | D), the predictive distribution of a new datum d is
    p(d | D) = ∫ p(d | θ) p(θ | D) dθ,   (7)
where p(d | θ) is the model and p(θ | D) the weight of each θ.
The ML/MAP approximation
    p(d | D) = p(d | θ̂)   (8)
corresponds to fixing θ = θ̂, i.e. replacing the posterior p(θ | D) by a point mass δ(θ̂).
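A toy contrast between (7) and (8), using a Beta-Bernoulli model of my own choosing (not from the slides): the integral in (7) has a closed form here, and the plug-in (8) differs from it.

```python
# Beta-Bernoulli example: D = 3 heads out of 4 tosses, Beta(1, 1) prior.
a, b = 1.0, 1.0          # prior Beta(a, b)
heads, tails = 3, 1      # observed data D

# Posterior is Beta(a + heads, b + tails); the predictive
# p(d = head | D) = \int p(d|theta) p(theta|D) dtheta  (eq. 7)
bayes_pred = (a + heads) / (a + heads + b + tails)   # = 4/6

# Plug-in approximation p(d | theta_hat), theta_hat = ML estimate  (eq. 8)
ml_pred = heads / (heads + tails)                    # = 3/4

print(bayes_pred, ml_pred)  # the Bayesian estimate is pulled toward 1/2
```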

D z, p(d θ) = p(d, z θ)dz. (9) (6) θ = ˆθ z. EM. z D Variational Bayesian methods for Natural Language Processing p.7/30

EM (1)
By Jensen's inequality,
    log p(D | θ) = log ∫ p(D, z | θ) dz   (10)
    = log ∫ q(z | D, θ̂) · p(D, z | θ) / q(z | D, θ̂) dz   (11)
    ≥ ∫ q(z | D, θ̂) log [ p(D, z | θ) / q(z | D, θ̂) ] dz = F(q(z), θ).   (12)
Alternately maximize the lower bound F(q(z), θ):
    E step: q(z) = argmax_{q(z)} F(q(z), θ),   (13)
    M step: θ̂ = argmax_θ F(q(z), θ).   (14)

EM (2): E step
    F(q(z), θ) = ∫ q(z | D, θ̂) log [ p(D, z | θ) / q(z | D, θ̂) ] dz   (15)
    = ∫ q(z | D, θ̂) log [ p(z | D, θ) p(D | θ) / q(z | D, θ̂) ] dz   (16)
    = −∫ q(z | D, θ̂) log [ q(z | D, θ̂) / p(z | D, θ) ] dz + log p(D | θ)   (17)
    = −D( q(z | D, θ̂) ‖ p(z | D, θ) ) + log p(D | θ).   (18)
This is maximized with respect to q when
    q(z | D, θ̂) = p(z | D, θ)   (19)
(the E step).

EM (3): M step
    F(q(z), θ) = ∫ q(z | D, θ̂) log [ p(D, z | θ) / q(z | D, θ̂) ] dz   (20)
    = ⟨ log p(D, z | θ) ⟩_{q(z|D,θ̂)} + H( q(z | D, θ̂) ).   (21)
Only the first term depends on θ, so maximizing F(q(z), θ) w.r.t. θ amounts to maximizing
    Q(θ) = ⟨ log p(D, z | θ) ⟩_{q(z|D,θ̂)}   (the Q function)   (22)
by solving
    ∂Q(θ)/∂θ = 0   (23)
for θ → θ̂ (the M step).

EM (summary)
    log p(D | θ) ≥ F(q(z), θ) = ∫ q(z | D, θ̂) log [ p(D, z | θ) / q(z | D, θ̂) ] dz.   (24)
Alternately maximizing F(q(z), θ) w.r.t. q(z) and θ is the EM algorithm. In fact,
    log p(D | θ) − F(q(z), θ)   (25)
    = ∫ q(z | D, θ̂) log p(D | θ) dz − ∫ q(z | D, θ̂) log [ p(D, z | θ) / q(z | D, θ̂) ] dz   (26)
    = ∫ q(z | D, θ̂) log [ q(z | D, θ̂) / p(z | D, θ) ] dz   (27)
    = D( q(z | D, θ̂) ‖ p(z | D, θ) ) ≥ 0,   (28)
so the gap of the bound is exactly a KL divergence.
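The alternating E/M maximization just derived can be sketched for a two-component Gaussian mixture (the model and data here are my own choices, not from the slides): the E step computes q(z) = p(z | D, θ̂), the M step re-estimates θ from the expected complete-data log likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic 1-D data from two well-separated Gaussians.
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

# theta = (mixing weights, means, variances), crude initialization.
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

for _ in range(50):
    # E step: q(z) = p(z | x, theta_hat), the posterior responsibilities.
    dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M step: maximize Q(theta) = <log p(x, z | theta)>_{q(z)}.
    n_k = r.sum(axis=0)
    w = n_k / len(x)
    mu = (r * x[:, None]).sum(axis=0) / n_k
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k

print(np.sort(mu))  # should sit near the true means -2 and 3
```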

Example: PLSI (1/3)
PLSI models a document d and a word w through a latent topic z:
    p(d, w, z) = p(z) p(d | z) p(w | z),   (29)
where d is a document id with words w = w_1 w_2 … w_n, W = {w_1, w_2, …, w_D}, D = {1, 2, …, D}.
[Graphical model: z → d, w, with parameters p(z), p(d | z), p(w | z)]
    p(D, W, Z) = Π_d p(d, w_d, z_d)   (30)
    = Π_d Π_n p(d, w_dn, z_dn)   (31)
    = Π_d Π_n p(z_dn) p(d | z_dn) p(w_dn | z_dn),   (32)
    log p(D, W, Z) = Σ_d Σ_n [ log p(z_dn) + log p(d | z_dn) + log p(w_dn | z_dn) ].   (33)

Example: PLSI (2/3)
The Q function ⟨ log p(D, z | θ) ⟩_{p(z|D,θ)} is
    Q(z) = ⟨ log p(D, W, Z) ⟩_{p(Z|D,W)}   (34)
    = Σ_d Σ_n [ Σ_z p(z | d, w_dn) log p(z_dn)
              + Σ_z p(z | d, w_dn) log p(d | z_dn)
              + Σ_z p(z | d, w_dn) log p(w_dn | z_dn) ].   (35)
Setting δQ/δθ = 0 with a Lagrange multiplier, e.g. for p(z),
    δQ/δp(z) = Σ_d Σ_n p(z | d, w_dn) / p(z) + λ = 0   (36)
    ⇒ p(z) ∝ Σ_d Σ_n p(z | d, w_dn) = Σ_d Σ_w n(d, w) p(z | d, w).   (37)

Example: PLSI (3/3)
Similarly,
    p(d | z) ∝ Σ_n p(z | d, w_dn) = Σ_w n(d, w) p(z | d, w),   (38)
    p(w | z) ∝ Σ_d Σ_n p(z | d, w_dn) = Σ_d n(d, w) p(z | d, w),   (39)
where the topic posterior is computed from
    p(z | d, w) ∝ p(z, d, w) = p(z) p(d | z) p(w | z).   (40)
Note that each document d carries its own parameter
    θ(d) = p(z | d)   (41)
         ∝ p(z) p(d | z).   (42)
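The EM updates (37)–(40) vectorize directly over a document-word count matrix n(d, w); a minimal NumPy sketch (sizes, counts, and initialization are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(2)
D, W, K = 8, 20, 3                                    # documents, vocabulary, topics
n_dw = rng.integers(0, 5, size=(D, W)).astype(float)  # counts n(d, w)

# Random normalized initialization of p(z), p(d|z), p(w|z).
p_z = np.full(K, 1.0 / K)
p_d_z = rng.dirichlet(np.ones(D), size=K)   # shape (K, D)
p_w_z = rng.dirichlet(np.ones(W), size=K)   # shape (K, W)

for _ in range(100):
    # E step: p(z|d,w) ∝ p(z) p(d|z) p(w|z)            (eq. 40)
    joint = p_z[:, None, None] * p_d_z[:, :, None] * p_w_z[:, None, :]
    post = joint / joint.sum(axis=0, keepdims=True)    # shape (K, D, W)
    # M step: eqs. (37)-(39), weighted by the counts n(d, w).
    nz = (n_dw[None, :, :] * post).sum(axis=(1, 2))    # expected topic counts
    p_z = nz / nz.sum()
    p_d_z = (n_dw[None, :, :] * post).sum(axis=2) / nz[:, None]
    p_w_z = (n_dw[None, :, :] * post).sum(axis=1) / nz[:, None]
```

Each update re-normalizes by construction, so p(z), p(d|z), p(w|z) remain proper distributions throughout.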

Limitations of EM
In EM, θ is treated as given: a single point estimate θ̂, with only the latent z averaged over (z is integrated out, θ is not).
→ Treat θ itself as a random variable as well: variational Bayes.

p(d) = p(d, z, θ)dzdθ. (43) (26) z,θ p(z D),p(θ D). p(d, z, θ) log p(d) = log q(z,θ D) q(z,θ D) p(d, z, θ) q(z,θ D)log q(z,θ D) θ z D dzdθ (44) dzdθ (45) z,θ, q(z,θ D) =q(z)q(θ) (46), Variational Bayesian methods for Natural Language Processing p.16/30

log p(d) = q(z,θ D)log q(z)q(θ)log p(d, z, θ) q(z,θ D) dzdθ (47) p(d, zθ) dzdθ (48) q(z)q(θ) = F (q). ( ) (49) (, variational lower bound) F (q) q(z),q(θ). Variational Bayesian methods for Natural Language Processing p.17/30

Maximize w.r.t. q(z)
    L = F(q) + λ ( ∫ q(z) dz − 1 )
      = ∫∫ q(z) q(θ) log [ p(D, z, θ) / (q(z) q(θ)) ] dz dθ + λ ( ∫ q(z) dz − 1 ),   (50)
    δL/δq(z) = ∫ q(θ) [ log p(D, z, θ) − log q(θ) − log q(z) − 1 ] dθ + λ
    = ∫ q(θ) [ log p(D, z | θ) + log p(θ) − log q(θ) − log q(z) − 1 ] dθ + λ
    = ⟨ log p(D, z | θ) ⟩_{q(θ)} − log q(z) + (const.) + λ = 0,   (51)
hence
    q(z) ∝ exp ⟨ log p(D, z | θ) ⟩_{q(θ)}.   (52)

Maximize w.r.t. q(θ)
    L = F(q) + λ ( ∫ q(θ) dθ − 1 )
      = ∫∫ q(z) q(θ) log [ p(D, z, θ) / (q(z) q(θ)) ] dz dθ + λ ( ∫ q(θ) dθ − 1 ),   (53)
    δL/δq(θ) = ∫ q(z) [ log p(D, z, θ) − log q(θ) − log q(z) − 1 ] dz + λ   (54)
    = ∫ q(z) [ log p(D, z | θ) + log p(θ) − log q(θ) − log q(z) − 1 ] dz + λ
    = ⟨ log p(D, z | θ) ⟩_{q(z)} + log p(θ) − log q(θ) + (const.) + λ = 0,   (55)
hence
    q(θ) ∝ p(θ) exp ⟨ log p(D, z | θ) ⟩_{q(z)}.   (56)

VB-EM (summary)
Marginalizing over both z and θ,
    log p(D) = log ∫∫ p(D, z, θ) dz dθ   (57)
    ≥ ∫∫ q(z) q(θ) log [ p(D, z, θ) / (q(z) q(θ)) ] dz dθ = F(q).   (58)
Maximizing F(q) alternately with respect to q(z) and q(θ) gives
    q(z) ∝ exp ⟨ log p(D, z | θ) ⟩_{q(θ)}   (VB-E step)   (59)
    q(θ) ∝ p(θ) exp ⟨ log p(D, z | θ) ⟩_{q(z)}   (VB-M step)   (60)
— the VB-EM algorithm.
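The two updates (59)–(60) take a concrete conjugate form for, e.g., a mixture of unigram (multinomial) models; this sketch is my own toy instance (uniform mixing weights, symmetric Dirichlet prior), not from the slides. The VB-M step becomes a Dirichlet count update, and the VB-E step uses the digamma function Ψ, because ⟨log θ_k⟩ under Dir(a) equals Ψ(a_k) − Ψ(Σ_k a_k).

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(4)
D, V, K = 30, 8, 2
# Synthetic count data from two distinct multinomials.
true_p = rng.dirichlet(np.ones(V), size=K)
z_true = rng.integers(0, K, size=D)
X = np.array([rng.multinomial(50, true_p[z]) for z in z_true])  # (D, V)

a0 = np.ones(V)                           # Dirichlet prior on each topic
q_z = rng.dirichlet(np.ones(K), size=D)   # q(z_d), shape (D, K)

for _ in range(100):
    # VB-M step (60): q(theta_k) = Dir(a_k), conjugate update with expected counts.
    a = a0 + q_z.T @ X                    # shape (K, V)
    # VB-E step (59): q(z) ∝ exp <log p(D, z | theta)>_{q(theta)}.
    Elog = digamma(a) - digamma(a.sum(axis=1, keepdims=True))
    log_q = X @ Elog.T                    # (D, K), up to an additive constant
    log_q -= log_q.max(axis=1, keepdims=True)
    q_z = np.exp(log_q)
    q_z /= q_z.sum(axis=1, keepdims=True)
```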

Relation to EM (1)
The VB-EM updates are
    q(z) ∝ exp ⟨ log p(D, z | θ) ⟩_{q(θ)}   (VB-E step)   (61)
    q(θ) ∝ p(θ) exp ⟨ log p(D, z | θ) ⟩_{q(z)}.   (VB-M step)   (62)
If we force q(θ) = δ(θ̂), then (61) becomes
    q(z) ∝ p(D, z | θ̂) ∝ p(z | D, θ̂),   (63)
which is exactly the EM E step.

(2), log p(d) F (q) = q(z,θ)logp(d)dzdθ = = log p(d) F (q). (64) q(z,θ)log p(d, z, θ) q(z,θ) (65) dzdθ (66) q(z,θ) [ log p(d) log p(z,θ D) log p(d)+logq(z,θ) ] dzdθ q(z,θ)log q(z,θ) p(z,θ D) (67) dzdθ (68) = D(q(z,θ) p(z,θ D)) 0. (69), q(z,θ) =q(z)q(θ). Variational Bayesian methods for Natural Language Processing p.22/30

Relation to EM (3)
    F(q) = ∫∫ q(z) q(θ) log [ p(D, z, θ) / (q(z) q(θ)) ] dz dθ   (70)
    = ∫∫ q(z) q(θ) log [ (p(D, z | θ) / q(z)) · (p(θ) / q(θ)) ] dz dθ   (71)
    = ⟨ log (p(D, z | θ) / q(z)) ⟩_{q(z)q(θ)} − ∫ q(θ) log [ q(θ) / p(θ) ] dθ   (72)
    = ⟨ log (p(D, z | θ) / q(z)) ⟩_{q(z)q(θ)} − D( q(θ | D) ‖ p(θ) )   (73)
      [expected complete-data term]            [penalty term]
Compare with
    ⟨ log (p(D, z | θ) / q(z)) ⟩_{q(z)q(θ)} − (|θ̂| / 2) log N + log p(θ̂),   (74)
                                               [MDL, BIC]       [(const.)]
so the KL penalty plays the role of the MDL/BIC model-complexity term.

(2), log p(d) F (q). (75) KL ( ). δf/δq(z),δf/δq(θ) VB-EM. q(z),q(θ) ( ) q(θ) δ, EM p(θ) q(θ D) KL- N MDL/BIC Variational Bayesian methods for Natural Language Processing p.24/30

Application: LDA
In PLSI, each document has its own parameter θ(d) = p(z | d); LDA instead places a Dirichlet prior on θ:
    p(θ) = Dir(θ | α).   (76)
[Graphical model: α → θ → z → w with β = p(w | z); plates over the N words and D documents]
    p(w | α, β) = ∫ p(θ | α) Π_n p(w_n | θ, β) dθ   (77)
    = [ Γ(Σ_k α_k) / Π_k Γ(α_k) ] ∫ Π_k θ_k^{α_k − 1} ( Π_n Σ_z Π_v (θ_z β_zv)^{w_n^v} ) dθ.   (78)

LDA (2)
    log p(w | α, β) = log ∫ Σ_z p(w, z, θ | α, β) dθ   (79)
    = log ∫ Σ_z q(z, θ | γ, ψ) · p(w, z, θ | α, β) / q(z, θ | γ, ψ) dθ   (80)
    ≥ ∫ Σ_z q(z, θ | γ, ψ) log [ p(w, z, θ | α, β) / q(z, θ | γ, ψ) ] dθ.   (81)
With the factorization q(z, θ | w, γ, ψ) = q(θ | γ) Π_n q(z_n | w_n, ψ),
    log p(w | α, β) ≥ ⟨ log p(θ | α) ⟩_{q(θ|γ)}
      + Σ_n ⟨ log p(z_n | θ) ⟩_{q(θ|γ), q(z_n|w_n,ψ)}
      + Σ_n ⟨ log p(w_n | z_n, β) ⟩_{q(z_n|w_n,ψ)}
      − ⟨ log q(θ | γ) ⟩_{q(θ|γ)}
      − Σ_n ⟨ log q(z_n | w_n, ψ) ⟩_{q(z_n|w_n,ψ)}.   (82)
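For one document, the bound (82) is maximized by the coordinate updates of Blei et al. (2001): φ_nk ∝ β_{k,w_n} exp(Ψ(γ_k)) and γ_k = α_k + Σ_n φ_nk. A sketch (the toy β, α, and word sequence are my own choices):

```python
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(3)
K, V = 3, 10                               # topics, vocabulary size
alpha = np.full(K, 0.1)
beta = rng.dirichlet(np.ones(V), size=K)   # p(w|z), shape (K, V)
words = rng.integers(0, V, size=15)        # one document as word ids

# Variational parameters: gamma for q(theta|gamma), phi for q(z_n|w_n,psi).
gamma = alpha + len(words) / K
for _ in range(100):
    # phi_{nk} ∝ beta_{k, w_n} exp(Psi(gamma_k))
    phi = beta[:, words].T * np.exp(digamma(gamma))   # shape (N, K)
    phi /= phi.sum(axis=1, keepdims=True)
    # gamma_k = alpha_k + sum_n phi_{nk}
    gamma = alpha + phi.sum(axis=0)

print(gamma)  # approximate Dirichlet posterior over this document's topics
```

Since each φ_n is normalized, γ always sums to Σ_k α_k + N, i.e. prior mass plus one unit per word.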

VB-HMM
Observations y = y_1 y_2 … y_T and hidden states s = s_1 s_2 … s_T, with HMM parameters: initial distribution π (1 × K), transition matrix C (K × K), emission matrix A (K × W):
    log p(y) = log ∫dπ ∫dA ∫dC Σ_s p(π, A, C) p(y, s | π, A, C)   (83)
    ≥ ∫dπ ∫dA ∫dC Σ_s q(π, A, C, s) log [ p(π, A, C) p(y, s | π, A, C) / q(π, A, C, s) ].   (84)

VB-HMM (2) π,c, A Dir(α), Dir(β), Dir(γ), VB-Estep: π k exp ( Ψ(α k ) Ψ( k α k )) (85) A ij exp ( Ψ(β ij) Ψ( j β ij) ) (86) C ij exp ( Ψ(γij) Ψ( γij) ) (87) j VB-Mstep: α, β, γ α,β,γ Forward-Backward. Beal, M.J. (2003) Variational Algorithms for Approximate Bayesian Inference. PhD thesis, Gatsby UCL. http://www.cse.buffalo.edu/faculty/mbeal/thesis/ Chapter.3. Variational Bayesian methods for Natural Language Processing p.28/30

Discussion
- Alternatives to the variational (deterministic) approximation: stochastic approximation by Gibbs sampling and other MCMC methods (e.g. for LDA).
- EP (Expectation Propagation) (Minka 2001) and Power EP (Minka 2004): VB and EP minimize the KL divergence in opposite directions; Power EP interpolates between them via the α-divergence.

Readings
- Hagai Attias. A Variational Bayesian Framework for Graphical Models. In NIPS 1999, 1999.
- Thomas Minka. Expectation-Maximization as lower bound maximization, 1998. http://research.microsoft.com/~minka/papers/em.html
- Radford M. Neal and Geoffrey E. Hinton. A View of the EM Algorithm that Justifies Incremental, Sparse, and Other Variants. In Learning in Graphical Models, pages 355–368. Dordrecht: Kluwer Academic Publishers, 1998.
- Zoubin Ghahramani. Unsupervised Learning. In Advanced Lectures on Machine Learning, LNAI 3176. Springer-Verlag, Berlin, 2004. http://www.gatsby.ucl.ac.uk/~zoubin/course04/ul.pdf
p.30/30