daichi@ism.ac.jp 2015-3-3 1 / 59

Overview: regression with Gaussian Processes / GPLVM / GPDM 2 / 59

Regression (Gaussian Process)
[Figure: noisy observations y plotted against x, with candidate regression curves]
Predicting y from x (a regressor): given data $\mathcal{D} = \{(x^{(n)}, y^{(n)})\}_{n=1}^{N}$, predict the output $y^{(n+1)}$ for a new input $x^{(n+1)}$ (regression). 3 / 59

Regression (Gaussian Process)
[Figure: the same setting with a different draw of data]
Predicting y from x (a regressor): given data $\mathcal{D} = \{(x^{(n)}, y^{(n)})\}_{n=1}^{N}$, predict the output $y^{(n+1)}$ for a new input $x^{(n+1)}$ (regression). 4 / 59

Linear regression
$$y = w_0 + w_1 x_1 + w_2 x_2 + \epsilon = \underbrace{(w_0\; w_1\; w_2)}_{w^T}\underbrace{\begin{pmatrix}1\\ x_1\\ x_2\end{pmatrix}}_{x} + \epsilon = w^T x + \epsilon$$
Least squares solution: $\hat{w} = (X^T X)^{-1} X^T y$. 5 / 59
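To make this concrete, a minimal numpy sketch of the least-squares estimate $\hat{w} = (X^TX)^{-1}X^Ty$ on synthetic data (names and values here are illustrative, not from the slides):

    import numpy as np

    rng = np.random.default_rng(0)
    N = 100
    X = np.column_stack([np.ones(N), rng.uniform(-3, 3, (N, 2))])  # rows (1, x1, x2)
    w_true = np.array([1.0, 2.0, -0.5])
    y = X @ w_true + 0.1 * rng.standard_normal(N)

    # Normal equations; np.linalg.lstsq(X, y) is the numerically safer route.
    w_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(w_hat)  # approximately recovers w_true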

Linear regression with basis functions (GLM)
$$y = w_0 + w_1 x + w_2 x^2 + w_3 x^3 + \epsilon \quad (1)$$
$$= \underbrace{(w_0\; w_1\; w_2\; w_3)}_{w^T}\underbrace{(1,\, x,\, x^2,\, x^3)^T}_{\phi(x)} + \epsilon \quad (2)$$
$$= w^T \phi(x) + \epsilon \quad (3)$$
Any basis functions $\phi(x)$ can be used! 6 / 59

GLM regression (2)
[Figure: K Gaussian bumps on [-1, 1]]
Gaussian (RBF) basis functions:
$$\phi(x) = \left(\exp\Big(-\frac{(x-\mu_1)^2}{2\sigma^2}\Big),\, \exp\Big(-\frac{(x-\mu_2)^2}{2\sigma^2}\Big),\, \cdots,\, \exp\Big(-\frac{(x-\mu_K)^2}{2\sigma^2}\Big)\right) \quad (4)$$
with centers $\mu = (\mu_1, \mu_2, \cdots, \mu_K)$. 7 / 59

General case: input $x = (x_1, \cdots, x_d) \in \mathbb{R}^d$, output $y \in \mathbb{R}$, $y = f(x)$. Expand x through basis functions:
$$y = w^T \phi(x) \quad (5)$$
e.g. $\phi(x) = (\phi_1(x), \phi_2(x), \cdots, \phi_H(x))^T = (1, x_1, \cdots, x_d, x_1^2, \cdots, x_d^2)^T$ and $w = (w_0, w_1, \cdots, w_{2d})^T$, so that
$$y = w^T\phi(x) = w_0 + w_1 x_1 + \cdots + w_d x_d + w_{d+1} x_1^2 + \cdots + w_{2d} x_d^2.$$ 8 / 59
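As an illustration of the basis-function view, a small sketch (our own construction, with an arbitrary synthetic target): build the design matrix from the feature map $\phi$ above and reuse least squares:

    import numpy as np

    def phi(x):
        # Feature map (1, x_1, ..., x_d, x_1^2, ..., x_d^2) from this slide.
        return np.concatenate([[1.0], x, x**2])

    rng = np.random.default_rng(1)
    X = rng.uniform(-2, 2, (50, 2))                    # 50 inputs in R^2
    y = np.sin(X[:, 0]) + X[:, 1]**2 + 0.1 * rng.standard_normal(50)

    Phi = np.stack([phi(x) for x in X])                # N x H design matrix
    w_hat = np.linalg.lstsq(Phi, y, rcond=None)[0]     # least squares in feature space
    y_fit = Phi @ w_hat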

Derivation of the GP (1)
Stack the outputs $y^{(1)}, \ldots, y^{(N)}$ as $y = \Phi w$, where $\Phi$ is the design matrix:
$$\begin{pmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(N)} \end{pmatrix} = \begin{pmatrix} \phi_1(x^{(1)}) & \cdots & \phi_H(x^{(1)}) \\ \phi_1(x^{(2)}) & \cdots & \phi_H(x^{(2)}) \\ \vdots & & \vdots \\ \phi_1(x^{(N)}) & \cdots & \phi_H(x^{(N)}) \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_H \end{pmatrix}$$
(y, $\Phi$, w respectively). With a Gaussian prior on the weights, $p(w) = N(0, \alpha^{-1}I)$, $y = \Phi w$ is Gaussian with mean 0 and covariance
$$\langle yy^T\rangle = \langle(\Phi w)(\Phi w)^T\rangle = \Phi\langle ww^T\rangle\Phi^T \quad (6)$$
$$= \alpha^{-1}\Phi\Phi^T. \quad (7)$$ 9 / 59

Derivation of the GP (2)
$$p(y) = N(y\,|\,0,\, \alpha^{-1}\Phi\Phi^T) \quad (8)$$
That is, for any inputs $(x_1, x_2, \cdots, x_N)$, the outputs $y = (y_1, y_2, \cdots, y_N)$ are jointly Gaussian with density p(y). The covariance is $K = \alpha^{-1}\Phi\Phi^T$ with elements
$$k(x, x') = \alpha^{-1}\phi(x)^T\phi(x') \quad (9)$$
$k(x, x')$ is the kernel function between x and x'; it determines how the outputs y covary. 10 / 59

Derivation of the GP (3)
Adding observation noise $\epsilon$:
$$\begin{cases} y = w^T\phi(x) + \epsilon \\ \epsilon \sim N(0, \beta^{-1}I) \end{cases} \;\Rightarrow\; p(y|f) = N(w^T\phi(x),\, \beta^{-1}I) \quad (10)$$
where $f = w^T\phi(x)$. Marginalizing f,
$$p(y|x) = \int p(y|f)\,p(f|x)\,df \quad (11)$$
$$= N(0, C), \quad (12)$$
a Gaussian with covariance C given by
$$C(x_i, x_j) = k(x_i, x_j) + \beta^{-1}\delta(i, j). \quad (13)$$
This is a GP; it is specified by the kernel $k(x, x')$ and the hyperparameters $\alpha, \beta$. 11 / 59

[Figure: sample functions drawn from GP priors with four kernels]
Gaussian: $\exp(-(x-x')^2/l)$ · Exponential: $\exp(-|x-x'|/l)$ (OU process) · Periodic: $\exp(-2\sin^2(\frac{x-x'}{2})/l^2)$ · Periodic(L): $\exp(-2\sin^2(\frac{x-x'}{2})/(10l)^2)$ 12 / 59
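How such sample paths are produced, as a sketch (our construction): evaluate the kernel on a grid to get a Gram matrix, then draw from the corresponding multivariate Gaussian:

    import numpy as np

    def gaussian_kernel(a, b, l=1.0):
        # The "Gaussian" kernel of this slide: exp(-(x - x')^2 / l).
        return np.exp(-(a[:, None] - b[None, :])**2 / l)

    x = np.linspace(-5, 5, 200)
    K = gaussian_kernel(x, x) + 1e-8 * np.eye(len(x))  # jitter for stability
    rng = np.random.default_rng(2)
    paths = rng.multivariate_normal(np.zeros(len(x)), K, size=3)  # 3 sample functions

Swapping in the exponential or periodic kernels above changes the character of the paths accordingly.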

Correlated Gaussians
[Figure: a covariance matrix K and correlated Gaussian samples drawn with it] 13 / 59

Correlated Gaussians (2)
[Figure: a different covariance matrix K and its samples] 14 / 59

Correlated Gaussians (3)
[Figure: a different covariance matrix K and its samples] 15 / 59

Infinite-dimensional Gaussian
For any inputs $(x_1, x_2, \cdots, x_n)$, the outputs $y = (y_1, y_2, \cdots, y_n)$ are jointly Gaussian, and this holds consistently for every choice of $(x_1, \cdots, x_n)$ (a consistent family of finite marginals). The covariance matrix K has elements $K_{ij} = k(x_i, x_j)$ given by the kernel k. 16 / 59

The RBF kernel
Take Gaussian basis functions $\phi_h(x) = \exp(-(x-h)^2/r^2)$, one per center h, and let the centers become dense:
$$k(x, x') = \sigma^2 \sum_{h=1}^{H} \phi_h(x)\phi_h(x') \quad (14)$$
$$\propto \int \exp\Big(-\frac{(x-h)^2}{r^2}\Big)\exp\Big(-\frac{(x'-h)^2}{r^2}\Big)\,dh \quad (15)$$
$$= \sqrt{\frac{\pi r^2}{2}}\exp\Big(-\frac{(x-x')^2}{2r^2}\Big) \;\equiv\; \theta_1 \exp\Big(-\frac{(x-x')^2}{\theta_2}\Big) \quad (16)$$
A GP with infinitely many RBF basis functions of (x, x') is again an RBF: the RBF kernel, with hyperparameters $\theta_1, \theta_2$. 17 / 59

Prediction with a GP
Since $y_{new}$ and the observed y are jointly Gaussian,
$$p(y_{new}\,|\,x_{new}, X, y, \theta) = \frac{p((y, y_{new})\,|\,(X, x_{new}), \theta)}{p(y\,|\,X, \theta)} \quad (17)$$
$$\propto \exp\left(-\frac{1}{2}\left(\begin{bmatrix}y \\ y_{new}\end{bmatrix}^T \begin{bmatrix}K & k \\ k^T & k_*\end{bmatrix}^{-1} \begin{bmatrix}y \\ y_{new}\end{bmatrix} - y^TK^{-1}y\right)\right) \quad (18, 19)$$
$$\sim N\big(k^TK^{-1}y,\;\; k_* - k^TK^{-1}k\big). \quad (20)$$
Here $K = [k(x, x')]$ over the training inputs, $k = (k(x_{new}, x_1), \cdots, k(x_{new}, x_N))$, and $k_* = k(x_{new}, x_{new})$. 18 / 59
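A compact sketch (ours; the RBF kernel and data are assumptions for illustration) of the predictive mean $k^TK^{-1}y$ and variance $k_* - k^TK^{-1}k$:

    import numpy as np

    def rbf(a, b, theta1=1.0, theta2=1.0):
        return theta1 * np.exp(-(a[:, None] - b[None, :])**2 / theta2)

    rng = np.random.default_rng(3)
    X = rng.uniform(-4, 4, 20)                       # training inputs
    y = np.sin(X) + 0.1 * rng.standard_normal(20)
    K = rbf(X, X) + 0.01 * np.eye(len(X))            # C = K + beta^{-1} I

    x_new = np.linspace(-5, 5, 100)
    k = rbf(X, x_new)                                # k(x_new, x_n), N x 100
    K_inv_k = np.linalg.solve(K, k)

    mean = k.T @ np.linalg.solve(K, y)               # k^T K^{-1} y
    var = rbf(x_new, x_new).diagonal() - np.einsum('ij,ij->j', k, K_inv_k)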

GP vs. SVR, Ridge: ARD kernel (Cohn+ 2013)
$$k(x, x') = \sigma_f^2 \exp\left(-\frac{1}{2}\sum_k \frac{(x_k - x'_k)^2}{\sigma_k^2}\right) \quad (21)$$

Model                    | MAE    | RMSE
μ                        | 0.8279 | 0.9899
SVM                      | 0.6889 | 0.8201
Linear ARD               | 0.7063 | 0.8480
Squared exp. Isotropic   | 0.6813 | 0.8146
Squared exp. ARD         | 0.6680 | 0.8098
Rational quadratic ARD   | 0.6773 | 0.8238
Matern(5,2)              | 0.6772 | 0.8124
Neural network           | 0.6727 | 0.8103
19 / 59

GP vs. SVR, Ridge: ARD kernel (Cohn+ 2013), multi-task setting
$$k(x, x') = \sigma_f^2 \exp\left(-\frac{1}{2}\sum_k \frac{(x_k - x'_k)^2}{\sigma_k^2}\right) \quad (22)$$

Model            | MAE    | RMSE
μ                | 0.8541 | 1.0119
Independent SVMs | 0.7967 | 0.9673
EasyAdapt SVM    | 0.7655 | 0.9105
Independent      | 0.7061 | 0.8534
Pooled           | 0.7252 | 0.8754
Pooled &{N}      | 0.7050 | 0.8497
Combined         | 0.6966 | 0.8448
20 / 59

GP > SVR on these regression tasks (Cohn+ 2014 etc.)! 21 / 59

Computational issues of GPs
GP regression/classification needs the inverse $K^{-1}$ of the N×N Gram matrix over the training data X, which costs $O(N^3)$: infeasible for N > 1000. Approximation: choose m representative points $X_m$ and compute through $X_m$, at cost $O(m^2N)$. 22 / 59

Subset of Data
Simplest approximation: replace K by $K_{mm}$ (23), the Gram matrix of m selected data points; cost $O(m^3)$. The remaining data are simply discarded. 23 / 59

Subset of Data (2)
$K \to K_{mm}$ (24): use only m of the points, $O(m^3)$.
[Figure: full data and the GP fit obtained from the selected subset] 24 / 59

Approximation (2): Subset of Regressors (Silverman 1985)
Low-rank approximation through m representative points:
$$K \simeq K_{nm}K_{mm}^{-1}K_{mn} = K' \quad (25)$$
$K_{nm}$: the N × m Gram matrix between all N data and the m points; cost $O(m^2N)$. 25 / 59

Approximation (2): Subset of Regressors (Silverman 1985)
$$K \simeq K_{nm}K_{mm}^{-1}K_{mn} = K' \quad (26)$$
$K_{nm}$: N × m; cost $O(m^2N)$.
[Figure: GP fit under the subset-of-regressors approximation] 26 / 59
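A sketch (ours) of the low-rank approximation $K' = K_{nm}K_{mm}^{-1}K_{mn}$ with randomly chosen representative points, reusing the rbf kernel from the prediction sketch:

    import numpy as np

    def rbf(a, b, theta1=1.0, theta2=1.0):
        return theta1 * np.exp(-(a[:, None] - b[None, :])**2 / theta2)

    rng = np.random.default_rng(4)
    X = rng.uniform(-4, 4, 500)                         # N = 500 inputs
    Xm = X[rng.choice(len(X), size=20, replace=False)]  # m = 20 points

    K_nm = rbf(X, Xm)                                   # N x m
    K_mm = rbf(Xm, Xm) + 1e-8 * np.eye(len(Xm))         # m x m, jittered
    K_approx = K_nm @ np.linalg.solve(K_mm, K_nm.T)     # K' = K_nm K_mm^{-1} K_mn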

These low-rank approximations of K are unified and systematically compared in (Quiñonero-Candela & Rasmussen 2005). 27 / 59

Variational approximation (Titsias 2009)
Based on Jensen's inequality:
$$\log \int p(x)f(x)\,dx \;\ge\; \int p(x)\log f(x)\,dx$$
Introduce m inducing inputs $X_m$ and the corresponding GP values $f_m$. Then
$$\log p(y) = \log \iint p(y, f, f_m)\,df\,df_m \quad (27)$$
$$= \log \iint q(f, f_m)\,\frac{p(y, f, f_m)}{q(f, f_m)}\,df\,df_m \quad (28)$$
$$\ge \iint q(f, f_m)\log\frac{p(y, f, f_m)}{q(f, f_m)}\,df\,df_m \quad (29)$$
for any variational distribution $q(f, f_m)$. 28 / 59

Variational approximation (2)
Factorize $p(y, f, f_m) = p(y|f)\,p(f|f_m)\,p(f_m)$ and choose $q(f, f_m) = p(f|f_m)\,q(f_m)$. Then
$$\log p(y) \ge \iint q(f, f_m)\log\frac{p(y, f, f_m)}{q(f, f_m)}\,df\,df_m \quad (30)$$
$$= \iint p(f|f_m)q(f_m)\log\frac{p(y|f)\,p(f|f_m)\,p(f_m)}{p(f|f_m)\,q(f_m)}\,df\,df_m \quad (31)$$
$$= \iint p(f|f_m)q(f_m)\log\frac{p(y|f)\,p(f_m)}{q(f_m)}\,df\,df_m \quad (32)$$
$$= \int q(f_m)\Big[\underbrace{\int p(f|f_m)\log p(y|f)\,df}_{G(f_m)} + \log\frac{p(f_m)}{q(f_m)}\Big]\,df_m \quad (33)$$ 29 / 59

Variational approximation (3)
Computing $G(f_m)$:
$$G(f_m) = \int p(f|f_m)\log p(y|f)\,df \quad (34)$$
$$= \int p(f|f_m)\left(-\frac{N}{2}\log(2\pi\sigma^2) - \frac{\|y-f\|^2}{2\sigma^2}\right)df \quad (35)$$
$$= \int p(f|f_m)\left[-\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\mathrm{tr}(y^Ty - 2y^Tf + f^Tf)\right]df \quad (36)$$
$$= -\frac{N}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\Big[y^Ty - 2y^T\alpha + \alpha^T\alpha + \mathrm{tr}\big(K_{nn} - K_{nm}K_{mm}^{-1}K_{mn}\big)\Big] \quad (37)$$
$$= \log N(y\,|\,\alpha, \sigma^2 I) - \frac{1}{2\sigma^2}\mathrm{tr}\big(K_{nn} - K'_{nn}\big), \quad (38)$$
where $\alpha = E[f|f_m] = K_{nm}K_{mm}^{-1}f_m$ and $K'_{nn} = K_{nm}K_{mm}^{-1}K_{mn}$. 30 / 59

Variational approximation (4)
Substituting back,
$$\log p(y) \ge \int q(f_m)\left[G(f_m) + \log\frac{p(f_m)}{q(f_m)}\right]df_m \quad (39)$$
$$= \int q(f_m)\left[\log N(y|\alpha, \sigma^2 I) - \frac{1}{2\sigma^2}\mathrm{tr}(K_{nn} - K'_{nn}) + \log\frac{p(f_m)}{q(f_m)}\right]df_m \quad (40)$$
$$= \int q(f_m)\log\frac{N(y|\alpha, \sigma^2 I)\,p(f_m)}{q(f_m)}\,df_m - \frac{1}{2\sigma^2}\mathrm{tr}(K_{nn} - K'_{nn}) \quad (41)$$
By the Jensen bound, the first term satisfies (with equality for the optimal $q(f_m)$)
$$\int q(f_m)\log\frac{N(y|\alpha, \sigma^2 I)\,p(f_m)}{q(f_m)}\,df_m \le \log\int N(y|\alpha, \sigma^2 I)\,p(f_m)\,df_m \quad (42)$$ 31 / 59

Variational approximation (5)
Hence
$$\log p(y) \gtrsim \log \int N(y|\alpha, \sigma^2 I)\,p(f_m)\,df_m - \frac{1}{2\sigma^2}\mathrm{tr}(K_{nn} - K'_{nn}) \qquad (K'_{nn} = K_{nm}K_{mm}^{-1}K_{mn}) \quad (43)$$
Since $\alpha = E[f|f_m] = K_{nm}K_{mm}^{-1}f_m$ is linear in $f_m$,
$$\int N(y|\alpha, \sigma^2 I)\,p(f_m)\,df_m = N(y\,|\,0,\; \sigma^2 I + K'_{nn}) \quad (44)$$
so finally
$$\log p(y) \ge \log N(y\,|\,0,\; \sigma^2 I + K'_{nn}) - \frac{1}{2\sigma^2}\mathrm{tr}(K_{nn} - K'_{nn}). \quad (45)$$ 32 / 59

Variational approximation (6)
$$\log N(y\,|\,0, \sigma^2 I + K'_{nn}) - \frac{1}{2\sigma^2}\mathrm{tr}(K_{nn} - K'_{nn}) = \log N(y\,|\,0, \sigma^2 I + K'_{nn}) - \frac{1}{2\sigma^2}\mathrm{tr}\big(\mathrm{cov}(f|f_m)\big) \quad (46)$$
Term 1: the fit to the data when f is explained through $f_m$.
Term 2: the variance of f left unexplained by $f_m$; it vanishes when $K'_{nn} = K_{nn}$, and maximizing the bound trades it off against term 1. 33 / 59
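As a sketch (ours), the collapsed bound (45) can be evaluated directly; the rbf kernel, inducing inputs Xm, and noise variance below are placeholders:

    import numpy as np
    from scipy.stats import multivariate_normal

    def rbf(a, b, theta1=1.0, theta2=1.0):
        return theta1 * np.exp(-(a[:, None] - b[None, :])**2 / theta2)

    def titsias_bound(X, y, Xm, sigma2):
        # Lower bound (45): log N(y|0, sigma^2 I + K'_nn) - tr(K_nn - K'_nn)/(2 sigma^2)
        K_nm = rbf(X, Xm)
        K_mm = rbf(Xm, Xm) + 1e-8 * np.eye(len(Xm))
        Q_nn = K_nm @ np.linalg.solve(K_mm, K_nm.T)   # K'_nn
        fit = multivariate_normal.logpdf(
            y, mean=np.zeros(len(X)), cov=sigma2 * np.eye(len(X)) + Q_nn)
        penalty = (np.trace(rbf(X, X)) - np.trace(Q_nn)) / (2 * sigma2)
        return fit - penalty

    rng = np.random.default_rng(5)
    X = rng.uniform(-4, 4, 200)
    y = np.sin(X) + 0.1 * rng.standard_normal(200)
    print(titsias_bound(X, y, np.linspace(-4, 4, 10), sigma2=0.01))

Maximizing this bound over Xm and the kernel hyperparameters gives the sparse approximation.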

GPs and SVMs
For classification, $y \in \{+1, -1\}$, with likelihood $p(y|f) = \sigma(yf)$ (logit) or $\Psi(yf)$ (probit). The GP classifier solves
$$\text{minimize:}\;\; -\log p(y|f)p(f|X) = \frac{1}{2}f^TK^{-1}f - \sum_{i=1}^N \log p(y_i|f_i) \quad (47)$$
For the SVM, writing $K\alpha = f$ and $w = \sum_i \alpha_i x_i$ gives $\|w\|^2 = \alpha^TK\alpha = f^TK^{-1}f$, so
$$\text{minimize:}\;\; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^N(1 - y_if_i)_+ = \frac{1}{2}f^TK^{-1}f + C\sum_{i=1}^N(1 - y_if_i)_+. \quad (48)$$
That is, the SVM is the same objective with the log loss replaced by the hinge loss. 34 / 59
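A toy sketch (ours) making the parallel concrete: both objectives share the $\frac{1}{2}f^TK^{-1}f$ regularizer and differ only in the loss term:

    import numpy as np

    def gp_classifier_objective(f, K, y):
        # (47) with the logit link: 1/2 f^T K^{-1} f + sum_i log(1 + exp(-y_i f_i))
        return 0.5 * f @ np.linalg.solve(K, f) + np.logaddexp(0.0, -y * f).sum()

    def svm_objective(f, K, y, C=1.0):
        # (48): 1/2 f^T K^{-1} f + C sum_i max(0, 1 - y_i f_i)
        return 0.5 * f @ np.linalg.solve(K, f) + C * np.maximum(0.0, 1 - y * f).sum()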

Loss functions ("Relationships between GPs and Other Models", Rasmussen & Williams)
[Figure 6.3: (a) a comparison of the hinge error $\max(1-z, 0)$ with the logistic loss $\log(1+\exp(-z))$ and probit loss $-\log\Phi(z)$; (b) the ε-insensitive error function $g_\epsilon(z)$ used in SVR.]
The SVM's hinge loss and the GP classifier's log loss are close, so the two behave similarly; the GP classifier additionally gives probabilistic outputs. 35 / 59

GPs and DPs
The Gaussian process is to the Gaussian what the Dirichlet process is to the Dirichlet [consistency of finite marginals]:
GP: for any inputs $(x_1, x_2, \cdots, x_n)$, the outputs $(y_1, y_2, \cdots, y_n)$ are jointly Gaussian.
DP: for any partition $(X_1, X_2, \cdots, X_n)$ of the space, the random measure follows $\mathrm{Dir}(\alpha(X_1), \alpha(X_2), \cdots, \alpha(X_n))$.
The GP is the "smoother" of the two, living on functions rather than measures. 36 / 59

Probabilistic PCA (Tipping & Bishop 1999)
Latent linear Gaussian model:
$$\begin{cases} y_n = Wx_n + \epsilon \\ \epsilon \sim N(0, \sigma^2 I) \end{cases} \quad (49)$$
Marginalizing $x_n \sim N(0, I)$, the log likelihood is
$$L = \sum_n \log p(y_n) = \sum_n \log N(y_n\,|\,0, C) \quad (50)$$
$$= -\frac{N}{2}\left(D\log 2\pi + \log|C| + \mathrm{tr}(C^{-1}S)\right) \quad (51)$$
where
$$C = WW^T + \sigma^2 I \quad (52)$$
$$S = \frac{1}{N}YY^T. \quad (53)$$ 38 / 59

Probabilistic PCA (2)
Setting $\partial L/\partial W = 0$ and solving for W gives
$$\hat{W} = U_q(\Lambda_q - \sigma^2 I)^{\frac{1}{2}} \qquad (\text{for } \sigma^2 = 0:\; U_q\Lambda_q^{\frac{1}{2}}) \quad (54)$$
where $\Lambda_q, U_q$ are the top q eigenvalues/eigenvectors of $YY^T$: PCA is recovered in the limit $\sigma^2 = 0$. 39 / 59

Gaussian Process Latent Variable Models (GPLVM)
Probabilistic PCA (Tipping & Bishop 1999) marginalizes the latent $x_n$:
$$p(y_n|W, \beta) = \int p(y_n|x_n, W, \beta)\,p(x_n)\,dx_n \quad (55)$$
$$p(Y|W, \beta) = \prod_n p(y_n|W, \beta)$$
and optimizes W. The GPLVM (Lawrence, NIPS 2003) instead marginalizes W with the prior
$$p(W) = \prod_{i=1}^D N(w_i\,|\,0, \alpha^{-1}I) \quad (56)$$
$$p(Y|X, \beta) = \int p(Y|X, W, \beta)\,p(W)\,dW \quad (57)$$
$$= \frac{1}{(2\pi)^{DN/2}|K|^{D/2}}\exp\left(-\frac{1}{2}\mathrm{tr}(K^{-1}YY^T)\right) \quad (58)$$
and optimizes the latent X. 40 / 59

GPLVM (2): dual of PPCA
$$\log p(Y|X, \beta) = -\frac{DN}{2}\log(2\pi) - \frac{D}{2}\log|K| - \frac{1}{2}\mathrm{tr}(K^{-1}YY^T) \quad (59)$$
$$K = \alpha XX^T + \beta^{-1}I \quad (60)$$
$$X = [x_1, \cdots, x_N]^T \quad (61)$$
Maximizing with respect to X:
$$\frac{\partial L}{\partial X} = \alpha K^{-1}YY^TK^{-1}X - \alpha DK^{-1}X = 0 \quad (62)$$
$$\Rightarrow\; X = \frac{1}{D}YY^TK^{-1}X \;\Rightarrow\; X = U_Q L V^T \quad (63)$$
$U_Q$ (N × Q): eigenvectors of $YY^T$ for the top Q eigenvalues $\lambda_1 \ge \cdots \ge \lambda_Q$; $L = \mathrm{diag}(l_1, \cdots, l_Q)$ with $l_i = \sqrt{\lambda_i/(\alpha D) - 1/(\alpha\beta)}$; V an arbitrary rotation. 41 / 59
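A numerical sketch (ours) of the closed-form solution (63) for the linear-kernel GPLVM; the function name and default values are illustrative:

    import numpy as np

    def linear_gplvm(Y, Q, alpha=1.0, beta=100.0):
        # Closed-form latent X = U_Q L V^T (V = I here) for K = alpha X X^T + I/beta.
        N, D = Y.shape
        evals, evecs = np.linalg.eigh(Y @ Y.T)        # eigendecomposition of YY^T
        top = np.argsort(evals)[::-1][:Q]             # indices of top Q eigenvalues
        L = np.sqrt(np.maximum(evals[top] / (alpha * D) - 1.0 / (alpha * beta), 0.0))
        return evecs[:, top] * L                      # N x Q latent coordinates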

GPLVM (3): nonlinear kernels
$$\log p(Y|X, \beta) = -\frac{DN}{2}\log(2\pi) - \frac{D}{2}\log|K| - \frac{1}{2}\mathrm{tr}(K^{-1}YY^T)$$
Instead of the linear kernel $K = \alpha XX^T + \beta^{-1}I$ (64), $X = [x_1, \cdots, x_N]^T$ (65), use a nonlinear kernel, e.g.
$$k(x_n, x_m) = \alpha\exp\left(-\frac{\gamma}{2}\|x_n - x_m\|^2\right) + \delta(n, m)\beta^{-1} \quad (66)$$
No closed form; optimize X by gradients
$$\frac{\partial L}{\partial K} = K^{-1}YY^TK^{-1} - DK^{-1}, \qquad \frac{\partial L}{\partial x_{n,j}} = \frac{\partial L}{\partial K}\frac{\partial K}{\partial x_{n,j}}$$
using Scaled Conjugate Gradient. GPLVM in MATLAB: http://www.cs.man.ac.uk/~neill/gplvm/ 42 / 59

GPLVM (4): Oil data
[Figure 1: Visualisation of the Oil data with (a) PCA (a linear GPLVM) and (b) a GPLVM which uses an RBF kernel. Crosses, circles and plus signs represent stratified, annular and homogeneous flows respectively. The greyscales in plot (b) indicate the precision with which the manifold was expressed in data-space for that latent point.]
PPCA (left) vs. GP-LVM (right); grey levels show confidence. The exact GPLVM is $O(N^3)$, so an active set is used in practice (sparsification). 43 / 59

GPLVM (4): Caveat
The latent X is usually initialized with PCA; Neil Lawrence's code initializes with 1e-2*randn(N,dims) and then optimizes by scaled conjugate gradient.
[Figure: initial latent coordinates scattered on the order of 0.03 around the origin] 44 / 59

GPLVM (5):
[Figure: example data and the learned latent space] 45 / 59

GPLVM (6):
[Figure: example of a learned latent space] 46 / 59

GPLVM (6):
[Figure: learned latent space (left) compared with PCA (right)] 47 / 59

GPLVM (7): MCMC
[Figure: Local (left) and Global (right) MCMC samples of the latent X]
MCMC over the latent X ( = 0.2, 400 iterations), initialized at 0 (cf. GPDM below). 48 / 59

GPLVM (8): MCMC (Oil Flow)
[Figure: Local vs. Global MCMC samples of the latent X on the Oil Flow data] 49 / 59

Gaussian Process Dynamical Model (Wang, Fleet & Hertzmann, NIPS 2005)
http://www.dgp.toronto.edu/~jmwang/gpdm/
A GPLVM whose latent coordinates $x_n$ themselves evolve over time under a GP (a nonlinear state space model): dynamics in the latent space. 50 / 59

GPDM (2): Formulation (1)
$$\begin{cases} x_t = f(x_{t-1}; A) + \epsilon_{x,t} \\ y_t = g(x_t; B) + \epsilon_{y,t} \end{cases} \qquad f \sim GP(0, K_x) \quad (67) \qquad g \sim GP(0, K_y) \quad (68)$$
$$p(Y, X\,|\,\alpha, \beta) = p(Y|X, \beta)\,p(X|\alpha).$$
$$p(Y|X, \beta) = \frac{|W|^N}{(2\pi)^{ND/2}|K_Y|^{D/2}}\exp\left(-\frac{1}{2}\mathrm{tr}(K_Y^{-1}YW^2Y^T)\right) \quad (69)$$
Same form as the GPLVM likelihood; $K_Y$ is (typically) an RBF kernel, and W a diagonal scaling of the output dimensions. 51 / 59

GPDM (3): Formulation (2)
The latent sequence is Markov; marginalizing the dynamics weights A:
$$p(X|\alpha) = p(x_1)\int \underbrace{\prod_{t=2}^N p(x_t|x_{t-1}, A, \alpha)}_{\text{Gaussian}}\,p(A|\alpha)\,dA \quad (70)$$
$$= p(x_1)\,\frac{1}{(2\pi)^{d(N-1)/2}|K_X|^{d/2}}\exp\left(-\frac{1}{2}\mathrm{tr}(K_X^{-1}X'X'^T)\right) \quad (71)$$
where $X' = [x_2, \cdots, x_N]^T$ and $K_X$ is the kernel matrix over $x_1, \ldots, x_{N-1}$, with an RBF + linear kernel:
$$k(x, x') = \alpha_1\exp\left(-\frac{\alpha_2}{2}\|x - x'\|^2\right) + \alpha_3 x^Tx' + \alpha_4^{-1}\delta(x, x'). \quad (72)$$ 52 / 59
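A sketch (ours) of the dynamics kernel (72) and the matrices entering (71); hyperparameter values are placeholders:

    import numpy as np

    def gpdm_kernel(X, a1=1.0, a2=1.0, a3=1.0, a4=1e3):
        # RBF + linear kernel (72) over latent states (rows of X are x_t).
        sq = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
        return a1 * np.exp(-0.5 * a2 * sq) + a3 * (X @ X.T) + np.eye(len(X)) / a4

    X = np.random.default_rng(6).standard_normal((100, 3))  # toy latent trajectory
    K_X = gpdm_kernel(X[:-1])       # kernel over x_1 ... x_{N-1}
    X_out = X[1:]                   # X' = [x_2, ..., x_N]^T, paired with K_X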

GPDM (4): Formulation (3)
Joint model with hyperpriors:
$$p(Y, X, \alpha, \beta) = p(Y|X, \beta)\,p(X|\alpha)\,p(\alpha)\,p(\beta) \quad (73)$$
$$p(\alpha) \propto \prod_i \alpha_i^{-1}, \qquad p(\beta) \propto \prod_i \beta_i^{-1}. \quad (74)$$
$$-\log p(Y, X, \alpha, \beta) = \frac{1}{2}\mathrm{tr}(K_X^{-1}X'X'^T) + \frac{1}{2}\mathrm{tr}(K_Y^{-1}YW^2Y^T) + \frac{d}{2}\log|K_X| + \frac{D}{2}\log|K_Y| \quad (75)$$
$$\underbrace{-\; N\log|W| + \sum_j\log\alpha_j + \sum_j\log\beta_j}_{\text{(hyperparameter terms)}} + \text{const.} \quad (76)$$
Minimize over X, α, β. 53 / 59

Gaussian Process Density Sampler (1)
[Figure: densities drawn from the GPDS prior: (a) l_x=1, l_y=1, α=1 (b) l_x=1, l_y=1, α=10 (c) l_x=0.2, l_y=0.2, α=5 (d) l_x=0.1, l_y=2, α=5]
What is a GP prior over densities?
$$p(x) = \frac{1}{Z(f)}\,\Phi(f(x))\,\pi(x) \quad (77)$$
$f(x) \sim GP$; $\pi(x)$: a base density; $\Phi$: a squashing function onto [0, 1], e.g. $\Phi(x) = 1/(1 + \exp(-x))$. 54 / 59

Gaussian Process Density Sampler (2)
Generation by rejection sampling:
$$p(x) = \frac{1}{Z(f)}\,\Phi(f(x))\,\pi(x) \quad (78)$$
1. Draw x ~ π(x).
2. Draw r ~ Uniform[0, 1].
3. If r < Φ(f(x)) then accept x; else reject x.
Accepting N points and rejecting M, the unknown Z(f) never needs to be computed, and the required values of Φ(f(x)) can be handled by MCMC! Related: Infinite Mixture models. 55 / 59
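A toy sketch (ours) of steps 1-3; the fixed f below stands in for a GP draw, which the real sampler instantiates lazily:

    import numpy as np

    rng = np.random.default_rng(7)

    def f(x):
        return np.sin(3 * x)              # stand-in for a function drawn from the GP

    def Phi(z):
        return 1.0 / (1.0 + np.exp(-z))   # squashing function onto [0, 1]

    def gpds_rejection_sample(n):
        # Samples from p(x) = Phi(f(x)) pi(x) / Z(f), with pi = N(0, 1).
        out = []
        while len(out) < n:
            x = rng.standard_normal()     # 1. draw x ~ pi(x)
            r = rng.uniform()             # 2. draw r ~ Uniform[0, 1]
            if r < Phi(f(x)):             # 3. accept with probability Phi(f(x))
                out.append(x)
        return np.array(out)

    samples = gpds_rejection_sample(1000)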

Summary
Gaussian processes: priors over functions for regression, kernels, sparse approximations, relations to SVMs and DPs, and nonlinear latent variable models (GPLVM, GPDM). 56 / 59

Literature
Gaussian Process Dynamical Models. J. Wang, D. Fleet, and A. Hertzmann. NIPS 2005. http://www.dgp.toronto.edu/~jmwang/gpdm/
Gaussian Process Latent Variable Models for Visualization of High Dimensional Data. Neil D. Lawrence. NIPS 2003.
The Gaussian Process Density Sampler. Ryan Prescott Adams, Iain Murray and David MacKay. NIPS 2008.
Archipelago: Nonparametric Bayesian Semi-Supervised Learning. Ryan Prescott Adams and Zoubin Ghahramani. ICML 2009.
57 / 59

Pattern Recognition and Machine Learning, Chapter 6. Christopher Bishop, Springer, 2006. http://ibisforest.org/index.php?prml
Gaussian Processes for Machine Learning. Rasmussen and Williams, MIT Press, 2006. http://www.gaussianprocess.org/gpml/
Gaussian Processes: A Replacement for Supervised Neural Networks?. David MacKay, lecture notes at NIPS 1997. http://www.inference.phy.cam.ac.uk/mackay/gp/
Videolectures.net: Gaussian Process Basics. http://videolectures.net/gpip06_mackay_gpb/
Introductory notes in Japanese (1), 2007. http://www.iris.dti.ne.jp/~tmasada/2007071101.pdf
58 / 59

Codes GPML Toolbox (in MATLAB): http://www.gaussianprocess.org/gpml/code/ GPy (in Python): http://sheffieldml.github.io/gpy/ 59 / 59