(note-02) Rademacher 1/57

Similar documents
³ÎΨÏÀ

Akito Tsuboi June 22, T ϕ T M M ϕ M M ϕ T ϕ 2 Definition 1 X, Y, Z,... 1

1 1.1 ( ). z = a + bi, a, b R 0 a, b 0 a 2 + b 2 0 z = a + bi = ( ) a 2 + b 2 a a 2 + b + b 2 a 2 + b i 2 r = a 2 + b 2 θ cos θ = a a 2 + b 2, sin θ =

III 1 (X, d) d U d X (X, d). 1. (X, d).. (i) d(x, y) d(z, y) d(x, z) (ii) d(x, y) d(z, w) d(x, z) + d(y, w) 2. (X, d). F X.. (1), X F, (2) F 1, F 2 F

(2 X Poisso P (λ ϕ X (t = E[e itx ] = k= itk λk e k! e λ = (e it λ k e λ = e eitλ e λ = e λ(eit 1. k! k= 6.7 X N(, 1 ϕ X (t = e 1 2 t2 : Cauchy ϕ X (t

8.1 Fubini 8.2 Fubini 9 (0%) 10 (50%) Carathéodory 10.3 Fubini 1 Introduction 1 (1) (2) {f n (x)} n=1 [a, b] K > 0 n, x f n (x) K < ( ) x [a

() Remrk I = [0, ] [x i, x i ]. (x : ) f(x) = 0 (x : ) ξ i, (f) = f(ξ i )(x i x i ) = (x i x i ) = ξ i, (f) = f(ξ i )(x i x i ) = 0 (f) 0.

Riemann-Stieltjes Poland S. Lojasiewicz [1] An introduction to the theory of real functions, John Wiley & Sons, Ltd., Chichester, 1988.,,,,. Riemann-S

Lebesgue Fubini L p Banach, Hilbert Höld

,. Black-Scholes u t t, x c u 0 t, x x u t t, x c u t, x x u t t, x + σ x u t, x + rx ut, x rux, t 0 x x,,.,. Step 3, 7,,, Step 6., Step 4,. Step 5,,.

1 Introduction 1 (1) (2) (3) () {f n (x)} n=1 [a, b] K > 0 n, x f n (x) K < ( ) x [a, b] lim f n (x) f(x) (1) f(x)? (2) () f(x)? b lim a f n (x)dx = b

meiji_resume_1.PDF

医系の統計入門第 2 版 サンプルページ この本の定価 判型などは, 以下の URL からご覧いただけます. このサンプルページの内容は, 第 2 版 1 刷発行時のものです.

III III 2010 PART I 1 Definition 1.1 (, σ-),,,, Borel( ),, (σ-) (M, F, µ), (R, B(R)), (C, B(C)) Borel Definition 1.2 (µ-a.e.), (in µ), (in L 1 (µ)). T


量子力学 問題

π, R { 2, 0, 3} , ( R),. R, [ 1, 1] = {x R 1 x 1} 1 0 1, [ 1, 1],, 1 0 1,, ( 1, 1) = {x R 1 < x < 1} [ 1, 1] 1 1, ( 1, 1), 1, 1, R A 1

2012 IA 8 I p.3, 2 p.19, 3 p.19, 4 p.22, 5 p.27, 6 p.27, 7 p

..3. Ω, Ω F, P Ω, F, P ). ) F a) A, A,..., A i,... F A i F. b) A F A c F c) Ω F. ) A F A P A),. a) 0 P A) b) P Ω) c) [ ] A, A,..., A i,... F i j A i A

A

I, II 1, A = A 4 : 6 = max{ A, } A A 10 10%


°ÌÁê¿ô³ØII

III ϵ-n ϵ-n lim n a n = α n a n α 1 lim a n = 0 1 n a k n n k= ϵ-n 1.1

B2 ( 19 ) Lebesgue ( ) ( ) 0 This note is c 2007 by Setsuo Taniguchi. It may be used for personal or classroom purposes, but not for commercia

B [ 0.1 ] x > 0 x 6= 1 f(x) µ 1 1 xn 1 + sin sin x 1 x 1 f(x) := lim. n x n (1) lim inf f(x) (2) lim sup f(x) x 1 0 x 1 0 (

Basic Math. 1 0 [ N Z Q Q c R C] 1, 2, 3,... natural numbers, N Def.(Definition) N (1) 1 N, (2) n N = n +1 N, (3) N (1), (2), n N n N (element). n/ N.

微分積分 サンプルページ この本の定価 判型などは, 以下の URL からご覧いただけます. このサンプルページの内容は, 初版 1 刷発行時のものです.

( )/2 hara/lectures/lectures-j.html 2, {H} {T } S = {H, T } {(H, H), (H, T )} {(H, T ), (T, T )} {(H, H), (T, T )} {1

x (x, ) x y (, y) iy x y z = x + iy (x, y) (r, θ) r = x + y, θ = tan ( y ), π < θ π x r = z, θ = arg z z = x + iy = r cos θ + ir sin θ = r(cos θ + i s


2000年度『数学展望 I』講義録

y π π O π x 9 s94.5 y dy dx. y = x + 3 y = x logx + 9 s9.6 z z x, z y. z = xy + y 3 z = sinx y 9 s x dx π x cos xdx 9 s93.8 a, fx = e x ax,. a =

Z[i] Z[i] π 4,1 (x) π 4,3 (x) 1 x (x ) 2 log x π m,a (x) 1 x ϕ(m) log x 1.1 ( ). π(x) x (a, m) = 1 π m,a (x) x modm a 1 π m,a (x) 1 ϕ(m) π(x)

inkiso.dvi

I (Analysis I) Lebesgue (Lebesgue Integral Theory) 1 (Seiji HIRABA) 1 ( ),,, ( )

Dynkin Serre Weyl

…X…p†[…X’³‚¥›»‡¨‡æ‡Ñ…}…‰…`…J†[…l…‰−w‘K‡Ì‡½‡ß‡Ì“ÅfiK›»…A…‰…S…−…Y…•‡ÆCV†EPR‡Ö‡Ì›žŠp

= π2 6, ( ) = π 4, ( ). 1 ( ( 5) ) ( 9 1 ( ( ) ) (

ばらつき抑制のための確率最適制御

(1) (2) (3) (4) 1

x () g(x) = f(t) dt f(x), F (x) 3x () g(x) g (x) f(x), F (x) (3) h(x) = x 3x tf(t) dt.9 = {(x, y) ; x, y, x + y } f(x, y) = xy( x y). h (x) f(x), F (x

(ii) (iii) z a = z a =2 z a =6 sin z z a dz. cosh z z a dz. e z dz. (, a b > 6.) (z a)(z b) 52.. (a) dz, ( a = /6.), (b) z =6 az (c) z a =2 53. f n (z


ohpmain.dvi

v er.1/ c /(21)

I = [a, b] R γ : I C γ(a) = γ(b) z C \ γ(i) 1(4) γ z winding number index Ind γ (z) = φ(b, z) φ(a, z) φ 1(1) (i)(ii) 1 1 c C \ {0} B(c; c ) L c z B(c;

n=1 1 n 2 = π = π f(z) f(z) 2 f(z) = u(z) + iv(z) *1 f (z) u(x, y), v(x, y) f(z) f (z) = f/ x u x = v y, u y = v x

2 1 Introduction

all.dvi

Part () () Γ Part ,

.3. (x, x = (, u = = 4 (, x x = 4 x, x 0 x = 0 x = 4 x.4. ( z + z = 8 z, z 0 (z, z = (0, 8, (,, (8, 0 3 (0, 8, (,, (8, 0 z = z 4 z (g f(x = g(

r 1 m A r/m i) t ii) m i) t B(t; m) ( B(t; m) = A 1 + r ) mt m ii) B(t; m) ( B(t; m) = A 1 + r ) mt m { ( = A 1 + r ) m } rt r m n = m r m n B


D = [a, b] [c, d] D ij P ij (ξ ij, η ij ) f S(f,, {P ij }) S(f,, {P ij }) = = k m i=1 j=1 m n f(ξ ij, η ij )(x i x i 1 )(y j y j 1 ) = i=1 j

I A A441 : April 15, 2013 Version : 1.1 I Kawahira, Tomoki TA (Shigehiro, Yoshida )

II Time-stamp: <05/09/30 17:14:06 waki> ii

1 c Koichi Suga, ISBN

,,,,., = (),, (1) (4) :,,,, (1),. (2),, =. (3),,. (4),,,,.. (1) (3), (4).,,., () : = , ( ) : = F 1 + F 2 + F 3 + ( ) : = i Fj j=1 2


1 X X T T X (topology) T X (open set) (X, T ) (topological space) ( ) T1 T, X T T2 T T T3 T T ( ) ( ) T1 X T2 T3 1 X T = {, X} X (X, T ) indiscrete sp

D 24 D D D

all.dvi



2 1 κ c(t) = (x(t), y(t)) ( ) det(c (t), c x (t)) = det (t) x (t) y (t) y = x (t)y (t) x (t)y (t), (t) c (t) = (x (t)) 2 + (y (t)) 2. c (t) =

−g”U›ß™ö‡Æ…X…y…N…g…‰

1. A0 A B A0 A : A1,...,A5 B : B1,...,B

ALAGIN (SVM)

Exercise in Mathematics IIB IIB (Seiji HIRABA) 0.1, =,,,. n R n, B(a; δ) = B δ (a) or U δ (a) = U(a;, δ) δ-. R n,,,, ;,,, ;,,. (S, O),,,,,,,, 1 C I 2

newmain.dvi

untitled

i

卓球の試合への興味度に関する確率論的分析

ii 3.,. 4. F. ( ), ,,. 8.,. 1. (75% ) (25% ) =7 24, =7 25, =7 26 (. ). 1.,, ( ). 3.,...,.,.,.,.,. ( ) (1 2 )., ( ), 0., 1., 0,.

1/1 lim f(x, y) (x,y) (a,b) ( ) ( ) lim limf(x, y) lim lim f(x, y) x a y b y b x a ( ) ( ) xy x lim lim lim lim x y x y x + y y x x + y x x lim x x 1

1 I

2 G(k) e ikx = (ik) n x n n! n=0 (k ) ( ) X n = ( i) n n k n G(k) k=0 F (k) ln G(k) = ln e ikx n κ n F (k) = F (k) (ik) n n= n! κ n κ n = ( i) n n k n

(Basic of Proability Theory). (Probability Spacees ad Radom Variables , (Expectatios, Meas) (Weak Law


.1 1,... ( )

Z: Q: R: C: sin 6 5 ζ a, b

‚åŁÎ“·„´Šš‡ðŠp‡¢‡½‹âfi`fiI…A…‰…S…−…Y…•‡ÌMarkovŸA“½fiI›ð’Í

A 21 A.1 L p A A.3 H k,p () A

p = mv p x > h/4π λ = h p m v Ψ 2 Ψ

I L01( Wed) : Time-stamp: Wed 07:38 JST hig e, ( ) L01 I(2017) 1 / 19


A B P (A B) = P (A)P (B) (3) A B A B P (B A) A B A B P (A B) = P (B A)P (A) (4) P (B A) = P (A B) P (A) (5) P (A B) P (B A) P (A B) A B P

No δs δs = r + δr r = δr (3) δs δs = r r = δr + u(r + δr, t) u(r, t) (4) δr = (δx, δy, δz) u i (r + δr, t) u i (r, t) = u i x j δx j (5) δs 2

a n a n ( ) (1) a m a n = a m+n (2) (a m ) n = a mn (3) (ab) n = a n b n (4) a m a n = a m n ( m > n ) m n 4 ( ) 552

II (10 4 ) 1. p (x, y) (a, b) ε(x, y; a, b) 0 f (x, y) f (a, b) A, B (6.5) y = b f (x, b) f (a, b) x a = A + ε(x, b; a, b) x a 2 x a 0 A = f x (


z f(z) f(z) x, y, u, v, r, θ r > 0 z = x + iy, f = u + iv C γ D f(z) f(z) D f(z) f(z) z, Rm z, z 1.1 z = x + iy = re iθ = r (cos θ + i sin θ) z = x iy

ii 3.,. 4. F. (), ,,. 8.,. 1. (75%) (25%) =7 20, =7 21 (. ). 1.,, (). 3.,. 1. ().,.,.,.,.,. () (12 )., (), 0. 2., 1., 0,.

「産業上利用することができる発明」の審査の運用指針(案)

重力方向に基づくコントローラの向き決定方法

1 variation 1.1 imension unit L m M kg T s Q C QT 1 A = C s 1 MKSA F = ma N N = kg m s 1.1 J E = 1 mv W = F x J = kg m s 1 = N m 1.

ft. ft τfτdτ = e t.5.. fx = x [ π, π] n sinnx n n=. π a π a, x [ π, π] x = a n cosnx cosna + 4 n=. 3, x [ π, π] x 3 π x = n sinnx. n=.6 f, t gt n 3 n

tokei01.dvi

one way two way (talk back) (... ) C.E.Shannon 1948 A Mathematical theory of communication. 1 ( ) 0 ( ) 1

Transcription:

(note-02) Rademacher 1/57

(x 1, y 1 ),..., (x n, y n ) X Y f : X Y Y = R f Y = {+1, 1}, {1, 2,..., G} f x y 1. (x 1, y 1 ),..., (x n, y n ) f(x i ) ( ) 2. x f(x) Y 2/57

(x, y) f(x) f(x) y (, loss) l(f(x), y) l(f(x), y) = (f(x) y) 2 L 1 f(x) y 0-1 l(f(x), y) = 1[f(x) y] f(x) = y loss = 0 f(x) y loss = 1 1, A, 1[ ] 1[A] = 0, A. 3/57

f : X Y (x 1, y 1 ),..., (x n, y n ) X Y L(f) = 1 n n l(f(x i ), y i ) X Y D L(f) = E[l(f(x), y)], (x, y) D 4/57

2 & 0-1 x R, y {1, 1}, P (x, y) = p(x y)p (y) L(f) 0.4 h(x) = 1 h(x) =1 = P (f(x) = 1, Y = +1) 0.3 + P (f(x) = +1, Y = 1) 0.2 0.1 p(x y = 1) p(x y = 1) = P (f(x) = 1 Y = 1) P (Y = 1) + P (f(x) = 1 Y = 1) P (Y = 1) 0.0-4 -2 0 2 4 = [ ] P (Y = 1) + [ ] P (Y = 1) 5/57

L(f) f D L(f) L(f) f i.i.d. f L(f) p L(f), n, = [ L(f) f ] [ L(f) f ] ( ) L(f) L(f) 6/57

Bayes rule f 0 L(f 0 ) = inf f L(f) inf f : f : X Y f 0 : Bayes error: L(f 0 ) ( ) Bayes rule 7/57

Theorem 1. Bayes rule : Bayes rule (x, y) p(x, y) = p(y x)p(x) f 0 (x) = arg min Bayes error : L(f 0 ) = f 0 f Y p(x) l(f, y)p(y x)dy ( min f Y ) l(f, y)p(y x)dy dx (Steinwart, Christmann, Support Vector Machines, Chap.3 ) 8/57

L(f) = = L(f(x), y)p(y x)p(x)dydx ( ) p(x) min L(f, y)p(y x)dy dx f Y p(x) L(f 0 (x), y)p(y x)dydx = L(f 0 ). 9/57

Y = R l(f, y) = (f y) 2 f 0 (x) = yp(y x)dx. (y f) 2 p(y x)dy = (y f 0 (x)) 2 p(y x)dy + (f 0 (x) f(x)) 2 10/57

l(f, y) = f y f 0 (x) p(y x) y note: y f p(y x)dy f = 0 global min. y f p(y x)dy = f (y f)p(y x)dy + f (f y)p(y x)dy = f yp(y x)dy fp (Y f x) + fp (Y f x) f yp(y x)dy. 11/57

f L(f x) = fp(f x) P (Y f x) fp(f x) + P (y f x) fp(f x) + fp(f x) = P (Y f x) + P (y f x) L(f x) = 0 P (Y f x) = P (y f x) f p(y x) 12/57

2 Y = {+1, 1} p(x, y) = p(y x)p(x) f 0 (x) = +1, p(+1 x) p( 1 x), 1, p( 1 x) > p(+1 x). L(f 0 ) = min{p(x, y)}dx. y 13/57

y=±1 1[f y]p(y x) = 1 p(f x) f = { p(+1 x) p( 1 x) f0 (x) = +1, p(+1 x) < p( 1 x) f 0 (x) = 1 L(f 0 ) = (1 max y=±1 p(y x))p(x)dx = min p(y x)p(x)dx y=±1 2 2 14/57

p(+1 x) x 1.0 0.8 p(y = 1 x) p(y =1 x) p(y x) 0.6 0.4 0.2 0.0-4 -2 0 2 4 6 x h 0 (x) 15/57

Y = {1, 2,..., K} p(x, y) = p(y x)p(x) Remark. L(f 0 ) = f 0 (x) = arg max p(y x) y (1 max y p(y x))p(x)dx p(y x)p(x) [Steinwart, A3.16] cf. Polish space. 16/57

( ) H (Empirical risk minimization, ERM): min f H L(f) f ERM (Regularized ERM) f 0 H 17/57

Example 1. H = {f 1, f 2 }, L(f1 ) = 2 8, L(f2 ) = 3 8 = f = f 1 h 1 h 2 18/57

(Y = R) ϕ : X R d H Lin = {f(x) = w T ϕ(x i ) + b w R d, b R} (2 ) H = {f(x) = sign(g(x)) g H Lin }, sign ( ) Y = {1, 2,..., K}. H = {f(x) = arg max{g 1 (x),..., g K (x)} g 1,..., g K H Lin } 19/57

(x 1, y 1 ),..., (x n, y n ) i.i.d. D f 0 (x) ERM, H f L( f) L(f 0 )?? 20/57

H Bayes rule f 0 f 0 H f := arg min L(f) f H f = arg min L(f) f H f 0 f. 2 (i) L(f 0 ) L( f) L( f), (ii) L( f) L( f) 21/57

L( f) L(f 0 ) L( f) L(f 0 ) = [ L( f) L( f) ] + [ L( f) L(f 0 ) ] ( 0) ( 0) H f 0 22/57

: P ( L( f) L( f) < ε ) 1 δ ε δ ε > 0, δ (0, 1) P ( ) i.i.d. (x 1, y 1 ),..., (x n, y n ) δ ε 23/57

Lemma 1. P ( L( f) L( f) ε ) P ( L( f) L( f) ε/2 ) + P ( L( f) L( f) ε/2 ) Proof. L( f) L( f) = (L( f) L( f)) + ( L( f) L( f)) + ( L( f) L( f)) (L( f) L( f)) + ( L( f) L( f)) L( f) L( f) ε (i) L( f) L( f) ε/2, (ii) L( f) L( f) ε/2. 24/57

Theorem 2 ( ). sup f,y l(f, y) inf f,y l(f, y) M H < 1 δ L( f) < L(f 0 ) + [ L( f) L(f 0 ) ] 2 H + 1 + M log n δ (1) Proof. Hoeffding Z i = l(f(x i ), y i ) E[ L( f)] = L( f) P ( L( f) L( f)) ε/2 ) e nε 2 /2M 2. 25/57

f P ( L( f) L( f) ε/2 ) P ( { ) L(f) L(f) ε/2} f H f H P ( L(f) L(f) ε/2 ) H e nε2 /2M 2. note: l(ĥ(x i), y i ), i = 1,..., n = H 26/57

P P ( ) L( f) L( f) ε ( ) L( f) L( f) ε/2 ( H + 1)e nε2 /2M 2 + P ( ) L( f) L( f) ε/2 δ = ( H + 1)e nε2 /2M 2 (1) Remark. Chebyshev s ineq P (4 H +1)M 2 4 M n H +1 δ. nε 2, 27/57

2, 0-1 loss (M = 1). X = (0, 1] p(y x)p(x), p(x) = 1, x (0, 1] 0, x a, p(+1 x) =, f 0 (x) = 1, x < a 1, x a, +1, x < a. L(f 0 ) = 0. F K f b,l b, l/k x 1, f b,l (x) = b, 0 < x < l/k, b = ±1, l = 1,..., K. 28/57

F K = 2K, f F K, L( f) 1/(2K) a (l/k, (l + 1)/K] f = f 0 f F K F K < 1 δ L( f) < 1 2K + 2 n log 2K + 1 δ 29/57

K = n, δ = 1/n 1 1/n L( f) < 1 2 n + 2 log n n log 3n3/2 4, (n 3). n 30/57

b H := min f H L(f) L(f 0) v H (n, δ) := M 2 n H + 1 log δ (1) n 1 δ L( f) < L(f 0 ) + b H + v H (n, δ). 31/57

H 1, H 2,..., H K H 1 H 2 H K b H1 b H2 b HK n δ v H1 (n, δ) v H2 (n, δ) v HK (n, δ) 32/57

L( f) L(f 0 ) + b H + v H (n, δ) min k=1,...,k b H k + v Hk (n, δ) H k prediction error 0.0 0.5 1.0 1.5 2.0 bias+varialce bias variance 1e+01 1e+03 1e+05 H 33/57

Rademacher H = 34/57

Definition 1 (( ) Rademacher ). H {f f : Z R} S = {z 1,..., z n } Z Rademacher : RS (H) = E σ [sup f H σ 1,..., σ n i.i.d. Ber(1/2), i.e., P (σ i = +1) = P (σ i = 1) = 1/2 1 n ] n σ i f(z i ), E σ [ ] σ 1,..., σ n Rademacher R n (H) = E S [ R S (H)], (z 1,..., z n i.i.d. P ). R n (H) 35/57

Rademacher σ 1,..., σ n {+1, 1} G 36/57

R n (H) Rademacher 1. H 1 H 2 = R S (H 1 ) R S (H 2 ) 2. c R R S (c H) = c R S (H) c H = {cg : g H} 3. ϕ : R R L R S (ϕ H) L R S (H). ϕ H = {z ϕ(g(z)) : g H} 37/57

Proof. 1. 2. c = 0 c 0 R S (ch) = E σ [ sup g H = c E σ [ 1 n sup g H n 1 n ] [ σ i cg(z i ) = c E σ n ] σ i g(z i ) sup g H 1 n n ] σ i sign(c)g(z i ) σ i σ i sign(c) 1/2 ±1 3 Foundations of Machine Learning, Lemma 4.2. 38/57

Rademacher Theorem 3 ( ). H {f : Z [a, b]} Z Z, Z 1,..., Z n i.i.d. P. 1 δ sup f H E[f(Z)] 1 n n f(z i ) log(2/δ) 2R n(h) + (b a). 2n Hoeffding s McDiarmid 39/57

{ Proof. A(z 1,..., z n ) = sup E[g(Z)] 1 g H n n } g(z i ) A(z 1,..., z n ) A(z 1,..., z n 1, z ) { = sup inf E[g(Z)] 1 n g(z i ) E[f(Z)] + 1 g H f H n n sup g H = sup g H { E[g(Z)] 1 n g(z ) g(z n ) n n g(z i ) E[g(Z)] + 1 n b a n. n 1 n 1 f(z i ) + 1 } n f(z ) g(z i ) + 1 } n g(z ) A(z 1,..., z n 1, z ) A(z 1,..., z n 1, z n ) (b a)/n 40/57

A(z 1,..., z n 1, z ) A(z 1,..., z n 1, z n ) b a n McDiarmid s inequality { P A(Z 1,..., Z n ) E[A] (b a) } log(1/δ) 2n 1 δ. (2) E[A] Z Z 1,..., Z n, Z 1,..., Z n 41/57

A(Z 1,..., Z n ) = sup g H { E Z 1,...,Z n Z i Z i E Z 1,...,Z n [ sup g H [ 1 n 1 n ] } n g(z i) 1 n g(z i ) n ] n (g(z i) g(z i )). = g(z i ) g(z i) g(z i ) g(z i ) σ 1,..., σ n i.i.d. Ber(1/2) σ i (g(z i ) g(z i)) g(z i ) g(z i) 42/57

E Z1,...,Z n [A(Z 1,..., Z n )] [ ] 1 n E Z,Z,σ σ i (g(z n i) g(z i )) E [ sup g H = 2R n (H) 1 n sup g H ] [ n σ i g(z i) + E sup g H 1 n ] n ( σ i )g(z i ) (2) O.K. 43/57

Example 2 ( ). H < f H f B (f(z 1 ),..., f(z n )) 2 B n Massart s ineq. R S (H) = E σ [max f H 1 n ] n σ i f(z i ) B n 2 log H n = B 2 log H n R n (H) 44/57

Example 3 ( (decision stump)). 1 (R d 2 ) x = (x 1,..., x d ) f(x; s, k, z) = s sign(x k z) {+1, 1}, : s = ±1, k {1,..., d}, z R. : H = { f(x; s, k, z) s = ±1, k = 1,..., d, z R }. 45/57

H A = {(f(x 1 ),..., f(x n )) f H} A n A 2nd Massart s ineq. R S (H) = 1 n E σ [ sup s,k,z n ] σ i f(x i ; s, k, z) 2 log(2nd) n 46/57

(R d 2 ) l H l ( H 2 ) A l := {(f(x 1 ),..., f(x n )) {±1} n f H l }, l 1 A l (2n k d) A l 1 = 2nd A l 1 (n k : k ) k=1 = A l (2nd) l 1. R n (H l ) 2(l 1) log(2nd) n 47/57

0-1 Rademacher G = {1[f(x) y] f H}, f : X {+1, 1}. For S = {(x 1, y 1 ),..., (x n, y n )}, R S (G) = E σ [sup = 1 2 E σ f H [ 1 n sup f H n 1 n ] 1 y i f(x i ) σ i 2 ] n ( y i σ i )f(x i ) = 1 2 R S (H) 48/57

H 2 0-1 ERM f 0 H 2, L(f 0 ) = 0 L( f) = 0 1 δ L( f) 2 log(2nd) n + log(2/δ) 2n 49/57

2 Rademacher y = f 0 (x) + ε, H {f : X [ B, B]}. l(f(x), y) = 1 (y f(x))2 2 (x, y) X [ M, M] (B + M)2 0 l(f(x), y). 2 l(f, y) f = f y B + M, = l(f 1 (x), y) l(f 2 (x), y) (B + M) f 1 (x) f 2 (x). 50/57

G = {(x, y) l(f(x), y) : f H} R n (G) (B + M)R n (H) 1 δ L( f) L( f) 2(B + M)R n (H) + (B + M)2 log(2/δ) 2 2n 51/57

RKHS Rademacher k RKHS H H B = {f H f B} sup k(x, x) a 2 x X f H B f ab, R m (H B ) ab m 52/57

Proof. f = sup x f, k(x, ) f sup x k(x, x) ab. R S (H B ) = E σ [ sup f B 1 m m ] σ i f(x i ) = B [ m E m σ σ i k(x i, ) ] B m = 1 m E σ = B ( [ ] ) 1/2 k(x i, x j )E σ σi σ j m = B m i,j ( m ) 1/2 k(x i, x i ) ab. m [ sup f B ( E σ [ m m f, σ i k(x i, ) ] σ i k(x i, ) 2 ] ) 1/2 53/57

(x 1, y 1 ),..., (x m, y m ) X [ M, M] RKHS H 2 l(f(x), y) = 1 2 (f(x) y)2 min f H 1 2m m (y i f(x i )) 2 + λ 2 f 2 f 54/57

λ 2 f 2 1 2m m = f M/ λ (y i f(x i )) 2 + λ 2 f 2 1 2m m y 2 i M 2 2 H M/ λ 2 B am/ λ, R m (H M/ λ ) am mλ 55/57

L( f) L( f) sup f H M/ λ L(f) L(f) ( 2 am/ ) am λ + M mλ + ( ) 1 O λ m (λ = o(1) ) ( am/ λ + M 2 ) 2 log(1/δ) 2m 56/57

f 0 H, λ = λ m λ m 0, λ m m = L( f) L(f0 ) (in prob.) Proof. λ m 0 m f 0 H M/ λm 0 L( f) L(f 0 ) L( f) L( f) + L( f) + λ m 2 f 2 L(f 0 ) λ m 2 f 0 2 0 + L(f 0 ) L(f 0 ) + λ m 2 f 0 2 2 sup f H M/ λ m L( f) L( f) + λ m 2 f 0 2 = O ( ) 1 + λ m λ m m λ m = O(m 1/4 ) 57/57