Introduction to Support Vector Machines

Takio Kurita
Neuroscience Research Institute, National Institute of Advanced Industrial Science and Technology (AIST)
takio-kurita@aist.go.jp

Abstract: These notes give an introductory account of the Support Vector Machine (SVM), a learning method for two-class pattern classification, and relate it to classical linear models such as the perceptron and logistic regression.

1 Introduction

The Support Vector Machine (SVM) is one of the most widely used methods for constructing a two-class pattern classifier. Its core idea, the Optimal Separating Hyperplane, was proposed by Vapnik and his colleagues in the 1960s; combined with the kernel trick in the 1990s, it became a practical and powerful nonlinear classifier and attracted renewed attention.

2 Support Vector Machines

2.1 Pattern recognition

In pattern recognition, the object to be classified is represented by a feature vector x^T = (x_1, ..., x_M). The task is to assign the M-dimensional feature vector x to one of K classes C_1, ..., C_K.

2.2 Linear discrimination and the maximum margin

[Figure 1: A linear discriminant unit for two classes: input x, weight vector w, output y.]

For two classes, the simplest classifier is the linear threshold function

y = sign(w^T x - h),   (1)

where w is the weight vector, h is the threshold, and sign(u) equals 1 for u > 0 and -1 otherwise. The two classes C_1 and C_2 are labelled +1 and -1, respectively. Suppose we are given N training samples x_1, ..., x_N together with their class labels t_1, ..., t_N ∈ {+1, -1}.

[Figure 2: The optimal separating hyperplane and the margin hyperplanes H1 and H2 in the linearly separable case.]

If the training samples are linearly separable, the parameters can be rescaled so that every sample satisfies

t_i (w^T x_i - h) ≥ 1,   i = 1, ..., N.   (2)

The samples closest to the decision boundary then lie on one of the two hyperplanes H1: w^T x - h = 1 and H2: w^T x - h = -1, and the distance between H1 and H2 (the margin) is 2/||w||. Finding the separating hyperplane with the largest margin therefore amounts to minimizing ||w|| with respect to w and h subject to

t_i (w^T x_i - h) ≥ 1,   (i = 1, ..., N).   (3)
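As a small illustration of (1) and (2), the following sketch (Python with NumPy assumed; the array names X, t, w, h are mine) evaluates the discriminant function and checks whether a candidate (w, h) satisfies the margin constraints on a training set:

```python
import numpy as np

def discriminant(x, w, h):
    # eq. (1): y = sign(w^T x - h)
    return np.sign(w @ x - h)

def satisfies_margin_constraints(X, t, w, h):
    # eq. (2): every sample must satisfy t_i (w^T x_i - h) >= 1
    # X: N x M array of samples, t: length-N array of labels in {+1, -1}
    return bool(np.all(t * (X @ w - h) >= 1.0))
```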

Maximizing the margin is thus equivalent to minimizing

L(w) = (1/2) ||w||^2   (4)

under the constraints (3). Introducing Lagrange multipliers α_i (≥ 0), i = 1, ..., N, the problem becomes the minimization of the Lagrangian

L(w, h, α) = (1/2) ||w||^2 - Σ_{i=1}^{N} α_i { t_i (w^T x_i - h) - 1 }.   (5)

Setting the derivatives with respect to w and h to zero gives

∂L/∂w = 0  ⇒  w = Σ_{i=1}^{N} α_i t_i x_i,   (6)

∂L/∂h = Σ_{i=1}^{N} α_i t_i,   (7)

which must also vanish, so that

Σ_{i=1}^{N} α_i t_i = 0.   (8)

Substituting these relations back into (5) yields the dual problem: maximize, subject to

0 ≤ α_i,   i = 1, ..., N,   (9)

and the equality constraint (8), the dual objective

L_D(α) = Σ_{i=1}^{N} α_i - (1/2) Σ_{i,j} α_i α_j t_i t_j x_i^T x_j   (10)

with respect to the Lagrange multipliers α_i.

Let α_i* denote the optimal multipliers. Most of them are zero; the samples x_i with α_i* > 0 lie exactly on H1 or H2 and are called support vectors. Writing S for the set of indices of the support vectors, the optimal weight vector (6) becomes

w* = Σ_{i∈S} α_i* t_i x_i.   (11)

The optimal threshold h* is obtained from any support vector x_s, s ∈ S, since such a sample satisfies w*^T x_s - h* = t_s:

h* = w*^T x_s - t_s.   (12)

The resulting classifier uses only the support vectors:

y = sign(w*^T x - h*) = sign( Σ_{i∈S} α_i* t_i x_i^T x - h* ).   (13)

The samples with α_i* = 0 do not influence the classifier at all; it is determined solely by the support vectors (α_i* > 0).

2.3 Soft margin

[Figure 3: The linearly non-separable case: some samples are allowed to violate the margin.]

When the training samples are not linearly separable, no pair (w, h) satisfies all the constraints (3). The soft-margin formulation therefore introduces non-negative slack variables ξ_i (≥ 0), one per sample, which measure by how much each sample violates the margin; a sample with ξ_i > 0 is displaced from its margin hyperplane (H1 or H2) by the distance

ξ_i / ||w||.   (14)

The constraints are relaxed to

ξ_i ≥ 0,   t_i (w^T x_i - h) ≥ 1 - ξ_i,   (i = 1, ..., N),   (15)

and the objective function to be minimized becomes

L(w, ξ) = (1/2) ||w||^2 + γ Σ_{i=1}^{N} ξ_i.   (16)

The parameter γ controls the trade-off between maximizing the margin and minimizing the total constraint violation Σ ξ_i. Introducing Lagrange multipliers α_i and ν_i for the two families of constraints in (15), the Lagrangian is

L(w, h, ξ, α, ν) = (1/2) ||w||^2 + γ Σ_{i=1}^{N} ξ_i - Σ_{i=1}^{N} α_i { t_i (w^T x_i - h) - (1 - ξ_i) } - Σ_{i=1}^{N} ν_i ξ_i.   (17)

Setting the derivatives with respect to w, h and ξ_i to zero gives

w = Σ_{i=1}^{N} α_i t_i x_i,   (18)

∂L/∂h = Σ_{i=1}^{N} α_i t_i,   (19)

α_i = γ - ν_i,   (20)

Σ_{i=1}^{N} α_i t_i = 0.   (21)

The dual problem is again to maximize, under the box constraints

0 ≤ α_i ≤ γ,   i = 1, ..., N,   (22)

and the equality constraint (21), the objective

L_D(α) = Σ_{i=1}^{N} α_i - (1/2) Σ_{i,j} α_i α_j t_i t_j x_i^T x_j.   (23)

The only difference from the separable case is the upper bound γ on the multipliers. At the optimum the training samples fall into three groups: samples with α_i* = 0 lie outside the margin (beyond H1 or H2) and are classified correctly with room to spare; samples with 0 < α_i* < γ lie exactly on H1 or H2; and samples with α_i* = γ have ξ_i ≠ 0, i.e. they lie inside the margin or on the wrong side of the decision boundary.
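The dual (22)-(23) is a quadratic program and is usually solved with a dedicated solver (for example SMO). Purely as an illustration, the following sketch solves it with a general-purpose optimizer, assuming NumPy and SciPy are available; the function and variable names are my own, and it is intended for small toy problems only.

```python
import numpy as np
from scipy.optimize import minimize

def train_soft_margin_svm(X, t, gamma):
    """Solve the dual (22)-(23) for a linear soft-margin SVM (toy-scale sketch)."""
    N = X.shape[0]
    Xt = X * t[:, None]
    K = Xt @ Xt.T                                    # K_ij = t_i t_j x_i^T x_j

    neg_dual = lambda a: 0.5 * a @ K @ a - a.sum()   # -L_D(alpha)
    neg_grad = lambda a: K @ a - np.ones(N)
    cons = {"type": "eq", "fun": lambda a: a @ t}    # eq. (21): sum_i alpha_i t_i = 0
    bounds = [(0.0, gamma)] * N                      # eq. (22): 0 <= alpha_i <= gamma

    res = minimize(neg_dual, np.zeros(N), jac=neg_grad,
                   bounds=bounds, constraints=cons, method="SLSQP")
    alpha = res.x
    w = Xt.T @ alpha                                 # eq. (18): w = sum_i alpha_i t_i x_i
    on_margin = (alpha > 1e-6) & (alpha < gamma - 1e-6)   # samples lying on H1 or H2
    h = float(np.mean(X[on_margin] @ w - t[on_margin]))   # eq. (12), averaged for stability
    return w, h, alpha
```

Prediction then follows (1): y = sign(w^T x - h). Averaging h over the on-margin samples assumes at least one α_i* lies strictly between 0 and γ.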

2.4 Nonlinear extension by kernels

A linear discriminant function is often too simple for real data. The idea of the kernel trick is to map the input x to a high-dimensional feature vector ϕ(x) by a nonlinear map ϕ and to construct the maximum-margin linear classifier in that feature space. In the dual objective L_D the training samples appear only through inner products such as ϕ(x_1)^T ϕ(x_2), so it suffices to be able to evaluate the kernel function

K(x_1, x_2) = ϕ(x_1)^T ϕ(x_2)   (24)

for any pair of inputs x_1 and x_2, without ever computing ϕ(x_1) and ϕ(x_2) explicitly. Commonly used kernels K include the polynomial kernel

K(x_1, x_2) = (1 + x_1^T x_2)^p,   (25)

the Gaussian kernel

K(x_1, x_2) = exp( - ||x_1 - x_2||^2 / (2σ^2) ),   (26)

and the sigmoid kernel

K(x_1, x_2) = tanh( a x_1^T x_2 - b ).   (27)

Replacing the inner products x_i^T x_j in (10) and (23) by the kernel, the dual objective becomes

L_D(α) = Σ_{i} α_i - (1/2) Σ_{i,j} α_i α_j t_i t_j ϕ(x_i)^T ϕ(x_j)
       = Σ_{i} α_i - (1/2) Σ_{i,j} α_i α_j t_i t_j K(x_i, x_j),   (28)

and the classifier (13) becomes

y = sign( w*^T ϕ(x) - h* ) = sign( Σ_{i∈S} α_i* t_i ϕ(x_i)^T ϕ(x) - h* )
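A minimal sketch of the three kernels (25)-(27) in Python with NumPy; the parameter names p, sigma, a and b follow the equations above, and the Gram-matrix helper is the quantity needed in the dual objective (28).

```python
import numpy as np

def polynomial_kernel(x1, x2, p=2):
    # eq. (25): K(x1, x2) = (1 + x1^T x2)^p
    return (1.0 + x1 @ x2) ** p

def gaussian_kernel(x1, x2, sigma=1.0):
    # eq. (26): K(x1, x2) = exp(-||x1 - x2||^2 / (2 sigma^2))
    d = x1 - x2
    return np.exp(-(d @ d) / (2.0 * sigma ** 2))

def sigmoid_kernel(x1, x2, a=1.0, b=0.0):
    # eq. (27): K(x1, x2) = tanh(a x1^T x2 - b)
    return np.tanh(a * (x1 @ x2) - b)

def gram_matrix(X, kernel):
    # Kernel (Gram) matrix K_ij = K(x_i, x_j), as used in (28)
    N = X.shape[0]
    return np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])
```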

= sign( Σ_{i∈S} α_i* t_i K(x_i, x) - h* ).   (29)

With the Gaussian kernel (26), the classifier (29) has the same functional form as a Radial Basis Function (RBF) network whose basis functions are centred on the support vectors.

[Figure 4: Network interpretation of the kernel SVM: input x, weights α_i* t_i attached to the kernel units, output y.]

2.5 Model selection

The soft-margin parameter γ and the kernel parameters (for example the width σ of the Gaussian kernel) are not determined by the training algorithm itself. They are usually chosen by estimating the generalization performance of each candidate setting, for example by cross-validation (CV), and selecting the setting with the best estimate.
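As one possible way to carry out such a cross-validated search, the sketch below uses scikit-learn (assumed to be installed); X_train and t_train are placeholder names for the training data, and the grids are illustrative only. Note that scikit-learn's C plays the role of γ in (16), and its gamma equals 1/(2σ^2) for the Gaussian kernel (26).

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# Illustrative grids; in practice they are usually spaced logarithmically.
param_grid = {
    "C": [0.1, 1.0, 10.0, 100.0],   # soft-margin parameter (gamma in eq. (16))
    "gamma": [0.01, 0.1, 1.0],      # RBF width parameter, 1 / (2 sigma^2)
}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, t_train)        # X_train, t_train: training data (assumed given)
print(search.best_params_, search.best_score_)
```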

The SVM described above is inherently a two-class classifier; problems with more than two classes are handled by combining several two-class classifiers or by using models such as the multinomial logit model.

3 Learning linear discriminant functions

3.1 The simple perceptron

Rosenblatt's simple perceptron computes its output y from the input vector x = (x_1, ..., x_M)^T as

y = f(η),   η = w^T x - h = w̃^T x̃,   (30)

where w_i is the weight attached to the i-th input, h is the threshold, and the augmented vectors are defined as w̃ = (h, w_1, ..., w_M)^T and x̃ = (-1, x_1, ..., x_M)^T. Several choices of the output function f are in common use. Rosenblatt's original perceptron uses the threshold function

f(η) = sign(η) = { 1 if η ≥ 0, 0 otherwise },   (31)

a linear unit uses the identity

f(η) = η,   (32)

and a logistic unit uses

f(η) = exp(η) / (1 + exp(η)).   (33)
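The three output functions (31)-(33) in Python/NumPy; the logistic function is written in the numerically safer but mathematically equivalent form 1/(1 + exp(-η)).

```python
import numpy as np

def threshold_output(eta):
    # eq. (31): Rosenblatt's threshold unit, returning 1 or 0
    return np.where(eta >= 0.0, 1.0, 0.0)

def linear_output(eta):
    # eq. (32): identity output
    return eta

def logistic_output(eta):
    # eq. (33): exp(eta) / (1 + exp(eta)), rewritten as 1 / (1 + exp(-eta))
    return 1.0 / (1.0 + np.exp(-eta))
```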

[Figure 5: The three output functions: (a) threshold, (b) linear, (c) logistic.]

3.2 Perceptron learning

For the threshold unit (31), Rosenblatt's perceptron learning rule corrects the weights only when a training sample is misclassified (error-correction learning); when the output error is 0 the weights are left unchanged.

3.3 Least-squares learning (Widrow-Hoff)

Suppose N training pairs {(x_i, t_i) | i = 1, ..., N} are given for the two classes, where x_i is an input (feature) vector and t_i is the desired (teacher) output.

The goodness of fit of the unit to the training data can be measured by the empirical squared error

ε²_emp = Σ_{i=1}^{N} (t_i - y_i)² = Σ_{i=1}^{N} ε²_emp(i).   (34)

For the linear output (32), this error can be minimized by gradient descent. The partial derivatives are

∂ε²_emp/∂w_j = - Σ_{i=1}^{N} 2 (t_i - y_i) x_ij = - Σ_{i=1}^{N} 2 δ_i x_ij,   (35)

∂ε²_emp/∂h = - Σ_{i=1}^{N} 2 (t_i - y_i) (-1) = - Σ_{i=1}^{N} 2 δ_i (-1),   (36)

where δ_i = (t_i - y_i). They give the update rules

w_j ← w_j + α ( Σ_{i=1}^{N} δ_i x_ij ),   (37)

h ← h + α ( Σ_{i=1}^{N} δ_i (-1) ),   (38)

where α is the learning rate. This is the Widrow-Hoff learning rule; because the correction of each weight is proportional to the error δ_i = t_i - y_i, it is also called the delta rule.

For the linear unit the minimizer can also be written in closed form. Let X̃ = (x̃_1, ..., x̃_N)^T be the N × (M + 1) matrix whose rows are the augmented input vectors and let t = (t_1, ..., t_N)^T be the vector of teacher signals. The empirical error is then

ε²_emp = Σ_{i=1}^{N} (t_i - y_i)² = || t - X̃ w̃ ||².   (39)

Setting its derivative with respect to w̃ to zero,

∂ε²_emp/∂w̃ = - X̃^T ( t - X̃ w̃ ) = 0,   (40)

and assuming (X̃^T X̃) is nonsingular, we obtain

w̃ = (X̃^T X̃)^{-1} X̃^T t.   (41)

This is exactly multiple regression analysis with x as the explanatory variables and t as the criterion variable.
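A compact sketch of both the iterative delta rule (37)-(38) and the closed-form solution (41), assuming NumPy; X is an N × M array of inputs, t a length-N array of teacher signals, and the first component of the returned augmented weight vector is the threshold h.

```python
import numpy as np

def augment(X):
    # Augmented inputs x~ = (-1, x_1, ..., x_M)^T, matching eq. (30)
    return np.hstack([-np.ones((X.shape[0], 1)), X])

def widrow_hoff(X, t, alpha=0.01, epochs=200):
    """Batch delta-rule updates (37)-(38) for a linear output unit."""
    Xa = augment(X)
    w = np.zeros(Xa.shape[1])          # w~ = (h, w_1, ..., w_M)
    for _ in range(epochs):
        delta = t - Xa @ w             # delta_i = t_i - y_i, with y_i = w~^T x~_i
        w += alpha * (Xa.T @ delta)    # one step of (37) and (38) combined
    return w

def least_squares(X, t):
    """Closed-form solution (41): w~ = (X~^T X~)^{-1} X~^T t (multiple regression)."""
    Xa = augment(X)
    return np.linalg.solve(Xa.T @ Xa, Xa.T @ t)
```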

3.4 Model selection

In multiple regression analysis it is important to include only the explanatory variables that actually help to predict the criterion variable, and many techniques have been developed for this purpose: variable selection, shrinkage (regularization) methods such as ridge regression, and so on.

(1) Variable selection. In forward stepwise selection, variables are added one at a time, starting from the smallest model and adding at each step the variable that improves the fit most. In backward stepwise selection, variables are removed one at a time, starting from the model that contains all of them. To compare the candidate models obtained in this way, their generalization performance has to be estimated; two approaches are common, resampling methods and model-selection criteria. A typical resampling method is leave-one-out: one of the N samples is set aside, the model is fitted to the remaining N - 1 samples, and the sample that was set aside is used as a test sample.

This is repeated N times, leaving each sample out once, and the average test error is used as an estimate of the generalization error. The jackknife [9, 10] and the bootstrap [11, 12, 13] are other well-known resampling methods. Resampling gives reliable estimates but is computationally expensive, because the model must be refitted many times.

The second approach is to use a model-selection criterion such as the F statistic, Akaike's information criterion AIC [14, 15], or Rissanen's Minimum Description Length (MDL) criterion [16, 17]. These criteria balance the (maximized) log-likelihood of the model against the number of free parameters. With J free parameters, AIC is defined as

AIC = -2 (maximum log-likelihood) + 2 J,   (42)

and MDL as

MDL = - (maximum log-likelihood) + (J/2) log N.   (43)

Among the candidate models, the one that minimizes AIC or MDL is selected.
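The criteria (42) and (43) are simple to evaluate once the maximized log-likelihood of each candidate model is known; a minimal sketch (the argument names are mine):

```python
import numpy as np

def aic(max_log_likelihood, J):
    # eq. (42): AIC = -2 * (maximum log-likelihood) + 2 * J
    return -2.0 * max_log_likelihood + 2.0 * J

def mdl(max_log_likelihood, J, N):
    # eq. (43): MDL = -(maximum log-likelihood) + (J / 2) * log N
    return -max_log_likelihood + 0.5 * J * np.log(N)
```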

(2) Shrinkage (ridge regression). Instead of discarding variables, shrinkage methods keep all of them but penalize large weights. Starting from the empirical squared error

ε²_emp = Σ_{i=1}^{N} (t_i - y_i)² = Σ_{i=1}^{N} ( t_i - ( Σ_{j=1}^{M} w_j x_ij - h ) )²,   (44)

ridge regression adds the penalty term

Σ_{j=1}^{M} w_j²   (45)

and minimizes

Q(w, h) = Σ_{i=1}^{N} ( t_i - ( Σ_{j=1}^{M} w_j x_ij - h ) )² + λ Σ_{j=1}^{M} w_j².   (46)

The parameter λ controls the balance between the two terms; λ = 0 recovers ordinary least squares. Setting the derivative of Q(w, h) with respect to h to zero,

∂Q(w, h)/∂h = 2N ( t̄ - Σ_j w_j x̄_j + h ) = 0,   (47)

gives

h = - t̄ + Σ_j w_j x̄_j,   (48)

where t̄ = (1/N) Σ_{i=1}^{N} t_i and x̄_j = (1/N) Σ_{i=1}^{N} x_ij. Substituting (48) back into Q(w, h) leaves a function of w alone,

Q(w) = Σ_{i=1}^{N} { (t_i - t̄) - Σ_j w_j (x_ij - x̄_j) }² + λ Σ_j w_j²
     = ( t_c - X_c w )^T ( t_c - X_c w ) + λ w^T w,   (49)

where t_c and X_c denote the centred teacher vector and data matrix with elements (t_i - t̄) and (x_ij - x̄_j). Setting the derivative with respect to w to zero,

∂Q(w)/∂w = -2 X_c^T t_c + 2 ( X_c^T X_c + λ I ) w = 0,   (50)

we obtain the ridge estimate

w = ( X_c^T X_c + λ I )^{-1} X_c^T t_c.   (51)

Even when X_c^T X_c is singular or nearly singular, adding λ to its diagonal makes the matrix invertible, so the ridge estimate remains numerically stable.
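A short sketch of the ridge estimate, following (47)-(51): centre the data, solve (51), then recover the threshold from (48). NumPy assumed; the names are mine.

```python
import numpy as np

def ridge_regression(X, t, lam):
    """Ridge estimate (51) on centred data, threshold recovered from (48)."""
    x_mean, t_mean = X.mean(axis=0), t.mean()
    Xc, tc = X - x_mean, t - t_mean                    # (x_ij - x_bar_j), (t_i - t_bar)
    M = X.shape[1]
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(M), Xc.T @ tc)   # eq. (51)
    h = -t_mean + w @ x_mean                           # eq. (48)
    return w, h                                        # prediction: y = w^T x - h
```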

3.5 Logistic regression

Suppose now that the teacher signals are class labels: the training set is {(x_i, u_i) | i = 1, ..., N} with u_i ∈ {0, 1}. The logistic output y of the unit is interpreted as the probability that x belongs to class 1, and 1 - y as the probability of class 0. The likelihood of the training set is then

L = Π_{i=1}^{N} y_i^{u_i} (1 - y_i)^{1 - u_i},   (52)

and its logarithm (the log-likelihood) is

l = Σ_{i=1}^{N} { u_i log y_i + (1 - u_i) log(1 - y_i) }
  = Σ_{i=1}^{N} { u_i log[ exp(η_i) / (1 + exp(η_i)) ] + (1 - u_i) log[ 1 / (1 + exp(η_i)) ] }
  = Σ_{i=1}^{N} { u_i η_i - log(1 + exp(η_i)) }.   (53)

Maximizing l with respect to the parameters can be done by gradient ascent. The derivatives are

∂l/∂w_j = Σ_{i=1}^{N} (u_i - y_i) x_ij = Σ_{i=1}^{N} δ_i x_ij,   (54)

where δ_i = (u_i - y_i), and

∂l/∂h = Σ_{i=1}^{N} (u_i - y_i) (-1) = Σ_{i=1}^{N} δ_i (-1),   (55)

which give the update rules

w_j ← w_j + α ( Σ_{i=1}^{N} δ_i x_ij ),   (56)

h ← h + α ( Σ_{i=1}^{N} δ_i (-1) ).   (57)

These have exactly the same form as the Widrow-Hoff learning rule (37)-(38); the only difference is that y_i is now the logistic output (33).
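A small sketch of the gradient-ascent updates (56)-(57) in NumPy, using the augmented representation (30) so that the threshold h is the first component of the weight vector; the step size and iteration count are arbitrary illustrative values.

```python
import numpy as np

def logistic_regression_gradient_ascent(X, u, alpha=0.1, epochs=1000):
    """Batch gradient ascent (56)-(57) on the log-likelihood (53)."""
    Xa = np.hstack([-np.ones((X.shape[0], 1)), X])   # x~ = (-1, x_1, ..., x_M)
    w = np.zeros(Xa.shape[1])                        # w~ = (h, w_1, ..., w_M)
    for _ in range(epochs):
        y = 1.0 / (1.0 + np.exp(-(Xa @ w)))          # logistic outputs (33)
        delta = u - y                                # delta_i = u_i - y_i
        w += alpha * (Xa.T @ delta)                  # updates (56) and (57) together
    return w
```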

Gradient ascent can be slow, and a standard alternative is Fisher's scoring method. For a random variable y with density f(y, θ_1, ..., θ_M), the Fisher information matrix F = [F_ij] is defined by

F_ij = - E[ ∂² log f(y, θ_1, ..., θ_M) / ∂θ_i ∂θ_j ].   (58)

Fisher's scoring method uses F in place of the (negative) Hessian in a Newton-type update. For the log-likelihood (53), the second derivatives are

∂²l / ∂w_k ∂w_j = - Σ_{i=1}^{N} ω_i x_ik x_ij,   (59)

where ω_i = y_i (1 - y_i).

[Figure 6: ω = y(1 - y) as a function of the output probability; it attains its maximum 0.25 at 0.5 and approaches 0 near 0 and 1.]

In terms of the augmented vectors, the gradient and the Hessian of l are

∇l = Σ_{i=1}^{N} δ_i x̃_i = X̃^T δ,   (60)

∇²l = - Σ_{i=1}^{N} ω_i x̃_i x̃_i^T = - X̃^T W X̃,

where X̃^T = [x̃_1, ..., x̃_N], W = diag(ω_1, ..., ω_N) and δ = (δ_1, ..., δ_N)^T. The Fisher information matrix with respect to w̃ is therefore

F = E( - ∇²l ) = X̃^T W X̃.   (61)

Each training sample x̃_i enters F with the weight ω_i; as Figure 6 shows, ω_i is largest (0.25) when y_i = 0.5, i.e. for the samples about which the model is most uncertain, and approaches 0 as y_i approaches 0 or 1. For the log-likelihood (53) the Hessian does not depend on the observed labels u_i, so it coincides with its expectation, and Fisher's scoring method is identical to the Newton-Raphson method here [18].

The scoring update adds a correction δw̃ to the current estimate w̃,

w̃_new = w̃ + δw̃,   (62)

where δw̃ is the solution of

F δw̃ = ∇l.   (63)

Multiplying (62) by F gives

F w̃_new = F w̃ + F δw̃ = F w̃ + ∇l,   (64)

and, since η = X̃ w̃,

F w̃ = X̃^T W η,   (65)

where η = (η_1, ..., η_N)^T is the vector of activations η_i = w̃^T x̃_i. The new estimate can therefore be written as

w̃_new = F^{-1} ( F w̃ + ∇l )
       = ( X̃^T W X̃ )^{-1} ( X̃^T W η + X̃^T δ )
       = ( X̃^T W X̃ )^{-1} X̃^T W ( η + W^{-1} δ ),   (66)

where δ = (δ_1, ..., δ_N)^T. Equation (66) is a weighted least-squares regression of the working response η + W^{-1} δ on X̃, so Fisher scoring amounts to iteratively reweighted least squares.

A convenient initial value is obtained from w̃ = 0: then every output is y_i = 1/2, so W = (1/4) I, η = 0 and δ = u - (1/2) 1, and (66) gives

w̃_0 = 4 ( X̃^T X̃ )^{-1} X̃^T ( u - (1/2) 1 ),   (67)

which, apart from the constant factor, is the multiple regression solution (41) obtained when the teacher signals are set to u_i - 1/2 = ±1/2.
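A compact sketch of the resulting iteratively reweighted least-squares procedure, using the initial value (67) and the update (66); NumPy assumed, names mine, and the weights ω_i are clipped away from zero purely for numerical safety.

```python
import numpy as np

def fisher_scoring(X, u, iterations=20):
    """Fisher scoring / IRLS for logistic regression: init (67), update (66)."""
    Xa = np.hstack([-np.ones((X.shape[0], 1)), X])          # augmented inputs
    w = 4.0 * np.linalg.solve(Xa.T @ Xa, Xa.T @ (u - 0.5))  # eq. (67)
    for _ in range(iterations):
        eta = Xa @ w
        y = 1.0 / (1.0 + np.exp(-eta))                       # logistic outputs
        omega = np.clip(y * (1.0 - y), 1e-8, None)           # omega_i = y_i (1 - y_i)
        z = eta + (u - y) / omega                            # working response eta + W^{-1} delta
        # eq. (66): w~ = (X~^T W X~)^{-1} X~^T W z
        w = np.linalg.solve(Xa.T @ (omega[:, None] * Xa), Xa.T @ (omega * z))
    return w
```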

3.6 Improving generalization

Besides selecting a model by resampling, overfitting can be reduced by building a penalty into the training criterion itself.

(1) Weight decay. For logistic regression, a penalty proportional to the squared weights is added to the negative log-likelihood:

Q(w̃) = - l + λ Σ_j w_j² = Σ_{i=1}^{N} { log(1 + exp(η_i)) - u_i η_i } + λ Σ_j w_j².   (68)

The derivatives of Q(w̃) are

∂Q/∂w_j = - ∂l/∂w_j + 2 λ w_j = - Σ_{i=1}^{N} (u_i - y_i) x_ij + 2 λ w_j,   (69)

∂Q/∂h = - ∂l/∂h = - Σ_{i=1}^{N} (u_i - y_i) (-1),   (70)

which give the weight-decay update rules

w_j ← w_j + α ( Σ_{i=1}^{N} (u_i - y_i) x_ij ) - 2 α λ w_j,   (71)

h ← h + α ( Σ_{i=1}^{N} (u_i - y_i) (-1) ).   (72)

Compared with (56), each weight w_j receives an extra correction proportional to -w_j, i.e. it decays toward 0 unless the data keep it away from 0.
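A one-step sketch of the weight-decay updates (71)-(72), keeping w and h separate for clarity; NumPy assumed and the names are mine.

```python
import numpy as np

def weight_decay_step(w, h, X, u, alpha, lam):
    """One batch update (71)-(72) of logistic regression with weight decay."""
    y = 1.0 / (1.0 + np.exp(-(X @ w - h)))                       # logistic outputs
    delta = u - y
    w_new = w + alpha * (X.T @ delta) - 2.0 * alpha * lam * w    # eq. (71)
    h_new = h + alpha * (-delta.sum())                           # eq. (72)
    return w_new, h_new
```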

3.7 Relation to the support vector machine

The soft-margin objective (16) can be rewritten in the same penalized form. Dividing it by γ and writing λ for the resulting coefficient of the squared weights, and noting that at the optimum each slack variable equals ξ_i = [1 - t_i η_i]_+, we obtain

L(w, ξ) = Σ_{i=1}^{N} ξ_i + λ Σ_j w_j² = Σ_{i=1}^{N} [ 1 - t_i η_i ]_+ + λ Σ_j w_j²,   (73)

where [x]_+ denotes max(x, 0) and the labels t_i now take the values ±1 rather than 0/1. The hinge loss [1 - x]_+, shown in Figure 7 (a), is exactly 0 when t_i η_i ≥ 1, i.e. when the sample lies on the correct side of its margin hyperplane (H1 or H2), and grows linearly as t_i η_i decreases below 1.

If the hinge loss is replaced by the squared loss, the criterion becomes

Q = Σ_{i=1}^{N} ( 1 - t_i η_i )² + λ Σ_j w_j²,   (74)

which is essentially ridge regression with teacher signals ±1. The squared loss (1 - x)², shown in Figure 7 (b), differs from the hinge loss in that it also penalizes samples with t_i η_i > 1, i.e. samples that are classified correctly with a large margin.

[Figure 7: Loss functions as a function of t_i η_i: (a) hinge loss [1 - x]_+, (b) squared loss (1 - x)², (c) logistic loss log(1 + exp(-x)) (logistic regression with weight decay).]
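For reference, the three loss functions plotted in Figure 7, written as functions of the margin value t_i η_i (NumPy sketch):

```python
import numpy as np

def hinge_loss(margin):
    # Figure 7 (a), used in the SVM criterion (73): [1 - t*eta]_+
    return np.maximum(1.0 - margin, 0.0)

def squared_loss(margin):
    # Figure 7 (b), used in the ridge-type criterion (74): (1 - t*eta)^2
    return (1.0 - margin) ** 2

def logistic_loss(margin):
    # Figure 7 (c), the loss minimized by logistic regression with weight decay
    return np.log1p(np.exp(-margin))
```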

The criterion minimized by logistic regression with weight decay can be written in the same form. Rewriting the labels u_i ∈ {0, 1} as t_i ∈ {-1, +1}, the objective (68) becomes

Q = Σ_{i=1}^{N} log( 1 + exp( - t_i η_i ) ) + λ Σ_j w_j².   (75)

The logistic loss log(1 + exp(-t_i η_i)), shown in Figure 7 (c), behaves much like the hinge loss: it is small when t_i η_i is large and grows almost linearly when t_i η_i is strongly negative. Unlike the hinge loss, however, it is smooth and never exactly 0, so all training samples keep contributing to the solution, whereas in the SVM only the support vectors do.

4 Concluding remarks

These notes have reviewed the support vector machine, from the optimal separating hyperplane and the soft margin to the kernel trick, and have related it to classical linear models such as the perceptron, ridge regression, and logistic regression with weight decay.

References

[1] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons (1998).
[2] No. 444, pp. 52-58 (2000).
[3] Vol. 42, No. 7, pp. 676-683 (2001).
[4] B. Scholkopf, C. J. C. Burges, A. J. Smola, Advances in Kernel Methods - Support Vector Learning, The MIT Press (1999).
[5] N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines and other kernel-based learning methods, Cambridge University Press (2000).
[6] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning - Data Mining, Inference, and Prediction, Springer-Verlag (2001).
[7] R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification (Second Edition), John Wiley & Sons (2001).
[8] K. R. Muller, S. Mika, G. Ratsch, K. Tsuda, B. Scholkopf, An introduction to kernel-based learning algorithms, IEEE Trans. on Neural Networks, Vol. 12, No. 2, pp. 181-201 (2001).
[9] Miller, R. G. (1974): The jackknife - a review, Biometrika, Vol. 61, No. 1, pp. 1-15.
[10] Stone, M. (1974): Cross-validatory choice and assessment of statistical predictions, Journal of the Royal Statistical Society, Vol. B36, pp. 111-147.

[11] Efron, B. (1979): Bootstrap methods: another look at the jackknife, The Annals of Statistics, Vol. 7, No. 1, pp. 1-26.
[12] Efron, B. (1983): Estimating the error rate of a prediction rule: improvement on cross-validation, Journal of the American Statistical Association, Vol. 78, pp. 316-331.
[13] Efron, B. (1985): The bootstrap method for assessing statistical accuracy, Behaviormetrika, Vol. 17, pp. 1-35.
[14] Akaike, H. (1974): A new look at the statistical model identification, IEEE Trans. on Automatic Control, Vol. AC-19, No. 6, pp. 716-723.
[15] (1983).
[16] Rissanen, J. (1983): A universal prior for integers and estimation by minimum description length, The Annals of Statistics, Vol. 11, No. 2, pp. 416-431.
[17] Rissanen, J. (1986): Stochastic complexity and modeling, The Annals of Statistics, Vol. 14, No. 3, pp. 1080-1100.
[18] P. McCullagh and J. A. Nelder, Generalized Linear Models, Chapman and Hall (1989).