Mastering the Game of Go without Human Knowledge, rev.1 (2017/11/26)



3 6.1 ( ) AI (MEMO ) 2 3 AI MEMO: ( ) %, % 3

4 6.1: 3 4

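dual network

MEMO: Mastering the Game of Go without Human Knowledge (David Silver, et al., Nature, 2017); Mastering the game of Go with deep neural networks and tree search (David Silver, et al., Nature, 2016)

[Figure 6.2: the dual network
Input: 17 channels
Layer 1: convolution, ReLU
Layers 2 to 39: convolution, ReLU
Policy head, layer 1: 1x1 convolution, 2 filters, ReLU
Policy head, layer 2: 362 units
Policy output: 362 values (361 board points plus pass)
Value head, layer 1: 1x1 convolution, 1 filter, ReLU
Value head, layer 2: 256 units, ReLU
Value head, layer 3: 1 unit, tanh
Value output: 1 value]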

48 channels (Figure 6.3(a)) versus 17 channels (Figure 6.3(b)); n (= 1 to 7)

[Figure 6.3: input channels. (a) 48 channels. (b) 17 channels.]
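The breakdown of the 17 channels did not survive in the text above. The sketch below assumes the layout described in the AlphaGo Zero paper: eight planes for the stones of the player to move (the current position plus n = 1 to 7 moves back), eight for the opponent, and one plane for the color to move. The function name `encode_position`, the board representation (1 = black, -1 = white, 0 = empty) and the plane ordering are hypothetical choices made only for illustration.

```python
def encode_position(history, black_to_play, size=19, steps=8):
    """Build a (2 * steps + 1)-plane input from a list of boards.

    history: list of boards, oldest first, current position last.
             Each board is a size x size list of lists with
             1 = black stone, -1 = white stone, 0 = empty (assumed encoding).
    Returns a list of planes, each a size x size list of floats.
    """
    me = 1 if black_to_play else -1

    def plane(board, color):
        # 1.0 where the board holds a stone of the given color, else 0.0
        return [[1.0 if board[r][c] == color else 0.0 for c in range(size)]
                for r in range(size)]

    def blank():
        return [[0.0] * size for _ in range(size)]

    # Current position first, then 1, 2, ... moves back (at most `steps` boards).
    recent = history[::-1][:steps]
    own = [plane(b, me) for b in recent]
    opp = [plane(b, -me) for b in recent]

    # Early in the game there are fewer than `steps` boards: pad with zeros.
    while len(own) < steps:
        own.append(blank())
        opp.append(blank())

    # Last plane: all ones if black is to move, all zeros otherwise.
    fill = 1.0 if black_to_play else 0.0
    color_plane = [[fill] * size for _ in range(size)]
    return own + opp + [color_plane]
```

With the defaults this yields 8 + 8 + 1 = 17 planes of 19 x 19 values. The paper interleaves own and opponent planes; grouping them as done here is an arbitrary ordering and carries the same information.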

8 (MEMO ) 2017 p v 2 2 MEMO: (ResNet) (MEMO ) 6.4(a) 19 ( 6.4(b)) (3x3 Conv 256) (Bn) ReLU (ReLU) 2 3 MEMO: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep residual learning for image recognition. Computer Vision and Pattern Recognition (CVPR), (a) ReLU 8

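3x3 39 : ( :39) = 83 : ( :39) = SL 5

[Figure 6.4]

The training data is a set of $M$ examples $\{(s^k, \pi^k, z^k)\}_{k=1}^{M}$.

$\pi$ gives, for position $s$, a probability for each move $a$, where $A$ is the number of moves; in the simplest case it is 1 for a single move $a$ and 0 for the others. It is written $\pi = \{\pi_a\}_{a=1}^{A}$. $z$ is the outcome of the game ($+1$ or $-1$).

The network with parameters $\theta$ is written $f_\theta(s)$. For position $s$ it outputs the probability $p(s, a)$ of each move $a$ and the value $v(s)$. The parameters are adjusted so that $(p, v)$ approaches $(\pi, z)$ by minimizing the loss $L_\theta$:

$$(p, v) = f_\theta(s) \tag{6.1}$$

$$L_\theta = \sum_{k=1}^{M} \left\{ (z^k - v^k)^2 - \sum_{a=1}^{A} \pi_a^k \log p_a^k \right\} + c \sum_i \theta_i^2 \tag{6.2}$$

Here $(z^k - v^k)^2$ is the squared error between $z$ and $v$, and $-\sum_{a=1}^{A} \pi_a^k \log p_a^k$ is the cross-entropy between $\pi = \{\pi_a\}_{a=1}^{A}$ and $p = \{p_a\}_{a=1}^{A}$. The term $\sum_i \theta_i^2$ is the sum of squares of the parameters $\theta$ (weight decay), weighted by $c$. In the 2017 paper, $c = 10^{-4}$.

$L_\theta$ is minimized with respect to $\theta$ by repeatedly updating $\theta$ along the gradient of $L_\theta$:

$$\theta \leftarrow \theta - \alpha \, \Delta\theta \tag{6.3}$$

$$\Delta\theta = \frac{\partial L_\theta}{\partial \theta} \tag{6.4}$$

where $\alpha$ is the learning rate. This is the SGD update.

% 57.0% Chainer Chainer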
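As a worked illustration of the loss (6.2) and the update (6.3), the following is a minimal plain-Python sketch rather than the Chainer code the slides refer to. The function names `dual_loss` and `sgd_step`, the argument layout, and the small `eps` guard inside the logarithm are assumptions introduced here; the gradient is taken as given instead of being derived.

```python
import math


def dual_loss(examples, theta, c=1e-4, eps=1e-12):
    """Loss in the form of (6.2).

    examples: list of (p, v, pi, z) tuples, one per training example k, where
              p and pi are lists of length A, v is the predicted value and
              z is the game outcome (+1 or -1).
    theta:    flat list of network parameters.
    c:        weight-decay coefficient (10^-4 in the 2017 paper).
    eps:      hypothetical guard so that log(0) is never evaluated.
    """
    total = 0.0
    for p, v, pi, z in examples:
        value_term = (z - v) ** 2
        policy_term = -sum(pi_a * math.log(p_a + eps)
                           for pi_a, p_a in zip(pi, p))
        total += value_term + policy_term
    weight_decay = c * sum(t * t for t in theta)
    return total + weight_decay


def sgd_step(theta, grad, alpha):
    """One update theta <- theta - alpha * grad, as in (6.3) and (6.4)."""
    return [t - alpha * g for t, g in zip(theta, grad)]
```

For a single example with p = (0.5, 0.5), v = 0, π = (1, 0), z = 1 and all-zero parameters, the value term is 1 and the policy term is log 2, so the loss is about 1.693.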

[Figure 6.5: Chainer]


13 6.3 (MCTS) MCTS (MEMO ) MEMO: MCTS UCB ( + ) (Selection) ( ) (Evaluation) (Backup) (Expansion) MCTS MCTS MCTS

[Figure 6.6: (a) the move a is chosen by Q(s, a) + u(s, a); (b) p, v; (c) Step]

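Step 1 (selection)

From position $s$, Step 1 chooses the move $a$ with the largest $Q(s, a) + u(s, a)$, where

$$Q(s, a) = \frac{W(s, a)}{N(s, a)} \tag{6.5}$$

$$u(s, a) = c_{\mathrm{puct}} \, p(s, a) \, \frac{\sqrt{\sum_b N(s, b)}}{1 + N(s, a)} \tag{6.6}$$

$Q(s, a)$ is the mean value of move $a$, and $u(s, a)$ is a bonus for move $a$. The bonus is proportional to the prior $p(s, a)$, and the factor $\sqrt{\sum_b N(s, b)} / (1 + N(s, a))$ makes it shrink as move $a$ is visited more often. The constant $c_{\mathrm{puct}}$ sets the balance between $Q(s, a)$ and $u(s, a)$.

Step 2 (expansion and evaluation)

At the position $s$ reached in Step 1, Step 2 evaluates $f_\theta$ to obtain $p(s, a)$ and $v(s)$. Whereas the earlier AlphaGo expanded a node only after $n$ (= 40) visits, a node is now expanded on its first visit.

Step 3 (backup)

Step 3 updates $W(s, a)$ and $N(s, a)$ for each position $s$ and move $a$ on the path that was followed:

$$N(s, a) = N(s, a) + 1 \tag{6.7}$$

$$W(s, a) = W(s, a) + v(s) \tag{6.8}$$

Each simulation therefore adds 1 to $N(s, a)$ and $v(s)$ to $W(s, a)$, so $W(s, a) / N(s, a)$ is the mean evaluation of move $a$ at position $s$ over the MCTS simulations so far.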

Step 4 (play)

Steps 1 to 3 are repeated $N$ times (for example, 1600 times), after which the move to play is chosen (Step 4).

[Figure 6.7]
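Steps 1 and 3 can be sketched for a single tree node as follows. This is a minimal sketch under stated assumptions, not the implementation behind the slides: the class name `Node`, the convention that Q(s, a) is 0 for an unvisited move, and the default `c_puct=1.0` are all hypothetical choices. Step 2 (the call to f_θ) is left outside the sketch; its outputs enter as the `priors` passed to the constructor and the `value` passed to `backup`.

```python
import math


class Node:
    """Statistics N(s, a), W(s, a) and priors p(s, a) for one position s."""

    def __init__(self, priors):
        # priors: dict mapping each move a to p(s, a)
        self.P = dict(priors)
        self.N = {a: 0 for a in priors}
        self.W = {a: 0.0 for a in priors}

    def q(self, a):
        # (6.5); an unvisited move is given Q = 0 (assumption)
        return self.W[a] / self.N[a] if self.N[a] > 0 else 0.0

    def u(self, a, c_puct):
        # (6.6)
        total_visits = sum(self.N.values())
        return c_puct * self.P[a] * math.sqrt(total_visits) / (1 + self.N[a])

    def select(self, c_puct=1.0):
        # Step 1: move with the largest Q(s, a) + u(s, a).
        # Ties are broken by insertion order of the priors.
        return max(self.P, key=lambda a: self.q(a) + self.u(a, c_puct))

    def backup(self, a, value):
        # Step 3: (6.7) and (6.8)
        self.N[a] += 1
        self.W[a] += value
```

For example, with priors {"a": 0.6, "b": 0.4} and no visits, every score is 0 and `select()` returns "a". After `backup("a", -1.0)` the scores are -1 + 0.6 * 1 / 2 = -0.7 for "a" and 0 + 0.4 * 1 / 1 = 0.4 for "b", so the next `select()` returns "b": a poor evaluation steers the search toward the other move.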

17 TPU 4000 / MCTS 1 6 MCTS 6 B* 17

18 ( )(MEMO ) θ MEMO: ( ) AI AI 1 1 AI 1 AI AI AI AI? SL RL (MEMO ) MEMO: Q ( )

19 6.8: ,, 3 ( )θ f θ ( z π) θ f θ θ f θ f θ θ θ θ Step 1 θ Step 3 θ (Step 4) θ θ 19

[Figure 6.9: f_θ]

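MCTS ( 6.7 Step 4) % 9 1 s a N(s, a) z 50 A = π z π % MCTS p 0-1 π 0-1 π a MCTS N(s, a)

$$\pi_a = \frac{N(s, a)^{1/\tau}}{\sum_b N(s, b)^{1/\tau}} \tag{6.9}$$

Here $N(s, a)$ is the MCTS visit count of move $a$ and $\tau$ is a temperature parameter. With $\tau = 0$, the move $a$ with the largest $N(s, a)$ is chosen with 100% probability, which is the usual MCTS choice. With $\tau = 1$, move $a$ is chosen with probability proportional to $N(s, a)$.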
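The effect of τ in (6.9) is easy to check numerically. The sketch below is illustrative only: the function name `visit_policy` is hypothetical, τ = 0 is handled as the limiting case (all probability on the most-visited move), and the counts are divided by their maximum before exponentiation purely to avoid overflow for small τ, which leaves the ratios in (6.9) unchanged. It assumes at least one count is positive.

```python
def visit_policy(counts, tau):
    """Turn visit counts N(s, a) into probabilities pi_a as in (6.9).

    counts: list of visit counts, one per move (at least one must be > 0).
    tau:    temperature; 0 means "always the most-visited move".
    """
    if tau == 0:
        best = max(range(len(counts)), key=lambda a: counts[a])
        return [1.0 if a == best else 0.0 for a in range(len(counts))]
    top = max(counts)
    # Dividing by the largest count rescales numerator and denominator equally.
    scaled = [(n / top) ** (1.0 / tau) for n in counts]
    total = sum(scaled)
    return [x / total for x in scaled]
```

With visit counts (10, 30, 60), τ = 1 gives (0.1, 0.3, 0.6), and τ = 0 gives (0, 0, 1). An intermediate value such as τ = 0.5 squares the counts before normalizing and so concentrates more probability on the most-visited move.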

23 θ θ 1,000 f θ f θ 400 ( ) f θ f θ 220 θ θ ( / ) 150( / ) 490 ( ) 2.9 (6.10) ( ) ( / ) ( ) = 40 (6.11) ? TPU 4 TPU GPU ( ) 98% 23

24 TPU CPU 1? GPU CPU 20 TPU GPU ( ) 20( ) 30( ) 4( ) 1000( ) = 720 (6.12) ? f θ θ? (MEMO ) MEMO: 2 ( ) 1 100% 6.10(a) s f θ v z θ ( 6.10(a-1)) f θ s v v ( 6.10(a-2)) z new z new ( 6.10(a-3)) z new 24

25 θ 6.10(b) s f θ p N(s, a) π θ ( 6.10(b-1)) p = f θ ( 6.10(b-2)) N(s, a) π new π new ( 6.10(b-3)) θ? 25

26 6.10: (MEMO )

27 ( AlphaGo Lee) MEMO: % : AI AI AI ( )

28 AI AI (MEMO ) AI MEMO:, Webpage: (Last access: 2017/11/4) 28

6.5 2017 2017 4 4 1 4 (AlphaGoFan) (AlphaGoLee)

29 (AlphaGoFan) (AlphaGoLee) (AlphaGoMaster) (AlphaGoZero) : (a) ( 2017 ) (b)

30 6.12(a) (b) AlphaGoFan, AlphaGoLee AlphaGoMaster, AlphaGoZero 4TPU 1 AlphaGoZero

31 6.6 AI 3 1 3? 3 4TPU 1000 CPU 1 PC 20,000 3? MCTS π, v 1? 1 4TPU CPU2400 GPU120 AI AI 31
