Mastering the Game of Go without Human Knowledge, rev.1 (2017/11/26)

6 Mastering the Game of Go without Human Knowledge (2017/10/19)

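6.1
The Go AI described in this chapter is built from the three components shown in Figure 6.1 (see MEMO): the dual network, Monte Carlo tree search, and reinforcement learning through self-play. They are explained in Sections 6.2, 6.3 and 6.4, respectively.
MEMO: The board is 19 × 19. A position is evaluated with a value between -1.0 and 1.0. A value of 1.0 means a 100% chance of winning, and -1.0 means 0%.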
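Suppose the value is read as the expected game result, with z = +1 for a win and z = -1 for a loss. Then it converts linearly to a winning probability p, because E[z] = p·(+1) + (1 - p)·(-1) = 2p - 1. The sketch below assumes that reading, and the function name is hypothetical:

```python
def value_to_win_prob(v: float) -> float:
    """Map a value v in [-1, 1] to a winning probability in [0, 1].

    Assumes v is the expected result E[z] with z = +1 (win) or -1 (loss),
    so v = 2p - 1 and therefore p = (v + 1) / 2.
    """
    if not -1.0 <= v <= 1.0:
        raise ValueError("value must lie in [-1, 1]")
    return (v + 1.0) / 2.0


print(value_to_win_prob(1.0))   # 1.0 (100%)
print(value_to_win_prob(-1.0))  # 0.0 (0%)
print(value_to_win_prob(0.0))   # 0.5 (50%)
```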

Figure 6.1: The three components of the Go AI.

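6.2
The 2017 version uses a single network with two outputs, called a dual network (see MEMO).
MEMO: This chapter refers to two papers. The 2017 paper, published in October 2017, is "Mastering the Game of Go without Human Knowledge" (David Silver, et al., Nature, 2017). The 2016 paper is "Mastering the game of Go with deep neural networks and tree search" (David Silver, et al., Nature, 2016).
6.2.1
Figure 6.2 shows the structure of the dual network:
Input: 17 planes of 19 × 19.
Block 1: 3 × 3 convolution with 256 filters, ReLU.
Block 2: a stack of residual blocks (19 or 39). Each block consists of two 3 × 3 convolutions with 256 filters and ReLU.
Output: two heads (below).
The 2017 paper reports networks with both 19 and 39 residual blocks.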

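Policy head:
1: 1 × 1 convolution with 2 filters, ReLU.
2: fully connected layer with 362 outputs.
Output: 362 move probabilities (361 board points plus pass).
Value head:
1: 1 × 1 convolution with 1 filter, ReLU.
2: fully connected layer with 256 units, ReLU.
3: fully connected layer with 1 unit, tanh.
Output: 1 value between -1.0 and 1.0 (+1.0 = win, -1.0 = loss).
Figure 6.2: Structure of the dual network.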
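The 362 policy outputs can be mapped back to moves as sketched below. This rests on two assumptions that are not given above. First, indices 0 to 360 are taken to be the board points in row-major order, with index 361 as pass. Second, the raw outputs are assumed to be turned into probabilities by a softmax, which is the usual choice. The function names are hypothetical:

```python
import math

SIZE = 19
PASS_INDEX = SIZE * SIZE  # assumption: output 361 is "pass"


def index_to_move(i):
    """Hypothetical mapping of a policy index: 0..360 -> (row, col) row-major, 361 -> 'pass'."""
    if i == PASS_INDEX:
        return "pass"
    if not 0 <= i < PASS_INDEX:
        raise ValueError("policy index out of range")
    return divmod(i, SIZE)


def softmax(logits):
    """Turn the 362 raw policy outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [0.0] * (PASS_INDEX + 1)  # 362 outputs
logits[3 * SIZE + 3] = 5.0          # make the point at row 3, column 3 the favorite
p = softmax(logits)
best = max(range(len(p)), key=p.__getitem__)
print(len(p), index_to_move(best))  # 362 (3, 3)
```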

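The 2016 version fed the network 48 hand-designed feature planes (Figure 6.3(a)). The 2017 version uses only 17 (Figure 6.3(b)). These 17 planes record the stones of each player in the current position and n (= 1 to 7) moves ago, plus one plane for the side to move: 8 × 2 + 1 = 17.
Figure 6.3: Input features: (a) 48 planes (2016), (b) 17 planes (2017).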
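The 17 planes can be built from a short history of boards as sketched below. The interleaved plane order (own stones, then opponent stones, for each of the 8 positions, then the colour plane) and the zero-padding at the start of a game are assumptions of this sketch. The helper names are hypothetical:

```python
EMPTY, BLACK, WHITE = 0, 1, 2
SIZE = 19
HISTORY = 8  # the current position plus the positions 1..7 moves ago


def stone_plane(board, colour):
    """1 where `colour` has a stone, 0 elsewhere."""
    return [[1 if board[r][c] == colour else 0 for c in range(SIZE)] for r in range(SIZE)]


def encode_planes(history, to_play):
    """Build the 17 binary 19x19 input planes.

    history: boards, most recent first; each is a SIZE x SIZE list of
             EMPTY/BLACK/WHITE. Missing history (early in a game) is zero-padded.
    to_play: BLACK or WHITE.
    Assumed order: [own_t, opp_t, own_t-1, opp_t-1, ..., own_t-7, opp_t-7, colour].
    """
    opponent = WHITE if to_play == BLACK else BLACK
    empty = [[EMPTY] * SIZE for _ in range(SIZE)]
    planes = []
    for t in range(HISTORY):
        board = history[t] if t < len(history) else empty
        planes.append(stone_plane(board, to_play))
        planes.append(stone_plane(board, opponent))
    colour_value = 1 if to_play == BLACK else 0  # all 1s if Black is to move
    planes.append([[colour_value] * SIZE for _ in range(SIZE)])
    return planes


board = [[EMPTY] * SIZE for _ in range(SIZE)]
board[3][3] = BLACK
planes = encode_planes([board], to_play=WHITE)
print(len(planes))      # 17
print(planes[1][3][3])  # 1: Black, White's opponent, has a stone at (3, 3)
```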

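The 2017 network (see MEMO) outputs both p and v. This is why it is called a dual network: one network, with two outputs from two heads.
MEMO: The body of the network is a residual network (ResNet) (see MEMO). Figure 6.4(a) shows one residual block, and 19 such blocks are stacked (Figure 6.4(b)). Inside a block, a 3 × 3 convolution with 256 filters (3x3 Conv 256), batch normalization (Bn) and ReLU (ReLU) are applied. These are followed by a second 3 × 3 convolution and batch normalization.
MEMO: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep residual learning for image recognition. Computer Vision and Pattern Recognition (CVPR), 2016.
As Figure 6.4(a) shows, the input of the block is added to this result through a skip connection before the final ReLU.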
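Below is a minimal sketch of the residual block of Figure 6.4(a), together with the size estimate that follows in the text. It makes several simplifying assumptions:
- the 3 × 3 convolutions are replaced by dense matrices acting on a channel vector;
- batch normalization is omitted;
- the 39 layers are read as 1 + 19 × 2 convolution layers of the 19-block network, all counted as 256 → 256.

The function names are hypothetical.

```python
def relu(xs):
    return [max(0.0, x) for x in xs]


def matvec(w, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def residual_block(x, w1, w2):
    """Simplified residual block: y = ReLU(x + W2 · ReLU(W1 · x)).

    Dense matrices stand in for the 3x3 convolutions and batch normalization
    is left out; the point shown is the skip connection before the last ReLU.
    """
    h = relu(matvec(w1, x))
    f = matvec(w2, h)
    return relu([xi + fi for xi, fi in zip(x, f)])


def conv_cost(board=19, k=3, channels=256, layers=39):
    """Multiplications and weights of `layers` k x k conv layers (channels -> channels)."""
    weights_per_layer = k * k * channels * channels
    mults = board * board * weights_per_layer * layers
    weights = weights_per_layer * layers
    return mults, weights


zeros = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block([1.0, -2.0], zeros, zeros))  # [1.0, 0.0]: with F = 0 the block is ReLU(x)
print(conv_cost())                                # (8304132096, 23003136)
```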

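Counting the 3 × 3 convolutions with 256 filters (39 layers), the network needs roughly the following:
Multiplications: 19 × 19 × 3 × 3 × 256 × 256 (× 39) ≈ $8.3 \times 10^{9}$
Weights: 3 × 3 × 256 × 256 (× 39) ≈ $2.3 \times 10^{7}$
This is roughly 5 times the size of the 2016 SL network, which used 192 filters.
Figure 6.4: Residual network: (a) one residual block, (b) the stacked blocks.
6.2.2
The network is trained on $M$ examples $\{(s^k, \pi^k, z^k)\}_{k=1}^{M}$. Here $s^k$ is a position, $\pi^k$ the move probabilities for it, and $z^k$ the game result.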

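Here $\pi$ is the probability distribution over the moves $a$ in position $s$. When the teacher is a single move, as in supervised learning from game records, the move actually played has $\pi_a = 1$ and every other move has $\pi_a = 0$. We write $\pi = \{\pi_a\}_{a=1}^{A}$, where $A$ is 361 (or 362, counting pass). $z$ is the result of the game (+1 for a win, -1 for a loss).
The network $f_\theta$ with parameters $\theta$ takes a position $s$ and outputs the move probabilities $p(s,a)$ and the value $v(s)$:
$$(p, v) = f_\theta(s) \tag{6.1}$$
$\theta$ is trained so that $(p, v)$ approaches $(\pi, z)$, by minimizing
$$L_\theta = \sum_{k=1}^{M} \Big\{ (z^k - v^k)^2 - \sum_{a=1}^{A} \pi_a^k \log p_a^k \Big\} + c \sum_i \theta_i^2 \tag{6.2}$$
The loss has three terms:
- $(z^k - v^k)^2$ is the squared error between $z$ and $v$.
- $-\sum_{a=1}^{A} \pi_a^k \log p_a^k$ is the cross-entropy between $\pi = \{\pi_a\}_{a=1}^{A}$ and $p = \{p_a\}_{a=1}^{A}$.
- $\sum_i \theta_i^2$ penalizes large parameters (weight decay). Its coefficient $c$ is $10^{-4}$ in the 2017 paper.

$\theta$ is then moved in the direction that decreases $L_\theta$:
$$\theta \leftarrow \theta - \alpha\, \Delta\theta \tag{6.3}$$
$$\Delta\theta = \frac{\partial L_\theta}{\partial \theta} \tag{6.4}$$
Here $\alpha$ is the learning rate. This is stochastic gradient descent (SGD). When trained with SGD on about 30 million positions, the dual network reaches 60.4% accuracy in predicting the moves in the records, compared with 57.0% for the 2016 network.
Figure 6.5 shows the residual network of Figure 6.4 written in Chainer.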
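Equations (6.2) to (6.4) can be written out directly, as in the sketch below. It only evaluates the loss and applies an update for a gradient that is supplied from outside. In practice the gradient comes from automatic differentiation, for example in Chainer. The function names are hypothetical:

```python
import math


def dual_loss(batch, theta, c=1e-4):
    """Eq. (6.2): sum_k [(z - v)^2 - sum_a pi_a log p_a] + c * sum_i theta_i^2.

    batch: list of (pi, p, z, v); pi and p are lists of length A.
    theta: flat list of parameters, used here only for the weight-decay term.
    """
    total = 0.0
    for pi, p, z, v in batch:
        value_error = (z - v) ** 2
        # Cross-entropy; moves with pi_a == 0 contribute nothing (avoids log(0)).
        cross_entropy = -sum(pi_a * math.log(p_a) for pi_a, p_a in zip(pi, p) if pi_a > 0)
        total += value_error + cross_entropy
    return total + c * sum(t * t for t in theta)


def sgd_step(theta, grad, alpha):
    """Eqs. (6.3)-(6.4): theta <- theta - alpha * dL/dtheta, with grad = dL/dtheta."""
    return [t - alpha * g for t, g in zip(theta, grad)]


# One position: the winning side (z = +1) played move 0, while the network
# predicted v = 0 and gave both moves probability 0.5.
loss = dual_loss([([1.0, 0.0], [0.5, 0.5], 1.0, 0.0)], theta=[])
print(round(loss, 4))  # 1.6931, i.e. 1 + log 2
```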

Figure 6.5: Implementation in Chainer.
6.2.3


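6.3 Monte Carlo Tree Search (MCTS)
Moves are chosen with MCTS (see MEMO).
MEMO: MCTS decides which move to explore using UCB (average value + exploration bonus). One iteration goes through selection, evaluation, backup and expansion.
6.3.1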

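Figure 6.6: (a) Selection of the move $a$ that maximizes $Q(s,a) + u(s,a)$; (b) evaluation with $p$ and $v$; (c) backup.
6.3.2
Figure 6.7 shows the procedure, Steps 1 to 3.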

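Step 1 (Selection)
Starting from the current position $s$, repeatedly choose the move $a$ that maximizes $Q(s,a) + u(s,a)$ (note 4), where
$$Q(s,a) = \frac{W(s,a)}{N(s,a)} \tag{6.5}$$
$$u(s,a) = c_{\mathrm{puct}}\, p(s,a)\, \frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)} \tag{6.6}$$
$Q(s,a)$ is the average value of move $a$ so far. $u(s,a)$ is large for moves with a high prior probability $p(s,a)$. Through the factor $\sqrt{\sum_b N(s,b)}/(1+N(s,a))$, it is also large for moves that have rarely been tried. The constant $c_{\mathrm{puct}}$ sets the balance between $Q(s,a)$ and $u(s,a)$. Early on, $u(s,a)$ dominates; as a move accumulates visits, its $Q(s,a)$ takes over.
Step 2 (Evaluation)
The leaf position $s'$ reached in Step 1 is evaluated with $f_\theta$, which gives $p(s',a)$ and $v(s')$. A leaf is expanded once it has been visited $n$ (= 40) times.
Step 3 (Backup)
For every edge $(s, a)$ on the path from $s$ to $s'$, update
$$N(s,a) \leftarrow N(s,a) + 1 \tag{6.7}$$
$$W(s,a) \leftarrow W(s,a) + v(s') \tag{6.8}$$
$N(s,a)$ and $W(s,a)$ are the visit count and the total value of move $a$ in position $s$, so $W(s,a)/N(s,a)$ is its average value. MCTS favors moves with a high $W(s,a)/N(s,a)$ (notes 4 and 5).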

Step 4 (Repeat)
Steps 1 to 3 are repeated $N$ times (1600 in the 2017 paper).
Figure 6.7: MCTS for one move.
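Steps 1, 3 and 4 can be sketched in Python as below. This is a minimal illustration, not DeepMind's implementation, and it makes these assumptions:
- The class and function names and the default $c_{\mathrm{puct}} = 1$ are hypothetical.
- Unvisited moves are given $Q = 0$.
- The backed-up value is negated at every ply because the two players alternate. Eq. (6.8) leaves this sign convention implicit.

Step 2, which calls the network and adds child edges with priors $p(s', a)$, is left out. A driver would call `select` from the root down to a leaf, evaluate the leaf, expand it and call `backup`, 1600 times per move.

```python
import math


class Edge:
    """Statistics of one move a in position s: N(s,a), W(s,a) and prior p(s,a)."""

    def __init__(self, prior):
        self.n = 0        # N(s,a): visit count
        self.w = 0.0      # W(s,a): total value
        self.p = prior    # p(s,a): prior from the network


def q_value(edge):
    # (6.5) Q = W / N; unvisited moves get Q = 0 (an assumption of this sketch).
    return edge.w / edge.n if edge.n > 0 else 0.0


def select(edges, c_puct=1.0):
    """Step 1: the move maximizing Q(s,a) + u(s,a), with u(s,a) from (6.6)."""
    sqrt_total = math.sqrt(sum(e.n for e in edges.values()))

    def score(a):
        e = edges[a]
        return q_value(e) + c_puct * e.p * sqrt_total / (1 + e.n)

    return max(edges, key=score)


def backup(path, leaf_value):
    """Step 3: apply (6.7) and (6.8) to every edge on the path (root first).

    leaf_value is v(s') from the viewpoint of the player to move at the leaf s'.
    Assumption: the sign flips at every ply, because the edge leading into s'
    was chosen by that player's opponent.
    """
    value = -leaf_value
    for edge in reversed(path):
        edge.n += 1        # (6.7)
        edge.w += value    # (6.8)
        value = -value


N_SIMULATIONS = 1600  # Step 4: repeat Steps 1 to 3 this many times per move

root = {0: Edge(0.5), 1: Edge(0.5)}
root[0].n, root[0].w = 10, 5.0
print(select(root))  # 1: the unvisited move's bonus outweighs Q = 0.5 of move 0
```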

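6.3.3
With TPUs, about 4000 positions per second can be evaluated. The 1600 MCTS simulations for one move therefore take about 0.4 seconds (1600 / 4000 = 0.4). (Note 6: compare the B* search algorithm.)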

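6.4
As shown in Figure 6.8, $\theta$ is learned through self-play (see MEMO).
MEMO: In self-play, the AI plays against itself: both sides are the same AI.
This is not supervised learning (SL) from human game records but reinforcement learning (RL) (see MEMO).
MEMO: Q-learning (Section 3.4).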

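Figure 6.8: Learning $\theta$ through self-play.
6.4.1
Figure 6.9 shows the three processes:
- Self-play with the current network $f_\theta$ produces training data ($z$ and $\pi$).
- The parameters $\theta$ of $f_\theta$ are trained on these data.
- The newly trained network is compared with the current one.

Step 1: play self-play games with the current $f_\theta$.
Step 2: train $\theta$ on the recorded data.
Step 3: evaluate the new $\theta$ against the current $\theta$ and keep the better one.
Step 4: repeat Steps 1 to 3.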

See Figure 6.9 (note 7; cf. Section 2.5).

Figure 6.9: The learning loop for $f_\theta$.

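Self-play uses the MCTS of Section 6.3. Each move gets 0.4 seconds, which allows 1600 simulations (Figure 6.7, Step 4). For the first 30 moves of a game, moves are sampled in proportion to their visit counts. From move 31 on, the most-visited move is played. Games can end early by resignation. The resignation threshold is set so that fewer than 5% of resigned games could have been won (note 8; see also note 9).
For every position $s$ of a game, the visit counts $N(s,a)$ of the moves $a$ and the final result $z$ are recorded. Training samples are drawn from the most recent 500,000 games, in mini-batches of $M = 2048$ in Eq. (6.2).
This target $\pi$ differs from the supervised case of Section 6.2, where the single move in the record gets 100%. Here $\pi$ comes from MCTS, and like $p$ (whose entries lie between 0 and 1), it spreads values between 0 and 1 over the moves. The target for move $a$ is computed from the visit counts $N(s,a)$:
$$\pi_a = \frac{N(s,a)^{1/\tau}}{\sum_b N(s,b)^{1/\tau}} \tag{6.9}$$
$\tau$ is a temperature. As $\tau \to 0$, the move with the largest $N(s,a)$ gets 100% and play becomes deterministic. With $\tau = 1$, each move $a$ is chosen in proportion to $N(s,a)$.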
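Eq. (6.9) and the 30-move rule can be sketched as follows. $\tau = 0$ is handled as the limit $\tau \to 0$, because the power $1/\tau$ is undefined there. Tie-breaking, the 0-based move numbering and the helper names are assumptions of this sketch:

```python
import random


def visit_policy(counts, tau):
    """Eq. (6.9): pi_a = N(s,a)^(1/tau) / sum_b N(s,b)^(1/tau).

    tau == 0 stands for the limit tau -> 0: all probability goes to the most
    visited move (ties go to the lowest index, an assumption of this sketch).
    """
    if tau == 0:
        best = max(range(len(counts)), key=counts.__getitem__)
        return [1.0 if i == best else 0.0 for i in range(len(counts))]
    powered = [n ** (1.0 / tau) for n in counts]
    total = sum(powered)
    return [x / total for x in powered]


def choose_move(counts, move_number, rng=random):
    """Hypothetical helper: tau = 1 for moves 0-29, then tau -> 0 (most visited move)."""
    tau = 1 if move_number < 30 else 0
    pi = visit_policy(counts, tau)
    return rng.choices(range(len(pi)), weights=pi)[0]


print(visit_policy([1, 3], tau=1))  # [0.25, 0.75]
print(visit_policy([1, 3], tau=0))  # [0.0, 1.0]
```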

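Every 1,000 training steps, the newly trained network $f_{\theta'}$ plays 400 games against the current $f_\theta$, using 1600 simulations per move. If $f_{\theta'}$ wins at least 220 of them (55%), $\theta$ is replaced by $\theta'$ (note 10).
6.4.2
In the 2017 paper, 4.9 million self-play games were played in 3 days, with 1600 simulations (0.4 seconds) per move. Assume about 150 moves per game. Playing all the games one after another would take
$$0.4\,(\text{s/move}) \times 150\,(\text{moves/game}) \times 4.9 \times 10^{6}\,(\text{games}) \approx 2.9 \times 10^{8}\,(\text{s}) \tag{6.10}$$
This is about 3400 days, or 9.3 years. Finishing in 3 days therefore means running about 1000 games in parallel.
In the 2016 version, one evaluation took about 5.0 ms. The dual network of Figure 6.2 is roughly 5 times larger, so 1600 evaluations would take
$$5.0\,(\text{ms/evaluation}) \times 5 \times 1600\,(\text{evaluations}) = 40\,(\text{s}) \tag{6.11}$$
How, then, is 0.4 seconds, 100 times faster, achieved? The answer is the TPU. One TPU is about 30 times faster than a GPU, and 4 TPUs are used, so 30 × 4 ≈ 100 (note 10: 98%).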
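The arithmetic in this part can be checked in a few lines. This covers the 220-out-of-400 threshold, Eqs. (6.10) and (6.11), and the single-CPU estimate (6.12) that follows. The inputs are the figures given in the text. A 365-day year and reading the 5.0 as milliseconds per evaluation are assumptions of this sketch:

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365  # assumption: leap years ignored

# Evaluation gate: at least 220 wins out of 400 games.
win_rate = 220 / 400

# (6.10): time to play 4.9 million self-play games one after another.
seconds = 0.4 * 150 * 4.9e6            # (s/move) * (moves/game) * games
days = seconds / SECONDS_PER_DAY
years = days / DAYS_PER_YEAR
parallel = days / 3                    # games running at once to finish in 3 days

# (6.11): 1600 evaluations at 5.0 ms each, times the factor 5 for the larger network.
move_time_s = 5.0e-3 * 5 * 1600
speedup_needed = move_time_s / 0.4     # versus the observed 0.4 s per move
tpu_speedup = 30 * 4                   # one TPU ~ 30 GPUs, and 4 TPUs are used

# (6.12): the whole 3-day run on one CPU (GPU ~ 20 CPUs, TPU ~ 30 GPUs).
cpu_days = 3 * 20 * 30 * 4 * 1000
cpu_years = cpu_days / DAYS_PER_YEAR

print(f"gate {win_rate:.0%}; (6.10) {seconds:.2g} s = {days:.0f} days = {years:.1f} years, "
      f"about {parallel:.0f} games in parallel")
print(f"(6.11) {move_time_s:.0f} s per move, x{speedup_needed:.0f} needed, TPUs give x{tpu_speedup}")
print(f"(6.12) {cpu_days} days = {cpu_years:.0f} years; 4 TPUs = {tpu_speedup} GPUs = {tpu_speedup * 20} CPUs")
```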

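In short, the run used 4 TPUs, about 1000 games in parallel, and 3 days. How long would it take on 1 CPU? A GPU is about 20 times faster than a CPU, and a TPU is about 30 times faster than a GPU, so
$$3\,(\text{days}) \times 20\,(\text{CPU} \to \text{GPU}) \times 30\,(\text{GPU} \to \text{TPU}) \times 4\,(\text{TPUs}) \times 1000\,(\text{parallel}) = 7.2 \times 10^{6}\,(\text{days}) \tag{6.12}$$
That is about 19,700 years: roughly 20,000 years instead of 3 days.
6.4.3 Why does it get stronger?
Why does training $f_\theta$ on its own games improve $\theta$?
Figure 6.10(a) shows the value side:
- For a position $s$, $\theta$ is trained so that the value $v$ output by $f_\theta$ approaches the game result $z$ (Figure 6.10(a-1)).
- The updated $f_\theta$ then evaluates $s$ more accurately, so MCTS guided by $v$ plays better (Figure 6.10(a-2)).
- The games it plays yield a new result $z_{\text{new}}$ (Figure 6.10(a-3)), which becomes the next training target for $v$.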

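Figure 6.10(b) shows the policy side:
- For a position $s$, $f_\theta$ outputs $p$. MCTS guided by $p$ produces visit counts $N(s,a)$ and hence $\pi$, and $\theta$ is trained so that $p$ approaches $\pi$ (Figure 6.10(b-1)).
- The updated network gives a better $p = f_\theta(s)$ (Figure 6.10(b-2)).
- MCTS then produces better visit counts $N(s,a)$ and a new target $\pi_{\text{new}}$ (Figure 6.10(b-3)).

Each turn of this cycle improves $\theta$.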

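Figure 6.10: How self-play improves (a) the value $v$ and (b) the policy $p$.
6.4.4
Figure 6.11, from the 2017 paper (see MEMO), shows the playing strength during training. According to the paper, the self-taught player beat the player trained on human games within the first 24 hours and overtook AlphaGo Lee after 36 hours.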

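After 72 hours, its rating was about 4500 (compare AlphaGo Lee, the version that played Lee Sedol).
MEMO: A rating difference of 100 points corresponds to a winning rate of about 64%. (Rating bands: 1200-1400, 1400-1800, 1800-2000; May 2017: 2800, 2900, 2400.)
Figure 6.11: Playing strength during training (from the 2017 paper).
6.4.5
This section looks at AI for other games as of 2017.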

For the view of a shogi AI developer, see MEMO.
MEMO: Webpage: http://yaneuraou.yaneu.com/2017/06/12/ (Last access: 2017/11/4)

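6.5
The 2017 paper compares four versions of AlphaGo: AlphaGo Fan (AlphaGoFan), AlphaGo Lee (AlphaGoLee), AlphaGo Master (AlphaGoMaster) and AlphaGo Zero (AlphaGoZero) (note 11).
Figure 6.12: (a) Strength of the four versions (from the 2017 paper); (b) their hardware (note 11).
Note 11: this AlphaGo Zero uses the network of Figure 6.2 with 39 residual blocks instead of 19 (about 80 layers).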

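Figure 6.12(a) compares the strength of the four versions. Figure 6.12(b) shows their hardware. AlphaGo Fan and AlphaGo Lee ran on distributed hardware, whereas AlphaGo Master and AlphaGo Zero run on a single machine with 4 TPUs. AlphaGo Zero reached an Elo rating of 5185. Training ran about 1000 games in parallel, but once trained, AlphaGo Zero plays on 1 machine, evaluating about 4000 positions per second.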
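The MEMO in Section 6.4.4 (100 rating points ≈ 64%) and the ratings in Figure 6.12 both use the Elo scale. This sketch assumes the standard Elo formula, in which a player rated Δ points higher has expected score 1 / (1 + 10^(-Δ/400)). Under that assumption, the 64% figure can be reproduced and inverted. The function names are hypothetical:

```python
import math


def elo_expected_score(diff):
    """Expected score of a player rated `diff` points higher (standard Elo model)."""
    return 1.0 / (1.0 + 10 ** (-diff / 400.0))


def elo_diff_for(score):
    """Inverse: the rating gap that gives the expected score `score` (0 < score < 1)."""
    return 400.0 * math.log10(score / (1.0 - score))


print(round(elo_expected_score(100), 2))  # 0.64
print(round(elo_diff_for(0.64)))          # 100
```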

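6.6
The Go AI in this chapter is built from three components (Section 6.1). The chapter can be summed up in three questions.
Why does training take only 3 days? It ran on 4 TPUs with about 1000 games in parallel. On a single CPU PC, the same computation would take about 20,000 years.
Why does it become so strong in 3 days? MCTS turns the network outputs into better π and v (Section 6.4.3), and training toward them improves the network.
Why can it play on 1 machine? One machine with 4 TPUs is roughly equivalent to 2400 CPUs or 120 GPUs.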