The 19th Game Programming Workshop 2014 SHOT 1,a) 2 UCT SHOT UCT SHOT UCT UCT SHOT UCT An Empirical Evaluation of the Effectiveness of the SHOT algori

Similar documents
Q [4] 2. [3] [5] ϵ- Q Q CO CO [4] Q Q [1] i = X ln n i + C (1) n i i n n i i i n i = n X i i C exploration exploitation [4] Q Q Q ϵ 1 ϵ 3. [3] [5] [4]

Logistello 1) playout playout 1 5) SIMD Bitboard playout playout Bitboard Bitboard 8 8 = black white 2 2 Bitboard 2 1 6) position rev i

p-9-10.eps

AI

[1] AI [2] Pac-Man Ms. Pac-Man Ms. Pac-Man Pac-Man Ms. Pac-Man IEEE AI Ms. Pac-Man AI [3] AI 2011 UCT[4] [5] 58,990 Ms. Pac-Man AI Ms. Pac-Man 921,360

2 ( ) i

The 18th Game Programming Workshop ,a) 1,b) 1,c) 2,d) 1,e) 1,f) Adapting One-Player Mahjong Players to Four-Player Mahjong

The 15th Game Programming Workshop 2010 Magic Bitboard Magic Bitboard Bitboard Magic Bitboard Bitboard Magic Bitboard Magic Bitboard Magic Bitbo

25 Study on Effectiveness of Monte Calro Tree Search for Single-Player Games

Fig. 2 28th Ryuou Tournament, Match 5, 59th move. The last move is Black s Rx5f. 1 Tic-Tac-Toe Fig. 1 AsearchtreeofTic-Tac-Toe. [2] [3], [4]

1., 1 COOKPAD 2, Web.,,,,,,.,, [1]., 5.,, [2].,,.,.,, 5, [3].,,,.,, [4], 33,.,,.,,.. 2.,, 3.., 4., 5., ,. 1.,,., 2.,. 1,,

コンピュータ 囲 碁 に 起 きた 革 命 2008 年 3 月 末 パリ 囲 碁 トーナメントのエキシビショ ンでプロ 対 コンピュータの 対 戦 が 実 現 ( プロ:タラヌ カタリン 五 段 ( 日 本 棋 院 中 部 総 本 部 所

23 Study on Generation of Sudoku Problems with Fewer Clues

COM COM 4) 5) COM COM 3 4) 5) COM COM 6) 7) 10) COM Bonanza 6) Bonanza Hearts COM 7) 10) Hearts 3 2,000 4,000

2015 3

DPA,, ShareLog 3) 4) 2.2 Strino Strino STRain-based user Interface with tacticle of elastic Natural ObjectsStrino 1 Strino ) PC Log-Log (2007 6)

1,a) 1,b) TUBSTAP TUBSTAP Offering New Benchmark Maps for Turn Based Strategy Game Tomihiro Kimura 1,a) Kokolo Ikeda 1,b) Abstract: Tsume-shogi and Ts

Vol. 52 No (Dec. 2011) Ms. Pac-Man IEEE CIG Ms. Pac-Man Ms. Pac-Man AI AI Ms. Pac-Man Ms. Pac-Man Competition Ms. Pac-Man Monte

先端社会研究 ★5★号/4.山崎

第62巻 第1号 平成24年4月/石こうを用いた木材ペレット

JAIST Reposi Title 少数の記録からプレイヤの価値観を機械学習するチー ムプレイ AI の構成 Author(s) 和田, 堯之 ; 佐藤, 直之 ; 池田, 心 Citation 研究報告ゲーム情報学 (GI), 2015-GI-33(5): 1-

IPSJ SIG Technical Report Vol.2012-CG-148 No /8/29 3DCG 1,a) On rigid body animation taking into account the 3D computer graphics came

soturon.dvi

1 DHT Fig. 1 Example of DHT 2 Successor Fig. 2 Example of Successor 2.1 Distributed Hash Table key key value O(1) DHT DHT 1 DHT 1 ID key ID IP value D

4.1 % 7.5 %

新製品開発プロジェクトの評価手法

1 Fig. 1 Extraction of motion,.,,, 4,,, 3., 1, 2. 2.,. CHLAC,. 2.1,. (256 ).,., CHLAC. CHLAC, HLAC. 2.3 (HLAC ) r,.,. HLAC. N. 2 HLAC Fig. 2

Kyushu Communication Studies 第2号

1 Web [2] Web [3] [4] [5], [6] [7] [8] S.W. [9] 3. MeetingShelf Web MeetingShelf MeetingShelf (1) (2) (3) (4) (5) Web MeetingShelf

Web Web Web Web 1 1,,,,,, Web, Web - i -

1: A/B/C/D Fig. 1 Modeling Based on Difference in Agitation Method artisoc[7] A D 2017 Information Processing

Research on decision making in multi-player games with imperfect information

28 Horizontal angle correction using straight line detection in an equirectangular image

( )

Vol.54 No (July 2013) [9] [10] [11] [12], [13] 1 Fig. 1 Flowchart of the proposed system. c 2013 Information

IPSJ SIG Technical Report Secret Tap Secret Tap Secret Flick 1 An Examination of Icon-based User Authentication Method Using Flick Input for

3_39.dvi

IPSJ SIG Technical Report Vol.2016-CE-137 No /12/ e β /α α β β / α A judgment method of difficulty of task for a learner using simple

Input image Initialize variables Loop for period of oscillation Update height map Make shade image Change property of image Output image Change time L

(MIRU2008) HOG Histograms of Oriented Gradients (HOG)

Web Web Web Web Web, i

,,,,., C Java,,.,,.,., ,,.,, i

258 5) GPS 1 GPS 6) GPS DP 7) 8) 10) GPS GPS ) GPS Global Positioning System

4. C i k = 2 k-means C 1 i, C 2 i 5. C i x i p [ f(θ i ; x) = (2π) p 2 Vi 1 2 exp (x µ ] i) t V 1 i (x µ i ) 2 BIC BIC = 2 log L( ˆθ i ; x i C i ) + q

EQUIVALENT TRANSFORMATION TECHNIQUE FOR ISLANDING DETECTION METHODS OF SYNCHRONOUS GENERATOR -REACTIVE POWER PERTURBATION METHODS USING AVR OR SVC- Ju

日本感性工学会論文誌

( ) fnirs ( ) An analysis of the brain activity during playing video games: comparing master with not master Shingo Hattahara, 1 Nobuto Fuji

大学における原価計算教育の現状と課題

,,,, : - i -

DEIM Forum 2009 B4-6, Str

THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE.

2006 [3] Scratch Squeak PEN [4] PenFlowchart 2 3 PenFlowchart 4 PenFlowchart PEN xdncl PEN [5] PEN xdncl DNCL 1 1 [6] 1 PEN Fig. 1 The PEN

1_26.dvi

Visual Evaluation of Polka-dot Patterns Yoojin LEE and Nobuko NARUSE * Granduate School of Bunka Women's University, and * Faculty of Fashion Science,

IPSJ SIG Technical Report 1, Instrument Separation in Reverberant Environments Using Crystal Microphone Arrays Nobutaka ITO, 1, 2 Yu KITANO, 1

7,, i

IPSJ SIG Technical Report Vol.2010-SLDM-144 No.50 Vol.2010-EMB-16 No.50 Vol.2010-MBL-53 No.50 Vol.2010-UBI-25 No /3/27 Twitter IME Twitte

JOURNAL OF THE JAPANESE ASSOCIATION FOR PETROLEUM TECHNOLOGY VOL. 66, NO. 6 (Nov., 2001) (Received August 10, 2001; accepted November 9, 2001) Alterna


IPSJ SIG Technical Report Vol.2011-MUS-91 No /7/ , 3 1 Design and Implementation on a System for Learning Songs by Presenting Musical St

百人一首かるた選手の競技時の脳の情報処理に関する研究

知能と情報, Vol.30, No.5, pp

Vol. 48 No. 3 Mar PM PM PMBOK PM PM PM PM PM A Proposal and Its Demonstration of Developing System for Project Managers through University-Indus

IPSJ SIG Technical Report Vol.2009-DPS-141 No.20 Vol.2009-GN-73 No.20 Vol.2009-EIP-46 No /11/27 1. MIERUKEN 1 2 MIERUKEN MIERUKEN MIERUKEN: Spe

2017 (413812)

Vol. 42 No. SIG 8(TOD 10) July HTML 100 Development of Authoring and Delivery System for Synchronized Contents and Experiment on High Spe

floodgate 15 Nine- DayFever XeonE c(NDF) gpsfish XeonX c 11 * [6]

58 10

21 Key Exchange method for portable terminal with direct input by user

1 Web Web 1,,,, Web, Web : - i -

Bull. of Nippon Sport Sci. Univ. 47 (2) ) 1) 1) 1) 2) 1) 1) 2) The study of university and technical school freshmen in judo

On the Wireless Beam of Short Electric Waves. (VII) (A New Electric Wave Projector.) By S. UDA, Member (Tohoku Imperial University.) Abstract. A new e

..,,...,..,...,,.,....,,,.,.,,.,.,,,.,.,.,.,,.,,,.,,,,.,,, Becker., Becker,,,,,, Becker,.,,,,.,,.,.,,

12 DCT A Data-Driven Implementation of Shape Adaptive DCT


大学論集第42号本文.indb

The copyright of this material is retained by the Information Processing Society of Japan (IPSJ). The material has been made available on the website

pp

TF-IDF TDF-IDF TDF-IDF Extracting Impression of Sightseeing Spots from Blogs for Supporting Selection of Spots to Visit in Travel Sat

i


IPSJ SIG Technical Report Vol.2012-HCI-149 No /7/20 1 1,2 1 (HMD: Head Mounted Display) HMD HMD,,,, An Information Presentation Method for Weara

: ( 1) () 1. ( 1) 2. ( 1) 3. ( 2)

IPSJ SIG Technical Report Vol.2011-EC-19 No /3/ ,.,., Peg-Scope Viewer,,.,,,,. Utilization of Watching Logs for Support of Multi-

IPSJ SIG Technical Report Vol.2010-GN-74 No /1/ , 3 Disaster Training Supporting System Based on Electronic Triage HIROAKI KOJIMA, 1 KU

p _08森.qxd

:- Ofer Feldman,Feldman : -

Instability of Aerostatic Journal Bearings with Porous Floating Bush at High Speeds Masaaki MIYATAKE *4, Shigeka YOSHIMOTO, Tomoaki CHIBA and Akira CH

01ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六七八九零壱弐02ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六七八九零壱弐03ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六七八九零壱弐04ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六七八九零壱弐05ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六七八九零壱弐06ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六

Ł\1,4.ai

A Navigation Algorithm for Avoidance of Moving and Stationary Obstacles for Mobile Robot Masaaki TOMITA*3 and Motoji YAMAMOTO Department of Production

kut-paper-template.dvi

3_23.dvi

Web Stamps 96 KJ Stamps Web Vol 8, No 1, 2004

1 1 CodeDrummer CodeMusician CodeDrummer Fig. 1 Overview of proposal system c

IPSJ SIG Technical Report Vol.2014-CE-123 No /2/8 Bebras 1,a) Bebras,,, Evaluation and Possibility of the Questions for Bebras Contest Abs

untitled

16.16%

1 OpenCL OpenCL 1 OpenCL GPU ( ) 1 OpenCL Compute Units Elements OpenCL OpenCL SPMD (Single-Program, Multiple-Data) SPMD OpenCL work-item work-group N


Transcription:

SHOT 1,a) 2 UCT SHOT UCT SHOT UCT UCT SHOT UCT An Empirical Evaluation of the Effectiveness of the SHOT algorithm in Go and Gobang Masahiro Honjo 1,a) Yoshimasa Tsuruoka 2 Abstract: Today, UCT is the most widely used Monte-Carlo Tree Search (MCTS) method in games like Go. Recently, a new MCTS method called SHOT has been proposed and compared to UCT in some games. In this paper, we compare the performance of SHOT and UCT in the game of Go and Gobang. Experimental results suggest that SHOT is inferior to UCT when many playouts are performed, and that SHOT is superior to UCT when there are many legal moves in comparison with the number of playouts. We have also conducted experiments with composed Go problems and found that SHOT can perform poorly at situations that require a deep search. 1. 1 Department of Information and Communication Engineering, The University of Tokyo 2 Department of Electrical Engineering and Information Systems, Graduate School of Engineering, The University of Tokyo a) honjo@logos.t.u-tokyo.ac.jp MCTS (Monte-Carlo Tree Search) CrazyStone [5] MCTS 2008 MCTS UCT (Upper Confidence Bound applied to Trees) Mogo [6] UCT 19 UCT MCTS SHOT (Sequential Halving applied to Trees) Cazenave [4] UCT Multi-Armed Bandit UCB1 SHOT Sequential Halving [7] - 17 -

Sequential Halving UCB UCB Sequential Halving Arm [4] SHOT UCT Nogo SHOT UCT UCT Nogo UCT SHOT UCT SHOT UCT 2 3 4 2. 2.1 Regret 2.1.1 K X j T 2.1.2 Cumulative Regret Cumulative Regret [2], [8] T n R n = (X X t ) (1) t=0 R n n Cumulative Regret X K X t t 2.1.3 Simple Regret K K X j T K 2.1.1 T T Simple Regret [3] (2) r n = X X S(n) (2) r n n Simple Regret X K X S(n) n 2.1.2 Cumulative Regret 2.1.3 Simple Regret Regret Regret [9] 2.2 Regret Regret Regret Regret Cumulative Regret Simple Regret 2.2.1 Sequential Halving Sequential Halving UCB1 2.1.3 Simple Regret Sequential Halving Algorithm2.1 Algorithm2.1 [7] 0 1-18 -

2 SHOT 1 Sequential Halving 1 Sequential Halving Sequential Halving K 10 K K K/2 Algorithm 2.1 Sequential Halving 1: IMPUT : 2: T total budget, 3: K arms {Arm 1, Arm 2, Arm 3,..., Arm K} 4: OUTPUT: 5: Sequential Halving Arm 6: S 0 {Arm 1, Arm 2, Arm3,..., Arm K} 7: k 0 8: B T 9: while S k 1 do 10: n k max(1, [ T S k [log 2 K] ]) 11: for i = 0 to S k do 12: for j = 0 to n k do 13: S k Arm i 1 14: Arm i 15: B B 1 16: if B = 0 then 17: break WHILE loop 18: 19: end for end i loop 20: end for end j loop 21: S k+1 S k [ Sk ] Arm 2 22: k k + 1 Max k = [log 2 K] 1 23: end while 24: return S k Arm 1 0 ( ) K 2.2.2 SHOT SHOT Sequential Halving MCTS SHOT Algorithm2.2 [4] 2 SHOT SHOT MCTS Sequential Halving Sequential Halving ( ) SHOT 1 1 Sequential Halving 1 Sequential Halving 1 3. 3.1 Sequentail Halving UCB Multi-Armed Bandit Sequential Halving UCB - 19 -

Algorithm 2.2 SHOT struct NODE{ INT possible moves num CHILD child[possible moves num] INT budgetused MOVE move } struct CHILD{ INT num games FLOAT rate NODE my node } Shot(INT budget, NODE node){ budgetused 0 if board is terminal then result CheckWin() return result and update rate and games if budget = 1 then result Playout() return result and update rate and games if possible moves num = 1 then Play(child.move) Shot(budget, child.my move) UnPlay() return result and update rate and games if the c.move with 0 playout exist then for all the c.move with 0 playout do result Playout() update child.games, child.rate, and budgetused if budgetused budgetused then return result and update rate and games end for S child[ ] sort child in S according to their chile.rate b 0 while S > 1 do budgetused+budget b b + max(1, [ S [log 2 possible moves num] ] for child in S by decreasing child.rate do if child.games < b then b1 b child.games if at root S = 2 child is the first in S then b1 budget budgetused (b NEXT child.games) b1 min(b1, budget budgetused) Play(child.move) Shot(b1, child.my node) UnPlay() update budgetused, child.rate and child.games break if budgetused budget end for S the [ S ] child from S with best child.rate 2 break if budgetused budget end while update budgetuset, child.rate and child.games return The child with best child.rate of S UCB Sequential Halving UCB Sequential Halving n(n = 2, 4, 8, 16,..., 1024) x j (j = 1, 2, 3,..., n) x j P (x j ) P (x j ) = 0.1 + 0.8 j 1 n 1 1 () 20,000 UCB, Sequential Halving 1,000 UCB SHOT Sequential Halving 0 1 3 4 4 UCB Sequential Halving 1 3 UCB Sequential Halving 128 UCB, Sequential Halving Sequential Halving UCB 128 Sequential Halving UCB 3.2 SHOT UCT MCTS SHOT SHOT UCT SHOT UCT 10,000 20,000 30,000 40,000 50,000 10 10 20 20 2-20 -

3 1 4 1 1,000 (500 ) SHOT SHOT UCT SHOT 10,000 20,000 50,000 100,000 UCT 10 10 20 20 2 1 1,000 (500 ) SHOT 2 5 *1 *2 *3 SHOT UCT ( ) UCT C 0.41 3.2 SHOT UCT *1 *2 *3 2 2 [4] Nogo 3.3 3.2 10 10 SHOT UCT 10 10 20 20 UCT 10 10 SHOT [4] Nogo 1 SHOT UCT 2 10 10 UCT SHOT SHOT [%] 10,000 60.9 20,000 54.2 30,000 49.9 40,000 45.9 50,000 42.9 UCT SHOT SHOT [ / ] UCT [ / ] SHOT [%] 10 10 10,000 0.072 4,900 0.071 85.6 10 10 20,000 0.142 9,200 0.142 78.4 10 10 50,000 0.343 21,500 0.345 67.4 10 10 100,000 0.686 41,000 0.682 45.3 20 20 10,000 0.216 4,900 0.232 92.1 20 20 20,000 0.434 9,200 0.440 93.7 20 20 50,000 1.078 21,500 1.048 95.8 20 20 100,000 2.146 41,000 2.080 95.4-21 -

UCT UCT SHOT UCT SHOT Sequential Halving UCT SHOT UCT 41,000 UCT 100,000 150,000 200,000 300,000 SHOT 1,000 60,000 80,000 120,000 UCT 1,000 3 UCT 41,000 1.5 2.0 3.0 SHOT SHOT 400, 000 500, 000 49.9 49.3% SHOT UCT 3.3 SHOT UCT UCT SHOT MCTS UCT MCTS SHOT UCT [10], [11] UCT SHOT 500 1,000 2,000 5,000 10,000 9 9 *4 6 100 (50 ) SHOT SHOT UCT 4 3.2 SHOT UCT [4] Nogo 3.2 1 10,000 1 2.3 10 5 [ /1 ] 3.2 10 4 [ /1 ] 10 [4] Nogo UCT SHOT ( ) Nogo 1 4 9 9 SHOT UCT 3 10 10 UCT41,000 [%] 100,000 47.7 SHOT 150,000 44.2 200,000 47.9 300,000 47.6 60,000 63.4 UCT 80,000 68.9 120,000 70.7 (SHOT) [ / ] (UCT) [ / ] SHOT [%] 5,00 0.144 0.145 14 44 29 1,000 0.296 0.298 24 68 46 2,000 0.596 0.600 38 62 50 5,000 1.486 1.511 26 54 40 10,000 3.002 3.034 26 36 31 *4-22 -

SHOT UCT SHOT 2 SHOT UCT UCT SHOT UCT SHOT SHOT UCT 3.2 SHOT UCT 5,000 10,000 50,000 UCT SHOT *5 Web [1] 15 *6 5 9 9 *5 *6 (1) 1 (2) 4 (3) 2 1 (4) 2 2 (5) 2 4 (6) 2 5 (7) 2 7 (8) 2 8 (9) 3 8 (10) 1 2 (11) 1 3 (12) 32 3 (13) 32 6 (14) 32 1 (15) 27 1 5 (1) ( ) 0 5 3.3 3.3 5 5 1 4 5 (1) (9) (12) (13) (15) UCT SHOT UCT 4. UCT SHOT SHOT 1 SHOT UCT SHOT UCT UCT - 23 -

5 UCT SHOT (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) (12) (13) (14) (15) 5,000 UCT 10,000 50,000 5,000 SHOT 10,000 50,000 SHOT UCT SHOT UCT SHOT UCT UCT [1]. http://www.h-eba.com/ tsumego/japan/j310.html. [2] Andrew G. Barto and Richard S. Sutton. Reinforcement Learning. Mit Press, 2012. [3] S Budeck, Munos. R., and G. Stolts. Pure exploration in finitely-armed and continuous-armed bandits. Theoretival Computer Science, pp. 1832 1852, 2010. [4] T. Cazenave. Sequential halving applied to trees. IEEE Transactions on Computational Intelligence and AI in Games, Vol. PP-99,, 2014. [5] Remi Coulom. Computing elo ratings of move patterns in the game of go. IGGA Journal, Vol. 30, pp. 198 208, 2007. [6] Syvain Gelly, Y. Wang, R. Munos, and O. Teytaud. Modification of uct with patterns in monte-carlo go,technical report pr-6062. INRIA, 2006. [7] Z. Karnin, T. Koren, and O. Somekh. Almost optimal exploration in multi-armed bandits. Proc of the int. Conf. on Math. Learn, pp. 1238 1246, 2013. [8] T.L. Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in applied Mathmatics, Vol. 6, pp. 4 22, 1985. [9] D. Tolpin and S. Shimony. Mcts based on simple regret. Proc., pp. 570 576, 2012. [10].. http://www.yss-aya.com/ book2011/. [11],.., 2012. - 24 -