25 Study on Effectiveness of Monte Calro Tree Search for Single-Player Games

Similar documents
23 Study on Generation of Sudoku Problems with Fewer Clues

i

The 19th Game Programming Workshop 2014 SHOT 1,a) 2 UCT SHOT UCT SHOT UCT UCT SHOT UCT An Empirical Evaluation of the Effectiveness of the SHOT algori

n 2 n (Dynamic Programming : DP) (Genetic Algorithm : GA) 2 i

Logistello 1) playout playout 1 5) SIMD Bitboard playout playout Bitboard Bitboard 8 8 = black white 2 2 Bitboard 2 1 6) position rev i

The 18th Game Programming Workshop ,a) 1,b) 1,c) 2,d) 1,e) 1,f) Adapting One-Player Mahjong Players to Four-Player Mahjong

2015 3

[1] AI [2] Pac-Man Ms. Pac-Man Ms. Pac-Man Pac-Man Ms. Pac-Man IEEE AI Ms. Pac-Man AI [3] AI 2011 UCT[4] [5] 58,990 Ms. Pac-Man AI Ms. Pac-Man 921,360

soturon.dvi

Research on decision making in multi-player games with imperfect information

29 Short-time prediction of time series data for binary option trade

job-shop.dvi

2 ( ) i

4.1 % 7.5 %

( ) [1] [4] ( ) 2. [5] [6] Piano Tutor[7] [1], [2], [8], [9] Radiobaton[10] Two Finger Piano[11] Coloring-in Piano[12] ism[13] MIDI MIDI 1 Fig. 1 Syst

Vol. 52 No (Dec. 2011) Ms. Pac-Man IEEE CIG Ms. Pac-Man Ms. Pac-Man AI AI Ms. Pac-Man Ms. Pac-Man Competition Ms. Pac-Man Monte

AtCoder Regular Contest 073 Editorial Kohei Morita(yosupo) A: Shiritori if python3 a, b, c = input().split() if a[len(a)-1] == b[0] and b[len(

1 1 tf-idf tf-idf i

先端社会研究 ★5★号/4.山崎

(a) Picking up of six components (b) Picking up of three simultaneously. components simultaneously. Fig. 2 An example of the simultaneous pickup. 6 /

21 Quantum calculator simulator based on reversible operation

,.,.,,.,. X Y..,,., [1].,,,.,,.. HCI,,,,,,, i

IT i

Web Web Web Web Web, i

A Study on Throw Simulation for Baseball Pitching Machine with Rollers and Its Optimization Shinobu SAKAI*5, Yuichiro KITAGAWA, Ryo KANAI and Juhachi

Web Web Web Web 1 1,,,,,, Web, Web - i -

Bulletin of JSSAC(2014) Vol. 20, No. 2, pp (Received 2013/11/27 Revised 2014/3/27 Accepted 2014/5/26) It is known that some of number puzzles ca

19 Systematization of Problem Solving Strategy in High School Mathematics for Improving Metacognitive Ability

28 Horizontal angle correction using straight line detection in an equirectangular image

29 jjencode JavaScript

johnny-paper2nd.dvi

23 A Comparison of Flick and Ring Document Scrolling in Touch-based Mobile Phones

Sobel Canny i

,,,,., C Java,,.,,.,., ,,.,, i

Q [4] 2. [3] [5] ϵ- Q Q CO CO [4] Q Q [1] i = X ln n i + C (1) n i i n n i i i n i = n X i i C exploration exploitation [4] Q Q Q ϵ 1 ϵ 3. [3] [5] [4]

Web Web Web Web i

The 15th Game Programming Workshop 2010 Magic Bitboard Magic Bitboard Bitboard Magic Bitboard Bitboard Magic Bitboard Magic Bitboard Magic Bitbo

1 [1, 2, 3, 4, 5, 8, 9, 10, 12, 15] The Boston Public Schools system, BPS (Deferred Acceptance system, DA) (Top Trading Cycles system, TTC) cf. [13] [

P2P P2P peer peer P2P peer P2P peer P2P i

fx-9860G Manager PLUS_J



27 1 NP NP-completeness of Picross 3D without segment-information and that with height of one

2017 (413812)

Input image Initialize variables Loop for period of oscillation Update height map Make shade image Change property of image Output image Change time L

1,a) 1,b) TUBSTAP TUBSTAP Offering New Benchmark Maps for Turn Based Strategy Game Tomihiro Kimura 1,a) Kokolo Ikeda 1,b) Abstract: Tsume-shogi and Ts

GPGPU

ODA NGO NGO JICA JICA NGO JICA JBIC SCP

220 28;29) 30 35) 26;27) % 8.0% 9 36) 8) 14) 37) O O 13 2 E S % % 2 6 1fl 2fl 3fl 3 4

Web Basic Web SAS-2 Web SAS-2 i

1 Table 1: Identification by color of voxel Voxel Mode of expression Nothing Other 1 Orange 2 Blue 3 Yellow 4 SSL Humanoid SSL-Vision 3 3 [, 21] 8 325

A Feasibility Study of Direct-Mapping-Type Parallel Processing Method to Solve Linear Equations in Load Flow Calculations Hiroaki Inayoshi, Non-member

27 VR Effects of the position of viewpoint on self body in VR environment

( )

28 Docker Design and Implementation of Program Evaluation System Using Docker Virtualized Environment

fiš„v5.dvi

01ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六七八九零壱弐02ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六七八九零壱弐03ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六七八九零壱弐04ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六七八九零壱弐05ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六七八九零壱弐06ⅢⅣⅤⅥⅦⅧⅨⅩ一二三四五六

25 D Effects of viewpoints of head mounted wearable 3D display on human task performance

(MIRU2008) HOG Histograms of Oriented Gradients (HOG)

21 Key Exchange method for portable terminal with direct input by user

TF-IDF TDF-IDF TDF-IDF Extracting Impression of Sightseeing Spots from Blogs for Supporting Selection of Spots to Visit in Travel Sat

IPSJ SIG Technical Report Vol.2016-GI-35 No /3/9 StarCraft AI Deep Q-Network StarCraft: BroodWar Blizzard Entertainment AI Competition AI Convo

1 DHT Fig. 1 Example of DHT 2 Successor Fig. 2 Example of Successor 2.1 Distributed Hash Table key key value O(1) DHT DHT 1 DHT 1 ID key ID IP value D

149 (Newell [5]) Newell [5], [1], [1], [11] Li,Ryu, and Song [2], [11] Li,Ryu, and Song [2], [1] 1) 2) ( ) ( ) 3) T : 2 a : 3 a 1 :

NotePC 8 10cd=m 2 965cd=m Note-PC Weber L,M,S { i {

1 Web [2] Web [3] [4] [5], [6] [7] [8] S.W. [9] 3. MeetingShelf Web MeetingShelf MeetingShelf (1) (2) (3) (4) (5) Web MeetingShelf


1 4 4 [3] SNS 5 SNS , ,000 [2] c 2013 Information Processing Society of Japan

卒業論文2.dvi

Virtual Window System Virtual Window System Virtual Window System Virtual Window System Virtual Window System Virtual Window System Social Networking

..,,,, , ( ) 3.,., 3.,., 500, 233.,, 3,,.,, i

Print

(1) i NGO ii (2) 112

人工知能学会研究会資料 SIG-KBS-B Analysis of Voting Behavior in One Night Werewolf 1 2 Ema Nishizaki 1 Tomonobu Ozaki Graduate School of Integrated B

Run-Based Trieから構成される 決定木の枝刈り法

58 10

OS Windows Vista Windows XP PowerPoint2003 Word2003 (a Test No. OS 1 Windows Vista PPT Windows Vista Word Windows XP PPT Windows XP

) ,

Page 1 of 6 B (The World of Mathematics) November 20, 2006 Final Exam 2006 Division: ID#: Name: 1. p, q, r (Let p, q, r are propositions. ) (10pts) (a


THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE {s-kasihr, wakamiya,

2 1 ( ) 2 ( ) i

修士論文

26 Development of Learning Support System for Fixation of Basketball Shoot Form

IT,, i

,,.,.,,.,.,.,.,,.,..,,,, i

i

ii

SOM SOM(Self-Organizing Maps) SOM SOM SOM SOM SOM SOM i

DEIM Forum 2009 C8-4 QA NTT QA QA QA 2 QA Abstract Questions Recomme

fiš„v8.dvi


2

untitled

i

B_01田中.indd

Wi-Fi Wi-Fi Wi-Fi Wi-Fi SAS SAS-2 Wi-Fi i

20 Method for Recognizing Expression Considering Fuzzy Based on Optical Flow

21 e-learning Development of Real-time Learner Detection System for e-learning

Vol.1 No Autumn

Transcription:

25 Study on Effectiveness of Monte Calro Tree Search for Single-Player Games 1165065 2014 2 28

,,, i

Abstract Study on Effectiveness of Monte Calro Tree Search for Single-Player Games Norimasa NASU Currently, Monte-carlo tree search has been actively studied for two-person games. Monte-carlo tree search took much attention after the first application to the igo player. However, there are few studies on monte-carlo tree search applied to one-person games. For some one-player games we cannot perform complete-search due to their very broad search space. An efficient search algorithm is needed to solve those problems with broach search space. In this study, we apply the monte-carlo tree search to one player games and verified the effectiveness. We use two problems for experiments: the knapsack problem and generation of Sudoku problems. The performance of monte-carlo tree search is compared with that of the genetic algorithm. The experiment results show the following facts. Firstly, the monte-carlo tree search gives a rather good solution immediately after the start. Secondly, the accuracy of the solution is hard to rise. Thirdly, the reason is that many playouts are done on a node that does not give an optimal solution. key words Monte-carlo tree search, Genetic algorithm, Sudoku, Knapsack problem ii

1 1 1.1.................................. 1 1.2................................... 2 1.3................................. 2 2 3 2.1................................. 3 2.2................................. 3 2.2.1..................... 4 2.2.2 SAT....................... 5 2.3............................. 6 3 8 4 9 4.1............................. 9 4.2..................... 10 4.3.......................... 11 4.4.......................... 13 5 15 5.1...................... 15 5.1.1............................ 17 5.1.2.................................. 18 5.1.3................................ 19 iii

5.1.4............................. 20 5.2.......................... 20 5.2.1......................... 20 5.2.2............................... 20 5.3.......................... 22 5.3.1...................... 22 5.3.2............................... 22 6 24 6.1................................ 24 6.2.......................... 26 6.3...................................... 30 7 31 32 33 iv

2.1............................ 4 2.2 2.1............................. 4 2.3 SAT...................... 5 2.4.............................. 6 4.1....................... 10 4.2............. 12 4.3........... 13 5.1.............................. 16 5.2............................... 16 5.3............................. 18 5.4................................. 18 5.5 3.................................. 19 5.6................................. 19 5.7............................ 21 5.8 1.................................. 21 5.9 2.................................. 21 5.10 1.................................. 22 5.11 2.................................. 22 5.12 3.................................. 22 6.1 n = 17................... 25 6.2 n = 18................... 26 6.3 n = 200.................... 28 v

6.4 n = 250.................... 28 6.5 n = 300.................... 29 vi

3.1............................... 8 6.1............................ 25 6.2................................ 27 6.3 MCT GA............. 29 vii

1 1.1 [2] [14] [11] [14] 1

1.2 1.2 [11] ( ) [13] Samegame Schadd Takes [7] [8] 1.3 2 3 4 5 6 4 5 7 2

2 2.1 [10] 2.1 3 3 1 9 1 2 3 1 1 1 1 9 1 1 1 9 1 2.1 2.2 2.2 2.2 2.1 3

2.2 2.1 2.2 2.1 1 SAT 2.2.1 Cell-Unique x x a 1 x a Line-Unique Block-Unique a x 1 x a k-naked k x 1, x 2,..., x k k a 1, a 2,..., a k x 1, x 2,..., x k a 1, a 2,..., a k k-hidden k a 1, a 2,..., a k x 1, x 2,..., x k k x 1, x 2,..., x k a 1, a 2,..., a k 4

2.2 solve(board) { cnf <- CNFOfBaseRule cnf <- append(cnf, tocnf(board)) if (satisfiable(cnf)) { answer = findanswer(cnf) cnf2 <- append(cnf, negate(answer)) if (satisfiable(cnf2)) { return multiple else { return unique else { return noanswer 2.3 SAT 2.2.2 SAT SAT CNF CNF [5] 2.1 CNF SAT CNF CNF CNF [5] extended encoding 1 1 CNF SAT CNF no_answer CNF 1 CNF CNF SAT multiple unique 5

2.3 makeboard() { loop until board $n$ { board <- sethint(board) if(solve(board) ){ board <- board else if(solve(board) ){ return board; 2.4 2.3 1 2 1 1 1 2 1 1 2.4 sethint 1 1 solve 2.2.2 SAT solve 6

2.3 multiple no_answer unique 7

3 [3] 1 0-1 [6] m n i C i j w j v j 1 1 m = 2 C 0 = 10, C 1 = 15 3 k i C i k 0 = {0, 1, 4 k 1 = {2, 3, 5, 6 82 3.1 j 0 1 2 3 4 5 6 7 8 9 w j 5 4 3 4 5 2 3 1 4 3 v j 20 12 8 10 17 7 9 2 9 9 8

4 4.1 [2] 2 1 1 1 1 2 9

4.2 MCT(root) { loop until { leaf <- select_downwards(root) leaf.n <- leaf.n + 1 if (expand_cond(leaf)) { leaf <- expand(leaf).first_child board = playout(leaf.board) update_upwards(leaf, getvalue(board)) return select_best_child(root) 4.1 4.2 4.1 4 1. select_downwards 2. expand_cond expand 3. playout 4. evaluate update_upwards 4.1 select_downwards expand_cond expand playout evaluate update_upwards 10

4.3 UCT [4] UCT UCB1 Upper Confidence Bound [1] Multi-armed bandit problem [12] X j j n j j n c ucb j = X j + c 2 log n n j (4.1) UCB1 4.1 UCB1 UCB1 UCB1 2 1 4.1 select_downwards expand_cond expand playout evaluate update_upwards UCB1 4.3 4.2 11

4.3 playout(board) { loop until board.hint == $n$ { hint <- selecthint(board) board <- addhint(board, hint) if (solve(board ) == noanswer) { board <- deletecand(board, hint) else { board <- board return board 4.2 n 4.3 1 selecthint solve deletecand addhint 1 n solve playout solve 2.2.2 SAT n 1 2.2.1 n n c v v = n 3 c (4.2) 12

4.4 playout(items, knapsack) { loop (infinity) { items <- getitemlist(items) if (items == empty) { break loop item <- selectitem(items ) knapsack_id <- selectknapsack(knapsack) knapsack <- additem(knapsack, knapsack_id, item) return bag.value 4.3 4.4 4.2 UCB1 getitemlist) 1 (selectitem) (selectknapsack) (additem) 13

4.4 14

5 [9] 5.1 1 1 5.1 1 3 1. evaluate 2. generate_next 3. 15

5.1 Genetic_Algorithm(first) { now <- first loop until { evaluate(now) next = generate_next(now) now <- next return max(now) 5.1 generate_next(population) { next <- null loop until next parent1, parent2 <- select_parent(popuration) if( ){ child1, child2 <- crossover(parent1, parent2) if( ){ child1, child2 <- mutation(child1, child2) next.add(child1, child2) else { next.add(parent1, parent2) return next 5.2 5.2 2 select_parent 2 2 crossover 2 2 2 mutation 2 16

5.1 5.1 evaluate 5.2 select_parent crossover mutation 5.1.1 [9] i f i N i p i p i = f i N j=1 f j (5.1) N j=1 f j f i 5.3 5.3 3 17

5.1 5.3 5.4 5.1.2 2 2 [9] m 0 m 1 0 m 1 5.4 5.4 7 3 n n n m 0 m 1 18

5.1 5.5 3 5.6 0 m 1 3 5.5 5.5 7 1, 2, 4 1 5.6 5.6 7 5.1.3 19

5.2 5.1.4 5.2 5.2.1 9 9 9 9 2.1 9 9 81 5.7 81 0 9 9 0 9 10 1 2.1 [000000000 000030804 700009000 000006020 014000300 000000000 200000096 000000070 008140000] 5.2.2 5.1 2.2.1 20

5.2 5.7 5.8 1 5.9 2 5.1.2 2 5.8 5.9 5.8 5.9 17 23, 41 5.10 5.11 5.10 18 5.11 16 n n c 0 n 21

5.3 5.10 1 5.11 2 5.12 3 1 c 2 n c 5.8 5.9 7 5.12 1 1 1 5.3 5.3.1 0-1 1 1 m 0 m 1-1 3 [0 0 1 1 0 1 1 1 1 1] 5.3.2 5.1 22

5.3 0 5.1.2 50% 1 23

6 4 5 OS ubuntu 12.04 CPU Intel Core i5-2500 @ 3.30GHz 8GB Java 1.6.0 26-b03 MCT GA 6.1 9 9 n = 17, 18, 19, 20 C 1.0 30 24

6.1 6.1 n MCT GA 17 554 631 18 550 664 19 599 729 20 675 729 6.1 n = 17 30 100% 30% 1 6.1 n = 17 18 6.1 6.2 25

6.2 6.2 n = 18 n = 19, 20 n = 17 35678 554 n = 18 33411 550 17 18 6.2 n m 3 C C 0 = 4n, C 1 = 5n, C 2 = 6n j j n 10 n n 5 2n 26

6.2 6.2 n MCT GA 50 2635(-14) 2649(0) 2649 100 6316(-17) 6323(-10) 6333 150 12972(-131) 13029(-74) 13103 200 16748(-749) 17328(-169) 17497 250 22853(-905) 23511(-247) 23758 300 28149(-1585) 28575(-1159) 29734 C 1.0 100 30 50% 30% 1 n 50, 100, 150, 200, 250, 300 600 (=10 ) n 6.2 n 200, 250, 300 6.3 6.4 6.5 27

6.2 6.3 n = 200 6.2 6.2 6.4 n = 250 28

6.2 6.5 n = 300 6.3 MCT GA n ( ) MCT GA (MCT ) 50 729 2635 2608(-27) 100 2406 6316 6065(-251) 150 11419 12972 12471(-501) 200 17059 16748 16576(-172) 250 30359 22853 21353(-1500) 300 55757 28149 26325(-1824) 6.2 n 29

6.3 6.3 n 200 250 300 6.2 UCB1 UCB1 30

7 2 UCB1 31

4 2 6 32

[1] P. Auer, N. Cesa-Bianchi, and P. Fischer: Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, Vol. 47, No. 2 3, pp. 235 256, 2002. [2] C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, S. Colton: A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on computional intelligence and AI in games, Vol. 4, No. 1, 2012. [3] H. Kellerer, U. Pferschy and D. Pisinger, Knapsack Problems, Springer, Berlin, 2004. [4] L. Kocsis, C. Szepesvári: Bandit Based Monte-Carlo Planning, 17th European Conference on Machine Learning (ECML 2006), Lecture Notes in Computer Science 4212, pp. 282-293 (2006). [5] I. Lynce, J. Ouaknine: Sudoku as a SAT problem. In Proceedings of 9th International Symposium on Artificial Intelligence and Mathematics, 2006. [6] S. Martello, P. Toth: Knapsack problems : algorithms and computer implementations, Wiley, Chichester, pp. 157 182, 1990. [7] M. P. D. Schadd, M. H. M. Winands, M. J. W. Tak, J. W. H. M. Uiterwijk: Singleplayer Monte-Carlo tree search for SameGame. Journal Knowledge-Based Systems, Vol. 34, pp. 3 11, Elsevier, 2011. [8] F. W. Takes and W. A. Kosters: Solving SameGame and its Chessboard Variant. In Proceedings of the 21st Benelux Conference on Artificial Intelligence (BNAIC), pp. 249 256, 2009. [9] : 1,, 1993. [10] :, pp. 107 111,, Vol.53, No.2, 33

563,, 2012. [11],,, :, pp. 2533 2543,, Vol.53, No.11, 2012. [12], : k-armed., AL,, Vol. 2008, No. 49, pp.65 70, 2008. [13], :., Vol. 2011-MPS-86, No. 31, pp. 1 4, 2011. [14] :,, Vol. 49, No. 6, pp. 686 693, 2008. 34