25 Study on Effectiveness of Monte Calro Tree Search for Single-Player Games

25 Study on Effectiveness of Monte Calro Tree Search for Single-Player Games 1165065 2014 2 28

,,, i

Abstract Study on Effectiveness of Monte Calro Tree Search for Single-Player Games Norimasa NASU Currently, Monte-carlo tree search has been actively studied for two-person games. Monte-carlo tree search took much attention after the first application to the igo player. However, there are few studies on monte-carlo tree search applied to one-person games. For some one-player games we cannot perform complete-search due to their very broad search space. An efficient search algorithm is needed to solve those problems with broach search space. In this study, we apply the monte-carlo tree search to one player games and verified the effectiveness. We use two problems for experiments: the knapsack problem and generation of Sudoku problems. The performance of monte-carlo tree search is compared with that of the genetic algorithm. The experiment results show the following facts. Firstly, the monte-carlo tree search gives a rather good solution immediately after the start. Secondly, the accuracy of the solution is hard to rise. Thirdly, the reason is that many playouts are done on a node that does not give an optimal solution. key words Monte-carlo tree search, Genetic algorithm, Sudoku, Knapsack problem ii

1 1 1.1.................................. 1 1.2................................... 2 1.3................................. 2 2 3 2.1................................. 3 2.2................................. 3 2.2.1..................... 4 2.2.2 SAT....................... 5 2.3............................. 6 3 8 4 9 4.1............................. 9 4.2..................... 10 4.3.......................... 11 4.4.......................... 13 5 15 5.1...................... 15 5.1.1............................ 17 5.1.2.................................. 18 5.1.3................................ 19 iii

5.1.4............................. 20 5.2.......................... 20 5.2.1......................... 20 5.2.2............................... 20 5.3.......................... 22 5.3.1...................... 22 5.3.2............................... 22 6 24 6.1................................ 24 6.2.......................... 26 6.3...................................... 30 7 31 32 33 iv

2.1............................ 4 2.2 2.1............................. 4 2.3 SAT...................... 5 2.4.............................. 6 4.1....................... 10 4.2............. 12 4.3........... 13 5.1.............................. 16 5.2............................... 16 5.3............................. 18 5.4................................. 18 5.5 3.................................. 19 5.6................................. 19 5.7............................ 21 5.8 1.................................. 21 5.9 2.................................. 21 5.10 1.................................. 22 5.11 2.................................. 22 5.12 3.................................. 22 6.1 n = 17................... 25 6.2 n = 18................... 26 6.3 n = 200.................... 28 v

6.4 n = 250.................... 28 6.5 n = 300.................... 29 vi

3.1............................... 8 6.1............................ 25 6.2................................ 27 6.3 MCT GA............. 29 vii

1 1.1 [2] [14] [11] [14] 1

1.2 1.2 [11] ( ) [13] Samegame Schadd Takes [7] [8] 1.3 2 3 4 5 6 4 5 7 2

2 2.1 [10] 2.1 3 3 1 9 1 2 3 1 1 1 1 9 1 1 1 9 1 2.1 2.2 2.2 2.2 2.1 3

2.2 2.1 2.2 2.1 1 SAT 2.2.1 Cell-Unique x x a 1 x a Line-Unique Block-Unique a x 1 x a k-naked k x 1, x 2,..., x k k a 1, a 2,..., a k x 1, x 2,..., x k a 1, a 2,..., a k k-hidden k a 1, a 2,..., a k x 1, x 2,..., x k k x 1, x 2,..., x k a 1, a 2,..., a k 4

2.2 solve(board) { cnf <- CNFOfBaseRule cnf <- append(cnf, tocnf(board)) if (satisfiable(cnf)) { answer = findanswer(cnf) cnf2 <- append(cnf, negate(answer)) if (satisfiable(cnf2)) { return multiple else { return unique else { return noanswer 2.3 SAT 2.2.2 SAT SAT CNF CNF [5] 2.1 CNF SAT CNF CNF CNF [5] extended encoding 1 1 CNF SAT CNF no_answer CNF 1 CNF CNF SAT multiple unique 5

2.3 makeboard() { loop until board $n$ { board <- sethint(board) if(solve(board) ){ board <- board else if(solve(board) ){ return board; 2.4 2.3 1 2 1 1 1 2 1 1 2.4 sethint 1 1 solve 2.2.2 SAT solve 6

2.3 multiple no_answer unique 7

3 [3] 1 0-1 [6] m n i C i j w j v j 1 1 m = 2 C 0 = 10, C 1 = 15 3 k i C i k 0 = {0, 1, 4 k 1 = {2, 3, 5, 6 82 3.1 j 0 1 2 3 4 5 6 7 8 9 w j 5 4 3 4 5 2 3 1 4 3 v j 20 12 8 10 17 7 9 2 9 9 8

4 4.1 [2] 2 1 1 1 1 2 9

4.2 MCT(root) { loop until { leaf <- select_downwards(root) leaf.n <- leaf.n + 1 if (expand_cond(leaf)) { leaf <- expand(leaf).first_child board = playout(leaf.board) update_upwards(leaf, getvalue(board)) return select_best_child(root) 4.1 4.2 4.1 4 1. select_downwards 2. expand_cond expand 3. playout 4. evaluate update_upwards 4.1 select_downwards expand_cond expand playout evaluate update_upwards 10

4.3 UCT [4] UCT UCB1 Upper Confidence Bound [1] Multi-armed bandit problem [12] X j j n j j n c ucb j = X j + c 2 log n n j (4.1) UCB1 4.1 UCB1 UCB1 UCB1 2 1 4.1 select_downwards expand_cond expand playout evaluate update_upwards UCB1 4.3 4.2 11

4.3 playout(board) { loop until board.hint == $n$ { hint <- selecthint(board) board <- addhint(board, hint) if (solve(board ) == noanswer) { board <- deletecand(board, hint) else { board <- board return board 4.2 n 4.3 1 selecthint solve deletecand addhint 1 n solve playout solve 2.2.2 SAT n 1 2.2.1 n n c v v = n 3 c (4.2) 12

4.4 playout(items, knapsack) { loop (infinity) { items <- getitemlist(items) if (items == empty) { break loop item <- selectitem(items ) knapsack_id <- selectknapsack(knapsack) knapsack <- additem(knapsack, knapsack_id, item) return bag.value 4.3 4.4 4.2 UCB1 getitemlist) 1 (selectitem) (selectknapsack) (additem) 13

4.4 14

5 [9] 5.1 1 1 5.1 1 3 1. evaluate 2. generate_next 3. 15

5.1 Genetic_Algorithm(first) { now <- first loop until { evaluate(now) next = generate_next(now) now <- next return max(now) 5.1 generate_next(population) { next <- null loop until next parent1, parent2 <- select_parent(popuration) if( ){ child1, child2 <- crossover(parent1, parent2) if( ){ child1, child2 <- mutation(child1, child2) next.add(child1, child2) else { next.add(parent1, parent2) return next 5.2 5.2 2 select_parent 2 2 crossover 2 2 2 mutation 2 16

5.1 5.1 evaluate 5.2 select_parent crossover mutation 5.1.1 [9] i f i N i p i p i = f i N j=1 f j (5.1) N j=1 f j f i 5.3 5.3 3 17

5.1 5.3 5.4 5.1.2 2 2 [9] m 0 m 1 0 m 1 5.4 5.4 7 3 n n n m 0 m 1 18

5.1 5.5 3 5.6 0 m 1 3 5.5 5.5 7 1, 2, 4 1 5.6 5.6 7 5.1.3 19

5.2 5.1.4 5.2 5.2.1 9 9 9 9 2.1 9 9 81 5.7 81 0 9 9 0 9 10 1 2.1 [000000000 000030804 700009000 000006020 014000300 000000000 200000096 000000070 008140000] 5.2.2 5.1 2.2.1 20

5.2 5.7 5.8 1 5.9 2 5.1.2 2 5.8 5.9 5.8 5.9 17 23, 41 5.10 5.11 5.10 18 5.11 16 n n c 0 n 21

5.3 5.10 1 5.11 2 5.12 3 1 c 2 n c 5.8 5.9 7 5.12 1 1 1 5.3 5.3.1 0-1 1 1 m 0 m 1-1 3 [0 0 1 1 0 1 1 1 1 1] 5.3.2 5.1 22

5.3 0 5.1.2 50% 1 23

6 4 5 OS ubuntu 12.04 CPU Intel Core i5-2500 @ 3.30GHz 8GB Java 1.6.0 26-b03 MCT GA 6.1 9 9 n = 17, 18, 19, 20 C 1.0 30 24

6.1 6.1 n MCT GA 17 554 631 18 550 664 19 599 729 20 675 729 6.1 n = 17 30 100% 30% 1 6.1 n = 17 18 6.1 6.2 25

6.2 6.2 n = 18 n = 19, 20 n = 17 35678 554 n = 18 33411 550 17 18 6.2 n m 3 C C 0 = 4n, C 1 = 5n, C 2 = 6n j j n 10 n n 5 2n 26

6.2 6.2 n MCT GA 50 2635(-14) 2649(0) 2649 100 6316(-17) 6323(-10) 6333 150 12972(-131) 13029(-74) 13103 200 16748(-749) 17328(-169) 17497 250 22853(-905) 23511(-247) 23758 300 28149(-1585) 28575(-1159) 29734 C 1.0 100 30 50% 30% 1 n 50, 100, 150, 200, 250, 300 600 (=10 ) n 6.2 n 200, 250, 300 6.3 6.4 6.5 27

6.2 6.3 n = 200 6.2 6.2 6.4 n = 250 28

6.2 6.5 n = 300 6.3 MCT GA n ( ) MCT GA (MCT ) 50 729 2635 2608(-27) 100 2406 6316 6065(-251) 150 11419 12972 12471(-501) 200 17059 16748 16576(-172) 250 30359 22853 21353(-1500) 300 55757 28149 26325(-1824) 6.2 n 29

6.3 6.3 n 200 250 300 6.2 UCB1 UCB1 30

7 2 UCB1 31

4 2 6 32

[1] P. Auer, N. Cesa-Bianchi, and P. Fischer: Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, Vol. 47, No. 2 3, pp. 235 256, 2002. [2] C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, S. Colton: A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on computional intelligence and AI in games, Vol. 4, No. 1, 2012. [3] H. Kellerer, U. Pferschy and D. Pisinger, Knapsack Problems, Springer, Berlin, 2004. [4] L. Kocsis, C. Szepesvári: Bandit Based Monte-Carlo Planning, 17th European Conference on Machine Learning (ECML 2006), Lecture Notes in Computer Science 4212, pp. 282-293 (2006). [5] I. Lynce, J. Ouaknine: Sudoku as a SAT problem. In Proceedings of 9th International Symposium on Artificial Intelligence and Mathematics, 2006. [6] S. Martello, P. Toth: Knapsack problems : algorithms and computer implementations, Wiley, Chichester, pp. 157 182, 1990. [7] M. P. D. Schadd, M. H. M. Winands, M. J. W. Tak, J. W. H. M. Uiterwijk: Singleplayer Monte-Carlo tree search for SameGame. Journal Knowledge-Based Systems, Vol. 34, pp. 3 11, Elsevier, 2011. [8] F. W. Takes and W. A. Kosters: Solving SameGame and its Chessboard Variant. In Proceedings of the 21st Benelux Conference on Artificial Intelligence (BNAIC), pp. 249 256, 2009. [9] : 1,, 1993. [10] :, pp. 107 111,, Vol.53, No.2, 33

563,, 2012. [11],,, :, pp. 2533 2543,, Vol.53, No.11, 2012. [12], : k-armed., AL,, Vol. 2008, No. 49, pp.65 70, 2008. [13], :., Vol. 2011-MPS-86, No. 31, pp. 1 4, 2011. [14] :,, Vol. 49, No. 6, pp. 686 693, 2008. 34