TD 2048 TD 1 N N 2048 N TD N N N N N N 2048 N 2048 TD 2048 TD TD TD 2048 TD 2048 minimax 2048, 2048, TD, N i

Similar documents
28 Horizontal angle correction using straight line detection in an equirectangular image

2 ( ) i

soturon.dvi

28 TCG SURF Card recognition using SURF in TCG play video

20 Method for Recognizing Expression Considering Fuzzy Based on Optical Flow

..,,,, , ( ) 3.,., 3.,., 500, 233.,, 3,,.,, i

,,.,.,,.,.,.,.,,.,..,,,, i

Virtual Window System Virtual Window System Virtual Window System Virtual Window System Virtual Window System Virtual Window System Social Networking

28 Docker Design and Implementation of Program Evaluation System Using Docker Virtualized Environment

25 Removal of the fricative sounds that occur in the electronic stethoscope

7,, i

WebRTC P2P Web Proxy P2P Web Proxy WebRTC WebRTC Web, HTTP, WebRTC, P2P i

,,,,., C Java,,.,,.,., ,,.,, i

2 1 ( ) 2 ( ) i

2017 (413812)

untitled

i

2

[1] AI [2] Pac-Man Ms. Pac-Man Ms. Pac-Man Pac-Man Ms. Pac-Man IEEE AI Ms. Pac-Man AI [3] AI 2011 UCT[4] [5] 58,990 Ms. Pac-Man AI Ms. Pac-Man 921,360

58 10

, IT.,.,..,.. i

1 Web [2] Web [3] [4] [5], [6] [7] [8] S.W. [9] 3. MeetingShelf Web MeetingShelf MeetingShelf (1) (2) (3) (4) (5) Web MeetingShelf

202

kut-paper-template.dvi

Web Web Web Web Web, i

1 [1, 2, 3, 4, 5, 8, 9, 10, 12, 15] The Boston Public Schools system, BPS (Deferred Acceptance system, DA) (Top Trading Cycles system, TTC) cf. [13] [

29 jjencode JavaScript

i

21 Key Exchange method for portable terminal with direct input by user

17 Proposal of an Algorithm of Image Extraction and Research on Improvement of a Man-machine Interface of Food Intake Measuring System

ron.dvi

24 Depth scaling of binocular stereopsis by observer s own movements

SURF,,., 55%,.,., SURF(Speeded Up Robust Features), 4 (,,, ), SURF.,, 84%, 96%, 28%, 32%.,,,. SURF, i

A Study on Throw Simulation for Baseball Pitching Machine with Rollers and Its Optimization Shinobu SAKAI*5, Yuichiro KITAGAWA, Ryo KANAI and Juhachi

Vol. 48 No. 4 Apr LAN TCP/IP LAN TCP/IP 1 PC TCP/IP 1 PC User-mode Linux 12 Development of a System to Visualize Computer Network Behavior for L

<4D F736F F F696E74202D C835B B E B8CDD8AB B83685D>

SC-85X2取説


Print

Input image Initialize variables Loop for period of oscillation Update height map Make shade image Change property of image Output image Change time L

1 Fig. 1 Extraction of motion,.,,, 4,,, 3., 1, 2. 2.,. CHLAC,. 2.1,. (256 ).,., CHLAC. CHLAC, HLAC. 2.3 (HLAC ) r,.,. HLAC. N. 2 HLAC Fig. 2


21 Effects of background stimuli by changing speed color matching color stimulus

kut-paper-template.dvi

1 1 tf-idf tf-idf i

(MIRU2008) HOG Histograms of Oriented Gradients (HOG)


〈論文〉興行データベースから「古典芸能」の定義を考える


Web Stamps 96 KJ Stamps Web Vol 8, No 1, 2004

258 5) GPS 1 GPS 6) GPS DP 7) 8) 10) GPS GPS ) GPS Global Positioning System


GPGPU

16_.....E...._.I.v2006

1: A/B/C/D Fig. 1 Modeling Based on Difference in Agitation Method artisoc[7] A D 2017 Information Processing

<95DB8C9288E397C389C88A E696E6462>

25 D Effects of viewpoints of head mounted wearable 3D display on human task performance

23 Fig. 2: hwmodulev2 3. Reconfigurable HPC 3.1 hw/sw hw/sw hw/sw FPGA PC FPGA PC FPGA HPC FPGA FPGA hw/sw hw/sw hw- Module FPGA hwmodule hw/sw FPGA h

Table 1. Reluctance equalization design. Fig. 2. Voltage vector of LSynRM. Fig. 4. Analytical model. Table 2. Specifications of analytical models. Fig

IPSJ SIG Technical Report Vol.2014-EIP-63 No /2/21 1,a) Wi-Fi Probe Request MAC MAC Probe Request MAC A dynamic ads control based on tra

Q-Learning Support-Vector-Machine NIKKEI NET Infoseek MSN i

(1) i NGO ii (2) 112

, (GPS: Global Positioning Systemg),.,, (LBS: Local Based Services).. GPS,.,. RFID LAN,.,.,.,,,.,..,.,.,,, i

2 10 The Bulletin of Meiji University of Integrative Medicine 1,2 II 1 Web PubMed elbow pain baseball elbow little leaguer s elbow acupun


II

これわかWord2010_第1部_ indd

パワポカバー入稿用.indd

これでわかるAccess2010

Transcription:

28 2048 2048 TD Computer Players Based on TD Learning for Game 2048 and Its Two-player Variant

2048 2048 TD 2048 TD 1 N N 2048 N TD N N N N N N 2048 N 2048 TD 2048 TD TD TD 2048 TD 2048 minimax 2048, 2048, TD, N i

Abstract Computer Players Based on TD Learning for Game 2048 and Its Two-player Variant Kazuto Oka One of the factors that affects the strength of a computer player based on TD learning in game 2048 is N-tuple networks used for the player. But it is unclear whether the N-tuple networks used so far are adequate. In 2048, if good N-tuple networks can be used, the strength of the computer player based on TD learning using this can be improved. However, it is difficult to investigate all the performance of N- tuple networks which can be enormous. Therefore, I propose systematic selection of N- tuple networks from ordering of N-tuples using own usefulness calculated by exhaustive analysis. In addition, I consider the ordering of the obtained N-tuple and discuss the characteristics of N-tuple with high usefulness and the property of 2048. I investigate the performance of the player using N-tuple networks selected by the proposed method. Also, by applying the player based on TD learning at 2048, I make a computer player based on TD learning in two-player 2048. I improve the player s strength through TD learning in self-game-play. As the baseline for TD learning, I use a player based on TD learning at 2048. I make a stronger player by combining the player of Battle type 2048 with the minimax method and compare the strength by playing against the player using the existing method. I combine the two-player 2048 player with the minimax method. I compare the strength of the two-player 2048 player with the strength of the player using the existing technique by buttle. ii

key words 2048, two-player 2048, TD learning, N-tuple networks iii

1 1 1.1.................................. 1 1.2.................................. 3 1.3.................................. 3 1.4................................. 4 2 6 2.1.................................... 6 2.1.1 2048........................... 6 2.1.2 2048....................... 7 2.2 TD.................... 7 2.2.1................... 7 2.2.2 2048 N.......... 7 2.2.3 2048 TD. 9 2.2.4 2048 TD................................. 9 2.2.5 TD................... 10 3 N 12 3.1 N............................. 12 3.2 N........................ 13 4 N 16 4.1 N....................... 16 4.2 N........................ 20 iv

5 2048 26 5.1............................ 26 5.1.1............................ 26 5.1.2......................... 27 5.2...................... 29 5.3................................... 31 5.3.1......................... 32 5.3.2......................... 34 6 36 7 37 7.1 2048........................ 37 7.2 2048.................... 38 39 40 A N 42 A.1 16 6........... 43 A.2 16 7........... 44 v

2.1.................. 8 2.2 3 2 k 854.0............ 9 2.3 Szubert Jaśkowski 2 4 2 6 [12]. 10 2.4 Wu 4 6 [4]..................... 10 3.1 2 N....................... 14 4.1 6.............................. 17 4.2 7.............................. 17 4.3 6....................... 20 4.4 7....................... 20 4.5 6....................... 21 4.6 7....................... 21 4.7 6 7........................ 22 4.8 6 7........................ 22 4.9............................. 23 5.1 [9]6 (1) N............................... 27 5.2 7 (3 ) minimax 1 minimax/expectimax.................................... 32 vi

5.3 N ( m = 8) 5 (2 ) minimax 1 minimax/expectimax.................................... 32 5.4 7 (3 ) minimax 1 minimax/expectimax...................................... 35 5.5 N m = 4 5 (2 ) minimax 1 minimax/expectimax...................................... 35 vii

3.1 N................................ 13 3.2 N......................... 14 3.3 1 32bit N 1.................................... 15 4.1 4 6 7...... 18 4.2.............................. 19 4.3........................ 25 5.1 1 2048 TD............ 27 5.2 (1): 6.... 29 5.3 (2):... 30 5.4 (3):.................................... 31 5.5 : 5 (2 ) minimax 1............ 31 viii

1 1.1 αβ A [11] 2048 2014 G. Cirulli 1 [2] 4 4 2048 2048 2 2048 2048 2015 GPCC 1 [14] 1 2016 2048 2048 2048 1 GPCC Games and Puzzles Competitions on Computers URL=http://hp.vector.co.jp/authors/VA003988/gpcc/gpcc.htm 1

1.1 1 2 3 0 2048 1 2048 2 2048 [1] TD [12] TD 2048 TD 1 2048 TD 2

1.2 1.2 2048 Szubert TD [12] N N N Wu TD [4] N Szubert N N 1.1 2048 TD TD 1.3 1.2 N N TD N N 3

1.4 2048 TD 2048 TD 2048 TD 2048 TD 5 N N N 7 10 234,136 504,660 expectimax 12 88,000 / N TD 2048 TD TD 1.4 2 2048 2048 TD TD 3 N N 4 4

1.4 N N N 5 2048 TD 6 7 5

2 2.1 2.1.1 2048 2048 4 4 16 2 4 2 2 1. 2. 2 4 1 90% 2 10% 4 2 2 1 3 2 4 2 6

2.2 TD 2.1.2 2048 2048 2048 2 2 4 1 1 2048 2048 2.2 TD Szubert 2048 TD TD 2.2.1 90 180 270 2.1 8 s 8 f i (s) 1 i 8 2.2.2 2048 N Szubert 2048 1 GPCC 2048 2016 7

2.2 TD 2 4 8 64 2 4 2 2 64 8 2 4 2 4 2 2 2 2 4 2 4 2 8 64 2 2 4 2 64 8 4 2 64 2 8 2 4 2 4 2 2 4 2 2 2 4 8 64 2.1 2 4 2 4 2 8 2 64 64 8 4 2 2 2 4 2 n 2.2.1 2048 7 8 1 N 8 [12] m N s N W j (1 j m) V m 8 V (s) = W j (f i (s)) j=1 i=1 2.2 3 N 8

2.2 TD (1) (2) (3) 2 4 8 64 2 4 2 2 (1) (2) (3) 0 0 0 419.7 0 0 1 477.3.... 1 2 0 854.0.... 2.2 3 2 k 854.0 2.2.3 2048 TD TD 2.2.5 s A(s) {N, E, S, W} a R(s, a) N(s, a) arg max (R(s, a) + V (N(s, a))) (2.1) a A(s) 2.2.4 2048 TD 2048 TD 2048 TD 2048 2048 2048 9

2.2 TD 2.3 Szubert Jaśkowski 2 4 2 6 [12] 2.4 Wu 4 6 [4] 2 2048 TD 2.1 2048 2 4 A(s) arg min (V (N(s, a))) (2.2) a A(s) TD N 2.3, 2.4 Szubert Jaśkowski 2.3 N Wu 2.4 N [4] 2.3, 2.4 N 2.2.5 TD 1 TD TD TD 0 2 2048 2 2 1 10

2.2 TD TD [3] [13] N TD TD t s t 2.1 a r = R(s t, a) N(s t, a) 3 s t 1 TD d d = V (s t) + r V (s t 1) (2.3) W j (s t 1) W j(s t 1 ) = W j(s t 1 ) + α d 8m (2.4) α 0.08 8 m N TD 8m t s t N(s t, a) s t t 1 t r 2.3, 2.4 3 2 4 11

3 N 3.1 N Szubert 2.3 Wu 2.4 N N N N N 3.1 N 1 3.1 N 8547 2048 N N 1 N N N 1 N N 9 N Szubert Wu N 4 [12, 4] N 1 3.1 N N 3.1 3 10 1 N 12

3.2 N 3.1 N N 1 2 3 4 5 6 7 8 8547 1524 2744 2606 1254 351 61 6 1 1 3 3 2 21 4 17 3 77 8 29 40 4 252 17 74 99 62 5 567 33 136 220 133 45 6 1051 68 254 372 257 80 20 7 1465 119 383 524 309 107 19 4 8 1674 195 522 573 290 74 18 1 1 9 1465 261 544 469 149 38 3 1 10 1051 300 456 238 49 7 1 11 567 257 241 64 5 12 252 169 76 7 13 77 66 11 14 21 20 1 15 3 3 16 1 1 3.2 N N TD N 3.2 N 13

3.2 N 3.1 3.2 2 N N N F = 18 F = 16 1 18 1 16 1 2 18 2 16 2... i 18 i 16 i... 16 18 16 16 16 1 1 F N 3.2 2048 F = 18 Wu F = 16 2 F = 16 N N N F = 18 N = 16 N 1 1 2 F = 18 2 17 = 131, 072 F = 16 2 15 = 32, 768 14

3.2 N N 5 6 7 8 4.1 MB 67 MB 1.1 GB 17 GB 3.3 1 32bit N 1 N N N N 1 32bit N 1 3.3 N = 6, 7 15

4 N 4.1 N N N TD N N 6 1 C 6 = 68 7 1 C 7 = 119 N p i i i s i = 1, s i = 0 P P = C i=1 s ip i. E = j (P j c i=1 s jip i ) 2 10 TD 1,000,000 10,000 100 6 680 7 1190 4.1, 4.2 6 13,066 7 12566 238%, 261% 69%, 61% 1,000,000 16

4.1 N 35000 30000 25000 partial score 20000 15000 10000 5000 0 0 10 20 30 40 50 60 70 rank of 6-tuples 4.1 6 35000 30000 25000 partial score 20000 15000 10000 5000 0 0 20 40 60 80 100 120 rank of 7-tuples 4.2 7 4.1 N = 6, 7 4 4 N 6-01 7-01 2 2048 8 7 17

4.1 N 4.1 4 6 7 N 4 4 6-01 6-02 6-65 6-66 6 31,161 24,530 9,644 9,642 6-03 6-04 6-67 6-68 22,207 20,576 9,563 9,052 7-001 7-002 7-116 7-117 7 32,900 23,504 8,543 8,204 7-003 7-004 7-118 7-119 23,483 23,338 7,918 7,683 7 7-119 2 2 4.2 Wu 4.1 4 6 6-01 6-04 4.2 1 6-40 6-48 N N N 18

4.1 N 4.2 6-40 6-48 1 12,190 11,330 6-24 7-096 7-087 7-079 2 14,320 10,573 10,988 11,521 6-13 7-005 7-086 3 15,930 23,304 10,995 4.2 2 N 6 6-24 68 6 24 7 3 7 7-96 7 2 7 4.1 7-119, 4.2 2 7-96, 7-87, 7-79 4.2 3 N 1 N + 1 N + 1 6 6-13 1 2 7 7-005 7-086 19

4.2 N average score (x1000) 250 200 150 100 50 m=40 m=20 m=16 m=10 m= 8 m= 4 m= 2 m= 1 0 0 1 2 3 4 5 6 number of game learned (x1,000,000) average score (x1000) 250 200 150 100 50 4.3 6 m=10 m= 8 m= 6 m= 4 m= 2 m= 1 0 0 1 2 3 4 5 6 number of game learned (x1,000,000) 4.4 7 4.2 N N m TD 2 6 45 7 10 3?? 3.3 6 64 MB 7 20

4.2 N max score (x1000) 600 500 400 300 200 100 m=40 m=20 m=16 m=10 m= 8 m= 4 m= 2 m= 1 0 0 1 2 3 4 5 6 number of game learned (x1,000,000) max score (x1000) 600 500 400 300 200 100 4.5 6 m=10 m= 8 m= 6 m= 4 m= 2 m= 1 0 0 1 2 3 4 5 6 number of game learned (x1,000,000) 4.6 7 1.1 GB N = 6 m = 45 3.0 GB N = 7 m = 10 11 GB 2 Intel Xeon E5645 CPU 6 2.4 GHz 12 GB PC CentOS 5.5 2.6.18-194.e15 g++ 4.6.3 21

4.2 N 250 average score (x1000) 200 150 100 50 0 N=7, m=10 N=6, m=40 N=6, m=10 N=7, m= 4 N=6, m= 4 0 1 2 3 4 5 6 number of game learned (x1,000,000) 600 4.7 6 7 500 max score (x1000) 400 300 200 100 0 N=7, m=10 N=6, m=40 N=6, m=10 N=7, m= 4 N=6, m= 4 0 1 2 3 4 5 6 number of game learned (x1,000,000) 4.8 6 7 N 6,000,000 TD 10,000 10,000 10,000 10,000 1 N 22

4.2 N average score (x1000) 250 200 150 100 7-04 6-06 6-08 6-04 7-02 50 7-01 6-02 6-01 7-10 7-08 6-20 7-06 6-12 6-16 6-10 6-25 6-30 6-35 6-40 6-45 0 10 20 30 40 50 time for a move (µs) 4.9 5 5 4.3, 4.4, 4.7 4.5, 4.6, and 4.8 N N = 6 m = 20 m = 10 N = 6 m = 40 1,000,000 N = 6 N = 7 N = 6 N N = 7 6,000,0000 N = 7 m = 10 N = 6 4.9 1 N m m 7 6 1.5 N = 7 m = 10 N = 6 m = 40 23

4.2 N 4.9 N = 6 N = 7 m = 30 249,625 1 N 4.3 N = 6 m = 40 N = 7 m = 10 2048 1.5% N = 6 m = 40 N = 7 m = 10 2 1 16,384 N = 7 m = 10 32,768 1 30 GB 32 GB PC 24

4.2 N 4.3 N = 6 m = 1 m = 2 m = 4 m = 6 m = 8 m = 10 m = 12 2048 78.41 88.36 95.66 96.00 97.45 97.16 97.78 4096 36.05 62.77 90.32 91.36 93.87 94.05 95.17 8192 0.07 0.33 56.38 70.28 75.67 79.29 84.08 16384 0.00 0.00 0.04 0.07 0.05 29.64 39.02 32768 0.00 0.00 0.00 0.00 0.00 0.00 0.00 N = 6 m = 16 m = 20 m = 25 m = 30 m = 35 m = 40 m = 45 2048 97.87 98.38 98.45 98.42 98.46 98.41 98.30 4096 95.42 96.11 96.25 95.97 95.91 95.89 95.64 8192 85.01 86.51 87.61 86.41 86.14 85.75 86.45 16384 38.96 49.75 56.00 52.61 52.76 50.09 51.42 32768 0.00 0.00 0.00 0.00 0.00 0.00 0.00 N = 7 m = 1 m = 2 m = 4 m = 6 m = 8 m = 10 2048 82.54 90.74 96.43 97.16 98.27 98.50 4096 40.39 71.75 86.97 89.18 93.59 96.87 8192 0.31 26.08 63.22 73.28 84.87 83.63 16384 0.00 0.00 6.71 27.91 47.17 56.57 32768 0.00 0.00 0.00 0.00 0.00 0.03 25

5 2048 2048 TD 2048 TD N [9] 5.1 5.1 5.1.1 1 2048 [12, 4] TD N TD N [9] 6 ( 5.1) m = 1 8 6 Wu 4 6 2.4 N N 5,000,000 5.1 TD 2048 m = 5 26

5.1 (1) (2) (3) (4) (5) (6) (7) (8) 5.1 [9]6 (1) N 5.1 1 2048 TD m = 1 51,506 m = 2 113,313 m = 3 138,365 m = 4 169,649 m = 5 214,526 m = 6 212,384 m = 7 222,639 m = 8 230,513 m = 4 (Wu ) 178,386 5.1.2 5.1.1 N 1 2048 2048 N 27

5.1 TD 5.1 6 m = 1 8 2.4 4 6 (9 ) 5.1.1 9 m = 1 8 (11 ) 0% 25% 50% 75% (4 ) N 0 500 5.1.1 3 5.2, 5.3, 5.4 50 1000 5.2 m = 1 5.3 28

5.2 5.2 (1): 6 m = 1 3154 12793 m = 2 2797 11362 m = 3 2825 10697 m = 4 2869 11035 m = 5 2783 10255 m = 6 2883 10292 m = 7 2870 10351 m = 8 2806 9903 m = 4 (Wu ) 2857 10832 5.3 50% N 50% 9 5.2 (2 90% 4 10% ) 29

5.2 5.3 (2): m = 4 (Wu ) 2841 10816 m = 1 3231 12489 m = 2 3253 12450 m = 3 2801 10493 m = 4 2853 10925 m = 5 3037 11721 m = 6 2813 10653 m = 7 2760 10611 m = 8 2710 10110 m = 1,..., 8 2416 8087 11921 50127 m = 8 50% (1 ) 5.1 6 m = 1 8 2.4 4 6 (9 ) 0%, 25%, 50%, 75% (4 ) N 0 5,000,000 5.5 50% N 50% 30

5.3 5.4 (3): 00% 4575 20572 25% 2304 7100 50% 2156 6954 75% 2450 8716 5.5 : 5 (2 ) minimax 1 m = 8 m = 4 00% 813 2764 854 2436 25% 3018 15124 2652 13240 50% 5945 33752 6187 30488 75% 4499 26280 4486 28368 1 2765 16788 2902 12636 5.3 minimax expectimax 31

5.3 100000 ¹ l q± l 20000 10000 1000 w 1 3 5 7 1 3 5 7 1 3 5 1 3 5 1 3 5 N n m=2 m=4 m=8 minimax expectimax 7ÁÝûÛfçï NMey[QV}Š= 5.2 7 (3 ) minimax 1 minimax/expectimax 500000 ¹ l q± l 50000 20000 5000 w 1 3 5 7 1 3 5 7 1 3 5 1 3 5 1 3 5 N n m=2 m=4 m=8 minimax expectimax 7ÁÝûÛfçï NMey[QV}Š= 5.3 N ( m = 8) 5 (2 ) minimax 1 minimax/expectimax 5.3.1 50 1,000 32

5.3 2 4 50% minimax expectimax d 1, 3, 5, 7 0 1, 2, 3 expectimax expectimax d 1, 3, 5, 7 0 1, 2, 3 N 5.1.2 N d = 1, 3, 5 (0, 1, 2 ) minimax m 2, 4, 8 3 5.2 7 (3 ) minimax m = 8 N 5 (2 ) minimax 5.3 N minimax expectimax minimax N m minimax minimax expectimax minimax ( ) 33

5.3 5.3.2 50 1,000 minimax minimax d 1, 3, 5, 7 0 1, 2, 3 expectimax expectimax d 1, 3, 5, 7 0 1, 2, 3 1 N 5.1.1 N d = 1, 3, 5 (0, 1, 2 ) minimax m 2, 4, 8 3 N 5.2 N d = 1, 3, 5 (0, 1, 2 ) minimax m 2, 4, 8 3 5.4 7 (3 ) minimax m = 4 N 5 (2 ) minimax 5.3 N minimax 1 (m = 8) 34

5.3 100000 ¹ l q± l 20000 10000 1000 100 w 1 3 5 7 1 3 5 7 1 3 5 1 3 5 1 3 5 1 3 5 1 3 5 1 3 5 m=2 m=4 m=8 m=2 m=4 m=8 N n minimax expectimax 1Ç@ŠnÝûÛfçï NMey[QV}Š= PSÝûÌÛfçï NMey[QV}Š= 5.4 7 (3 ) minimax 1 minimax/expectimax 50000 20000 ¹ l q± l 5000 500 w 1 3 5 7 1 3 5 7 1 3 5 1 3 5 1 3 5 1 3 5 1 3 5 1 3 5 m=2 m=4 m=8 m=2 m=4 m=8 N n minimax expectimax 1Ç@ŠnÝûÛfçï NMey[QV}Š= PSÝûÌÛfçï NMey[QV}Š= 5.5 N m = 4 5 (2 ) minimax 1 minimax/expectimax 35

6 2048 [1, 15, 4, 12, 8] expectimax TD 2048 TD Szubert Jaśkowski [12] 2 4 2 6 1,000,000 TD 100,178 2 4 Wu 2 6 142,727 Wu expectimax 328,946 TD expectimax = 5 expectimax 2048 1 10 [10, 16] 1 N = 7 m = 10 10 GB 1 12 36

7 [8, 6, 5, 7] 7.1 2048 2048 N TD N N N N N N N N N N N N 10 7 234,136 504,660 N N 37

7.2 2048 7.2 2048 N TD 2048 N 2048 N [9] N 1 3 N TD N minimax minimax N minimax N minimax 1 5.5 2048 2048 [8] 38

2 IACP 39

[1] Ahmad Zaky, Minimax and Expectimax Algorithm to Solve 2048, http: //informatika.stei.itb.ac.id/~rinaldi.munir/stmik/2013-2014-genap/ Makalah2014/MakalahIF2211-2014-037.pdf, 2014. [2] Gabriele Cirulli, 2048, http://gabrielecirulli.github.io/2048/, 2014. [3] Gerald Tesauro, TD-Gammon, a self-teaching backgammon program, achieves master-level play, Neural computation, Vol. 6, No. 2, pp. 215 219, 1994. [4] I-Chen Wu, Kun-Hao Yeh, Chao-Chin Liang, Chia-Chuan Chang and Han Chiang, Multi-Stage Temporal Difference Learning for 2048, Technologies and Applications of Artificial Intelligence, Lecture Notes in Computer Science, Vol. 8916, pp. 366 378, 2014. [5] Kazuto Oka, Kiminori Matsuzaki, Systematic Selection of N-Tuple Networks for 2048, CG 2016, pp. 81 92, 2016. [6],, 2048 1 57, pp. 9 18, 2016. [7],, N-tuple networks 2048, 58, 2017. [8], 2048 Vol. 12, No. 1, pp. 123 130, 2015. [9] Kiminori Matsuzaki, Systematic Selection of N-tuple Networks with Consideration of Interinfluence for Game 2048, Proceedings of the 2016 Conference on Technologies and Applications of Artificial Intelligence (TAAI 2016), 2016. [10] Kun-Hao Yeh, Chao-Chin Liang, Kun-Hao Yeh and I-Chen Wu, 2048-bot tournament in Taiwan, https://icga.leidenuniv.nl/wp-content/uploads/2015/ 40

04/2048-bot-tournament-report-1104.pdf, 2014. [11],, bit,, 1997. [12] Marcin Szubert and Wojciech Jaśkowski, Temporal Difference Learning of N-Tuple Networks for the Game 2048, 2014 IEEE Conference on Computational Intelligence and Games, pp. 1 8, 2014. [13] Michiel van der Ree and Marco Wiering, Reinforcement learning in the game of Othello: Learning against a fixed opponent and learning from self-play, IEEE Symposium on Adaptive Dynamic Programming And Reinforcement Learning (AD- PRL), pp. 108 115, 2013. [14], 2048, [2015], pp. 19 22, 2015. [15] Philip Rodgers and John Levine, An Investigation into 2048 AI Strategies, 2014 IEEE Conference on Computational Intelligence and Games, pp. 1 2, 2014. [16] Wojciech Jaśkowski and Marcin Szubert, Game 2048 AI controller competition GECCO 2015, http://www.cs.put.poznan.pl/wjaskowski/pub/ 2015-GECCO-2048-Competition/GECCO-2015-2048-Competition-Results.pdf, 2015. 41

A N 0 15 42

A.1 16 6 A.1 16 6 No. Tuple Score 1 [ 0, 1, 2, 3, 4, 5] 31,161 2 [ 0, 1, 2, 3, 4, 6] 24,530 3 [ 0, 1, 2, 4, 6, 7] 22,207 4 [ 0, 1, 2, 4, 5, 6] 20,576 5 [ 0, 1, 4, 5, 6, 7] 20,512 6 [ 0, 1, 5, 8, 9, 13] 20,283 7 [ 0, 1, 2, 5, 6, 7] 19,970 8 [ 0, 1, 2, 3, 4, 8] 18,990 9 [ 0, 1, 3, 5, 6, 7] 18,875 10 [ 0, 1, 2, 6, 7, 10] 16,587 11 [ 0, 1, 2, 3, 5, 6] 16,113 12 [ 1, 2, 4, 5, 6, 7] 16,086 13 [ 0, 1, 2, 3, 4, 7] 15,930 14 [ 0, 1, 2, 6, 7, 11] 15,752 15 [ 0, 1, 2, 6, 10, 14] 15,722 16 [ 0, 1, 5, 9, 13, 14] 15,718 No. Tuple Score 68 [ 1, 2, 5, 6, 8, 9] 9,052 67 [ 1, 2, 5, 9, 13, 14] 9,563 66 [ 1, 4, 5, 6, 7, 10] 9,642 65 [ 1, 5, 6, 7, 9, 10] 9,644 64 [ 1, 4, 5, 6, 10, 11] 9,661 63 [ 1, 5, 6, 9, 10, 14] 9,864 62 [ 1, 5, 6, 9, 10, 11] 10,069 61 [ 1, 5, 6, 7, 8, 9] 10,076 60 [ 1, 2, 5, 9, 10, 13] 10,129 59 [ 1, 2, 5, 9, 10, 11] 10,310 58 [ 1, 5, 6, 9, 10, 13] 10,379 57 [ 1, 4, 5, 6, 9, 10] 10,749 56 [ 0, 1, 5, 6, 9, 10] 10,766 55 [ 0, 1, 2, 5, 6, 9] 10,778 54 [ 0, 1, 2, 5, 9, 10] 10,875 53 [ 0, 1, 5, 6, 8, 9] 10,936 43

A.2 16 7 A.2 16 7 No. Tuple Score 1 [ 0, 1, 2, 3, 4, 5, 6] 32,900 2 [ 0, 1, 2, 3, 4, 5, 9] 23,504 3 [ 0, 1, 2, 4, 5, 6, 7] 23,483 4 [ 0, 1, 3, 4, 5, 6, 7] 23,338 5 [ 0, 1, 2, 3, 4, 5, 7] 23,304 6 [ 0, 1, 2, 3, 4, 6, 10] 21,954 7 [ 0, 1, 2, 3, 4, 8, 9] 21,766 8 [ 0, 1, 2, 4, 5, 6, 9] 20,591 9 [ 0, 1, 2, 4, 5, 6, 10] 20,406 10 [ 0, 1, 3, 5, 6, 7, 9] 17,577 11 [ 0, 1, 5, 6, 8, 9, 13] 17,423 12 [ 0, 1, 2, 3, 4, 5, 8] 17,394 13 [ 0, 1, 2, 5, 6, 7, 9] 17,330 14 [ 0, 1, 2, 4, 6, 7, 10] 17,300 15 [ 0, 1, 5, 8, 9, 13, 14] 16,818 16 [ 0, 1, 4, 5, 6, 9, 10] 16,590 No. Tuple Score 119 [ 0, 1, 2, 6, 10, 11, 15] 7,683 118 [ 0, 1, 2, 5, 9, 13, 14] 7,918 117 [ 1, 2, 4, 5, 9, 13, 14] 8,204 116 [ 1, 4, 5, 6, 9, 10, 11] 8,543 115 [ 1, 4, 5, 6, 10, 11, 14] 8,830 114 [ 0, 1, 5, 7, 9, 10, 11] 9,183 113 [ 1, 2, 5, 9, 10, 11, 14] 9,372 112 [ 1, 2, 5, 6, 8, 9, 13] 9,514 111 [ 1, 4, 5, 6, 7, 9, 13] 9,562 110 [ 0, 1, 5, 6, 10, 11, 14] 9,587 109 [ 1, 5, 6, 7, 9, 10, 13] 9,783 108 [ 1, 4, 5, 6, 7, 10, 14] 9,792 107 [ 0, 1, 5, 6, 7, 9, 11] 9,822 106 [ 0, 1, 5, 9, 10, 11, 14] 9,962 105 [ 1, 2, 5, 8, 9, 10, 11] 9,985 104 [ 0, 1, 2, 6, 10, 11, 14] 9,986 44