1
1.1
4 TD TD
1.2
TD
1.3
2 TD 3 4
2 TD
2.1
2.1.1
2.1.2
0 (1) 100 8 6 2 200
2.1.3 Minimax
Minimax PV (Principal Variation)
[Figure 1: Minimax (game tree, nodes A to X)]
1 K E F A, B, E, N PV (Negamax Form) 2

    int Minimax(node_t n, int d){
        int i, g, score = -INF;                  /* INF: larger than any evaluation value */
        if(d == 0 || n == terminal)              /* depth limit reached or game over */
            return Evaluate(n);                  /* score from the side to move */
        for(i = 0; i < n.num_of_children; i++){
            g = -Minimax(n.child_node[i], d-1);  /* negate: the child is scored for the opponent */
            score = max(score, g);
        }
        return score;
    }

Figure 2: Minimax (Negamax Form)
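The Figure 2 pseudocode can be made concrete as the C sketch below. It is a minimal illustration under our own assumptions. The node type (a hypothetical `node_t` with a `value` field and pointer children), the constant `INF`, the leaf counter `evaluations`, and the small two-ply tree are all invented for the example, and the tree is not the one in Figure 1. The sketch also records the principal variation in an extra parameter `pv`, which Figure 2 does not have.

```c
#include <stddef.h>

#define INF 1000000      /* assumption: larger than any evaluation value */
#define MAX_CHILDREN 4
#define MAX_DEPTH 8

/* Hypothetical node type carrying the fields used in Figure 2. */
typedef struct node {
    int value;                                   /* leaf score from the side to move */
    int num_of_children;                         /* 0 marks a terminal node */
    const struct node *child_node[MAX_CHILDREN];
} node_t;

int evaluations = 0;                             /* number of Evaluate calls */

int Evaluate(const node_t *n)
{
    evaluations++;
    return n->value;
}

/* Negamax Minimax that also records the principal variation:
   pv[0] is the index of the best child here, pv[1..d-1] the best line below it
   (-1 where the line ends early). Requires d <= MAX_DEPTH. */
int Minimax(const node_t *n, int d, int pv[])
{
    int i, j, g, score = -INF;
    int child_pv[MAX_DEPTH];
    for (j = 0; j < d; j++)
        pv[j] = -1;
    if (d == 0 || n->num_of_children == 0)
        return Evaluate(n);
    for (i = 0; i < n->num_of_children; i++) {
        g = -Minimax(n->child_node[i], d - 1, child_pv); /* child is scored for the opponent */
        if (g > score) {
            score = g;
            pv[0] = i;                           /* best move at this node ... */
            for (j = 0; j < d - 1; j++)          /* ... followed by the child's best line */
                pv[j + 1] = child_pv[j];
        }
    }
    return score;
}

/* Hypothetical two-ply tree (not Figure 1): root A with children B, C, D and
   two leaves each. Leaves are two plies below A, so their values are also
   from A's point of view. */
const node_t b1 = {50, 0, {NULL}}, b2 = {45, 0, {NULL}};
const node_t c1 = {40, 0, {NULL}}, c2 = {60, 0, {NULL}};
const node_t d1 = {20, 0, {NULL}}, d2 = {70, 0, {NULL}};
const node_t node_B = {0, 2, {&b1, &b2}};
const node_t node_C = {0, 2, {&c1, &c2}};
const node_t node_D = {0, 2, {&d1, &d2}};
const node_t node_A = {0, 3, {&node_B, &node_C, &node_D}};
```

On this tree `Minimax(&node_A, 2, pv)` returns 45 with `pv` = {0, 1}. That line is A, then B, then B's second leaf (45), and the search reaches it after evaluating all six leaves.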
2.1.4 Alpha-Beta
Alpha-Beta Minimax 3 Alpha-Beta
3 G 40 40 45 A 45 C C D Minimax 2 4 Alpha-Beta (Negamax Form)
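Below is a matching C sketch of Alpha-Beta in negamax form. It runs on a hypothetical two-ply tree that is not the tree of Figure 3. B's leaves give it the value 45, and C's first leaf is 40. Since C can then be worth at most 40 < 45 to the root, C's second leaf is skipped. D is cut off after its first leaf (20) in the same way. `node_t`, `INF`, `evaluations` and the tree are assumptions made for the example.

```c
#include <stddef.h>

#define INF 1000000      /* assumption: larger than any evaluation value */
#define MAX_CHILDREN 4

/* Hypothetical node type carrying the fields used in the pseudocode. */
typedef struct node {
    int value;                                   /* leaf score from the side to move */
    int num_of_children;                         /* 0 marks a terminal node */
    const struct node *child_node[MAX_CHILDREN];
} node_t;

int evaluations = 0;                             /* number of Evaluate calls */

int Evaluate(const node_t *n)
{
    evaluations++;
    return n->value;
}

/* Alpha-Beta in negamax form. (alpha, beta) is the window of scores that can
   still affect the root; the child sees it negated and swapped. */
int AlphaBeta(const node_t *n, int d, int alpha, int beta)
{
    int i, g, score = -INF;
    if (d == 0 || n->num_of_children == 0)
        return Evaluate(n);
    for (i = 0; i < n->num_of_children; i++) {
        g = -AlphaBeta(n->child_node[i], d - 1, -beta, -alpha);
        if (g > score)
            score = g;
        if (score > alpha)
            alpha = score;
        if (alpha >= beta)       /* cutoff: the remaining children are not searched */
            return alpha;
    }
    return score;
}

/* Hypothetical tree: A -> B {50, 45}, C {40, 60}, D {20, 70}. */
const node_t b1 = {50, 0, {NULL}}, b2 = {45, 0, {NULL}};
const node_t c1 = {40, 0, {NULL}}, c2 = {60, 0, {NULL}};
const node_t d1 = {20, 0, {NULL}}, d2 = {70, 0, {NULL}};
const node_t node_B = {0, 2, {&b1, &b2}};
const node_t node_C = {0, 2, {&c1, &c2}};
const node_t node_D = {0, 2, {&d1, &d2}};
const node_t node_A = {0, 3, {&node_B, &node_C, &node_D}};
```

`AlphaBeta(&node_A, 2, -INF, INF)` returns 45, the same value plain Minimax gives on this tree. It evaluates only 4 leaves instead of 6.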
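    int AlphaBeta(node_t n, int d, int alpha, int beta){
        int i, score = -INF;                 /* INF: larger than any evaluation value */
        if(d == 0 || n == terminal)          /* depth limit reached or game over */
            return Evaluate(n);
        for(i = 0; i < n.num_of_children; i++){
            /* the child searches the window (-beta, -alpha) from the opponent's side */
            score = max(score, -AlphaBeta(n.child_node[i], d-1, -beta, -alpha));
            alpha = max(alpha, score);
            if(alpha >= beta) return alpha;  /* cutoff: the remaining children are skipped */
        }
        return score;
    }

Figure 4: Alpha-Beta

2.2 TD
TD (Temporal Difference) TD
2.2.1 TD
TD (Temporal Difference) (1) TD 1 TD 1 1 1 1 TD TD
TD(0): let $S$ be the set of states, $s_t \in S$ the state at time $t$, and $V(s_t)$ its estimated value. When the next state $s_{t+1}$ is reached, $V(s_t)$ is moved toward $V(s_{t+1})$:

$$V(s_t) \leftarrow V(s_t) + \alpha\,[V(s_{t+1}) - V(s_t)] \qquad (2)$$

where $0 < \alpha \le 1$ is the step size.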
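Equation (2) applied to a recorded game can be sketched as follows. This sketch makes three assumptions: the value table `V`, integer state numbers, and a last entry held fixed as the target, standing in for the game result. `td0_pass` is a hypothetical name.

```c
#include <stddef.h>

/* One pass of TD(0), Eq. (2), over a recorded game.
   V:      value table indexed by a (hypothetical) state number
   states: state numbers s_0, s_1, ..., s_{len-1} visited in the game
   alpha:  step size, 0 < alpha <= 1
   The last state is never updated; its entry is the fixed target (assumption). */
void td0_pass(double *V, const int *states, size_t len, double alpha)
{
    for (size_t t = 0; t + 1 < len; t++) {
        double target = V[states[t + 1]];                 /* V(s_{t+1}) */
        V[states[t]] += alpha * (target - V[states[t]]);  /* V(s_t) += alpha [V(s_{t+1}) - V(s_t)] */
    }
}
```

Take V = {0, 0, 1}, states {0, 1, 2} and alpha = 0.5. One pass gives {0, 0.5, 1}, and a second pass gives {0.25, 0.75, 1}. Each pass carries the final value one more state back.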
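TD(0) TD(λ)

$$V(s_k) \leftarrow V(s_k) + \alpha\,\lambda^{t-k}\,[V(s_{t+1}) - V(s_t)], \qquad 1 \le k \le t \qquad (3)$$

In (3), $0 \le \lambda \le 1$. With $\lambda = 0$ only $V(s_t)$ is updated, and (3) reduces to TD(0).

2.2.2
2.2.1 1 1 1 1 TD 1 1
The value estimate $V_t(s)$ is represented by a parameter vector $\vec\theta_t$, and states are weighted by a distribution $P$. The mean squared error is

$$\mathrm{MSE}(\vec\theta_t) = \sum_{s \in S} P(s)\,[V^\pi(s) - V_t(s)]^2,$$

where $V^\pi(s)$ is the true value of state $s$ and

$$\vec\theta_t = (\theta_t(1), \theta_t(2), \ldots, \theta_t(n))^T.$$

Gradient descent on the squared error at $s_t$ gives

$$\vec\theta_{t+1} = \vec\theta_t - \tfrac{1}{2}\alpha\,\nabla_{\vec\theta_t}[V^\pi(s_t) - V_t(s_t)]^2 = \vec\theta_t + \alpha\,[V^\pi(s_t) - V_t(s_t)]\,\nabla_{\vec\theta_t} V_t(s_t) \qquad (4)$$

TD(0) replaces the unknown $V^\pi(s_t)$ in (4) with $V_t(s_{t+1})$, treated as a fixed target:

$$\vec\theta_{t+1} = \vec\theta_t + \alpha\,[V_t(s_{t+1}) - V_t(s_t)]\,\nabla_{\vec\theta_t} V_t(s_t) \qquad (5)$$

TD(λ) applies the same error to all earlier states with weight $\lambda^{t-k}$, as in (3):

$$\vec\theta_{t+1} = \vec\theta_t + \alpha\,[V_t(s_{t+1}) - V_t(s_t)]\sum_{k=1}^{t}\lambda^{t-k}\,\nabla_{\vec\theta_t} V_t(s_k) \qquad (6)$$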
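Equation (3) can be sketched in the same tabular setting. After each step, the TD error is applied to every earlier state with weight lambda^(t-k). Setting lambda = 0 therefore leaves only the k = t term, which is TD(0). This sketch assumes the table `V`, the state numbering and the fixed final entry, and `td_lambda_pass` is a hypothetical name. Indices start at 0 rather than 1.

```c
#include <stddef.h>

/* TD(lambda) in the form of Eq. (3), applied while replaying one game.
   After the step s_t -> s_{t+1}, every visited state s_k (k <= t) receives the
   same TD error, weighted by lambda^(t-k). V is a hypothetical value table
   indexed by state number; the final state's entry is never updated. */
void td_lambda_pass(double *V, const int *states, size_t len,
                    double alpha, double lambda)
{
    for (size_t t = 0; t + 1 < len; t++) {
        double delta = V[states[t + 1]] - V[states[t]];  /* TD error at step t */
        double w = 1.0;                                  /* lambda^(t-k), starting at k = t */
        for (size_t k = t + 1; k-- > 0; ) {              /* k = t, t-1, ..., 0 */
            V[states[k]] += alpha * w * delta;
            w *= lambda;
        }
    }
}
```

Take V = {0, 0, 1}, states {0, 1, 2}, alpha = 0.5 and lambda = 0.5. One pass gives {0.25, 0.5, 1}. With lambda = 0 the same pass gives {0, 0.5, 1}, the TD(0) result.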
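2 5

    e = 0
    s ← initial state
    repeat
        δ ← V(s') − V(s)
        e ← λe + ∇θ V(s)
        θ ← θ + αδe
        s ← s'
    until s is terminal

Figure 5: TD(λ)

Consider a linear evaluation function with parameter vector $\vec\theta$, where $\vec\phi_s = (\phi_s(1), \ldots, \phi_s(n))^T$ is the feature vector of state $s$. Then

$$V(s) = \vec\theta^{\,T}\vec\phi_s = \sum_{i=1}^{n} \theta(i)\,\phi_s(i) \qquad (7)$$

$$\nabla_{\vec\theta} V(s) = \vec\phi_s \qquad (8)$$

2.2.3 TDLEAF(λ)
2.2.1 2.2.2 TD 2.1.2 Principal Variation TD(λ) TDLEAF(λ)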
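TDLeaf(λ) is the method of [1] and is used in KnightCap. It applies TD(λ) to the evaluations of the principal-variation leaves found by the search from each position, rather than to the positions themselves. The sketch below combines that idea with the eligibility-trace loop of Figure 5 and the linear form (7) and (8). The search is not shown; the leaf feature vectors are taken as input. `NFEAT`, `z` (the value assigned after the last position), `linear_value` and `tdleaf_pass` are hypothetical names chosen for the example.

```c
#include <stddef.h>

#define NFEAT 2   /* hypothetical number of features n in Eq. (7) */

/* Eq. (7): V(s) = sum_i theta(i) * phi_s(i). */
double linear_value(const double *theta, const double *phi)
{
    double v = 0.0;
    for (int i = 0; i < NFEAT; i++)
        v += theta[i] * phi[i];
    return v;
}

/* TDLeaf(lambda) sketch: the Figure 5 loop run over the principal-variation
   leaves of successive positions in one game. leaf_phi holds len feature
   vectors (NFEAT values each, row by row), assumed to come from the search.
   z is a hypothetical value for the end of the game (e.g. the result). */
void tdleaf_pass(double *theta, const double *leaf_phi, size_t len,
                 double z, double alpha, double lambda)
{
    double e[NFEAT] = {0.0};                               /* e = 0 */
    for (size_t t = 0; t < len; t++) {
        const double *phi = &leaf_phi[t * NFEAT];
        double v = linear_value(theta, phi);
        double v_next = (t + 1 < len)
            ? linear_value(theta, &leaf_phi[(t + 1) * NFEAT])
            : z;
        double delta = v_next - v;                         /* delta <- V(s') - V(s) */
        for (int i = 0; i < NFEAT; i++) {
            e[i] = lambda * e[i] + phi[i];                 /* e <- lambda e + grad V(s), Eq. (8) */
            theta[i] += alpha * delta * e[i];              /* theta <- theta + alpha delta e */
        }
    }
}
```

Take θ = (0, 0), leaf features (1, 0) then (0, 1), z = 1, alpha = 0.5 and lambda = 0.5. After the pass θ becomes (0.25, 0.5). At the last step e = 0.5·φ(l_0) + φ(l_1), which is the sum of λ^(t−k)·∇V in (6) built up incrementally.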
2.2.4 TD KnightCap
KnightCap 0 1.0 0.7 1368
2.2.5 TD Alpha-Beta
5 [3] 5 1
3
3
3.1
1 2
Table 1
100000  950  1300  800  1150  600  550  600  400  600  370  600  100  600
Table 2
1150  1000  660  605  440  407  110
2 1
3.2
(10 50) 1000 800 250 TDLEAF(λ) 0.9
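[-99999, 99999] (9) 6

$$P = \frac{1}{1 + e^{-E/1000}} \qquad (9)$$

(E) E/1000 E. For example, with $E = 600 + 660 = 1260$:

$$P(1260) = \frac{1}{1 + e^{-1260/1000}} \approx 0.78 \qquad (10)$$

$$\frac{dP}{dE} = \frac{1}{1000}\,P(1 - P) \qquad (11)$$

$$\frac{\partial P}{\partial \theta_i} = \frac{dP}{dE}\,\frac{\partial E}{\partial \theta_i} = \frac{1}{1000}\,P(1 - P)\,\frac{\partial E}{\partial \theta_i} \qquad (12)$$

(6) (0,1] 20 1 20 9999999999 50000 1000000 200000 5 3000 1000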
3.3 TD
    while(1){
    }
_ 10 50
    while(1){
        if()
        else if(){
            principal variation
            principal variation
        }
    }
7
4 1000
4.1
3.1 3.1 2 100 0 0 8 3 9 4 10 5
Table 3 (100)
Learned:   842   754   641   554   334   224   100
Initial:   950   800   600   550   400   370   100
Ratio:     0.88  0.94  1.07  1.01  0.84  0.61  1.00

Table 4
Learned:   1309  1104  641   399   361   336   533
Initial:   1300  1150  600   600   600   600   600
Ratio:     1.01  0.96  1.07  0.67  0.60  0.56  0.89
490 1000 4 7

Table 5
Learned:   1128  857   740   602   428   436   114
Initial:   1150  1000  660   605   440   407   110
Ratio:     0.98  0.86  1.12  0.99  0.97  1.07  1.03

1000 4 8
4 8
4.2
(3.1) 30 4 6 11

Table 6
       A      B      C      D      E      F      G      H
     -110    -84    -83   -125    -20    +15    -10    -50
      -99   -102    -95     71     -6    -30    -15    -10

4 4
4.3
4.2 8 8 (5) 8 13104 12 12 1~4 (6~9)
6~9 (1~4) 5 5 13 4 6 3.1 3.1 5 14
5 8 7
44  -53  -15  78  27  -11  -71  87  138  60  124  -151  137  51  106  132  -65  103  190  142
4.4
0 4 8 4 6 4 4 5 4 6 5 5 8 4 8
TDLEAF(λ) KnightCap TD KnightCap KnightCap 1368 1 1
5
5.1
TD TD 4 8
5.2
1 1
[1] Jonathan Baxter, Learning To Play Chess Using Temporal Differences.
[2] Richard S. Sutton, Andrew G. Barto (2000).
[3] TD (1999).
[4] TD (1999, GPW 99).
[5] Akihiro Kishimoto, Transposition Table Driven Scheduling for Two-Player Games, M.Sc. Thesis, University of Alberta (Final version), January 2002.
[6]
[7] (1998).