COM COM 4) 5) COM COM 3 4) 5) COM COM 6) 7) 10) COM Bonanza 6) Bonanza Hearts COM 7) 10) Hearts 3 2,000 4,000

Vol. 50 No. 12 2796 2806 (Dec. 2009) 1 1, 2 COM TCG COM TCG COM TCG Strategy-acquisition System for Video Trading Card Game Nobuto Fujii 1 and Haruhiro Katayose 1, 2 Behavior and strategy of computers (COM) have recently attracted considerable attention with regards to video games, with the development of hardware and the spread of entertainment on the Internet. Previous studies have reported strategy-acquisition schemes for board games and fighting games. However, there have been few studies dealing with the scheme applicable for video Trading Card Games (video TCG). We present an automatic strategy-acquisition system for video TCGs. The proposed strategy-acquisition system uses a sampling technique, Action predictor, and State value function for obtaining rational strategy from many unobservable variables in a large state space. Computer simulations, where our agent played against a Rule-based agent, showed that a COM with the proposed strategy-acquisition system becomes stronger and more adaptable against an opponent s strategy. 1. TCG TCG 1) 2006 11 158 2) TCG TCG 3) 2008 8 1 8 TCG TCG 1 COMCOM COM COM COM 4),5) 6) 10) TCG COM 7) 10) TCG Hearts TCG TCG TCG 2 3 TCG 1 Graduate School of Science and Technology, Kwansei Gakuin University 2 CrestMuse CrestMuse Project, CREST, JST 2796 c 2009 Information Processing Society of Japan

2797 4 5 6 7 2. 2.1 COM COM 4) 5) COM COM 3 4) 5) 2 2.2 COM COM 6) 7) 10) COM Bonanza 6) Bonanza 6 10 20 Hearts COM 7) 10) 52 4 3 Hearts 3 2,000 4,000 Hearts 2.3 TCG TCG TCG TCG 3 TCG TCG

2798 TCG TCG 3. 3.1 COM 1 (1) COM 15 3 (2) 3 1 2 (3) (4) COM 1 (5) 0 (6) 3 (3) (4) (5) (3) (4) (5) 1 6 1 Table 1 Elemental affinity. 1 3 1 1 1/8 HP 1 1 3 3 TCG 400 6 15 10 2

2799 7,000 TCG TCG 3.2 3.1 1 COM 20 HMM 11) RNN 12) 13) 14) TCG 7) 10) 4. 4.1 6 1 1. U(H t,a t) H t t a t COM π(h t) π(h t) = arg max at U(H t,a t) (1) s t t S R(s t,a t,s t+1) a t s t s t+1 V (s t+1) 1 Fig. 1 Learning method for optimum action.

2800 U(H t,a t)= P (s t H t) P (s t+1 s t,a t){r(s t,a t,s t+1)+v (s t+1)} (2) s t S s t+1 S COM COM COM o t COM P (s t H t) 0.0 1.0 s t S 1/ 6. 2. P (s t+1 s t,a t) á t P (á t s t) COM COM 0.0 1.0 P (s t+1 s t,a t)=p (á t s t) (3) P (s t+1 s t,a t) o t P (át ot)p (ot st) O 1 MLP MLP 3 32 32 9 0.0 1.0 3. 0.0 2.0 2.0 1 2.0 R(s t,a t,s t+1) MLP COM 5+5 10 1 4. V (s t+1) MLP 1 32 32 1 5. s t S s t+1 S ŝ t ŝ t+1

2801 U(H t,a t) P (ŝ t H t) P (ŝ t+1 s t,a t){r(ŝ t,a t, ŝ t+1)+v (ŝ t+1)} (4) ŝ t S ŝ t+1 S 6. 15 1 5 32 0 17 0 4 COM 5 9 10 11 COM 12 13 COM 14 COM 15 COM 16 COM 17 COM 18 22 COM 23 27 28 29 COM 30 31 COM 0 4 5 9 COM 0 1.0 1 3 0.0 18 22 23 27 COM 3 18 3.0 19 22 0.0 25 32 18 22 COM 30 31 COM 9 0 1 2 3 4 5 6 7 8 0 8 0.0 1.0 COM 0 1.0 1 7 0.0 COM 5 10 17 30 31 10 17 2 4 MLP 4.2 3 3 1. 6

2802 MLP 6 6 6 6 2. 10 3. MLP 1 COM 3 15+10 25 COM 1.0 0.0 1 COM 3 COM 3 80% 20% 3 3 3 TCG 3 3 3 4.3 TCG MLP 6 6+1 7 1 4.1 4.4 TCG MLP 3 3 32 32 3 1.0 0.0 1 4.1

2803 5. RL-agent Rule-based RL-agent Rule-based 100 Rule-based 200 RL-agent 3 Rule-based 200 RL-agent RL-agent Rule-based Rule-based Rule-based Rule-based Rule-based RL-agent 3 Rule-based 5.2 5.3 TCG 5.4 5.1 Rule-based Rule-based 11 (1) (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) (11) Rule-based 8 10 5 Rule-based 5 7 4 5.2 RL-agent Rule-based Rule-based Rule-based 5,200 RL-agent 2 RL-agent 3 RL-agent 500 RL-agent RL-agent 25 2,200 80% RL-agent 50% 500 Rule-based Rule-based Rule-based Rule-based Rule-based RL-agent 80% Rule-based

2804 2 Fig. 2 Rule-based RL-agent s winning percentage. 4 Fig. 4 Select RL-agent at random. 3 RL-agent 3 RL-agent 70% 80% 5.4 Rule-based 3 3 Fig. 3 Addition of new rules. 5.3 RL-agent 2,200 80% RL-agent Rule-based 2,500 3 3 3 RL-agent 2,500 1 2 80% Rule-based 2,500 Rule-based 2,500 Rule-based 4 2,500 RL-agent 70% Rule-based 2,500 2 10% 2,500 RL-agent 2 80% 2,500 2,500

2805 6. 6.1 TCG 80% COM TCG 6.2 TCG TCG TCG 1 5.4 TCG COM 1 TCG COM COM COM COM 7. TCG TCG TCG TCG

2806 TCG COM COM TCG 1 Web 1) (1999). http://www.yugioh-card.com/japan/ 2) Guinness World Records. http://www.guinnessworldrecords.com/ 3) (1996). http://www.pokemon.co.jp/ 4) fnirs Vol.2006, No.134, 2006-EC-3, pp.29 35 (2006). 5) 2006 pp.157 164 (2006). 6) futility pruning Vol.47, No.8, pp.884 889 (2006). 7) Vol.102, No.731, pp.167 172 (2003). 8) Ishii, S. and Fujita, H.: A Reinforcement Learning Scheme for a Partially- Observable Multi-Agent Game, Machine Learning, Vol.59, pp.31 54 (2005). 9) Vol.J88-D-II, No.11, pp.2277 2287 (2005). 10) Fujita, H. and Ishii, S.: Model-Based Reinforcement Learning for Partially Observable Games with Sampling-Based State Estimation, Neural Computation, Vol.19, pp.3051 3087 (2007). 11) (2005). 12) RNNPB TL-2006-22, NLC-2006-18, PRMU2006-99, pp.45 50 (2006). 13) Vol.2004, No.74, 2004-HI-109, pp.1 6 (2004). 14) Sutton, R.S. and Barto, A.G.: Reinforcement Learning, An Introduction, MIT Press (1998). 15) Vol.44, No.1, pp.31 48 (1996). ( 21 3 19 ) ( 21 9 11 ) 2007 2009 1991 HCI 21 CREST CrestMuse