Title: Design of a Teammate AI by Learning Human-player Utility from a few Records of Actions
Author(s): Wada, Takayuki; Sato, Naoyuki; Ikeda, Kokolo
Citation: IPSJ SIG Technical Report, Game Informatics (GI), 2015-GI-33(5): 1-8
Issue Date: 2015-02-26
Type: Journal Article
Text version: publisher
URL: http://hdl.handle.net/10119/13464
Rights: Information Processing Society of Japan, Wada Takayuki, Sato Naoyuki, Ikeda Kokolo, IPSJ SIG Technical Report, Game Informatics (GI), 2015-GI-33(5), 2015, 1-8.
Notice regarding the use of this work: The copyright of this work belongs to the Information Processing Society of Japan (IPSJ). This work is posted with the permission of the IPSJ, the copyright holder. When using it, please comply with the Copyright Act of Japan and the IPSJ Code of Ethics.
Japan Advanced Institute of Science and Technology
Design of a Teammate AI by Learning Human-player Utility from a few Records of Actions

Wada Takayuki 1,a)  Sato Naoyuki 1,b)  Ikeda Kokolo 1,c)

Abstract: Some genres of commercial video games, especially RPGs, let players play the game with AI players as teammates. However, these AI teammates often take actions that the human player does not expect. Such mismatches between the expectations of human players and the actions taken by AI players often cause player dissatisfaction. One reason for these mismatches is that such games have several types of sub-goals, and the AI players act without understanding which types of sub-goals are important to each human player. The purpose of this study is to propose a method for developing teammate AI players that estimate the sub-goal preferences of human players and act so as to cause less dissatisfaction. In an evaluation experiment, we prepared artificial players with various sub-goal preferences and estimated their preferences using the proposed method. In one setting, the actions selected on the basis of the estimated sub-goal preferences matched those selected by the original artificial players 67.1% of the time. The upper bound of this rate in that setting is about 70.6%, which is the rate at which the same actions are selected when the sub-goal preferences are identical. Thus, even in the worst case, the proposed method is only 3.5 percentage points below an ideal estimation.

Keywords: Game AI, RPG, Utility, Machine Learning, Teammate, Cooperation game

1 JAIST, Asahidai 1-1, Nomi, Ishikawa, Japan
a) s1310082@jaist.ac.jp
b) satonao@jaist.ac.jp
c) kokolo@jaist.ac.jp

1. Introduction
… AI … Sander B. [1] … Quake III … AI … RPG … AI …

2. … AI … [2] … AI … Infinite Mario Bros. … Matteo [3] … AI … Believability … AI … Sander … AI … Quake III … AI … [1] … RPG … Quake III … RPG … 1 … Sander … AI … [9] … [7] … [4], [8] … [10], [11] … [5] … $(w_1 + w_2)$ … $w_1$ … 0 … $w_2$ … 0 … RPG … AI … [6] … 100 …
… 1 …

3. … AI …
(a) … AI …
(b) … AI …
(c) …
(d) … [6]

… (a) … AI … 1 …
(1) … RPG … RPG …
(2) … AI …
(3) …
(4) …
(5) …
(6) … (5) …
(7) …

4. … 1 … 1 …
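4.1 …
Let $S$ be the set of states and $A$ the set of actions. The $j$-th record consists of a state $s_j \in S$ and the action $a_j \in A_{s_j}$ chosen there, where $A_{s_j} \subseteq A$ is the set of actions available in $s_j$; the records are written $\{(s_j, a_j)\}_j$ … 2 …

4.2 …
For a state $s$ and an action $a \in A_s$ … 1 … play continues under a policy $\pi: S \to \mathbb{R}^{|A_s|}$ … The $i$-th trial ends in the state $s_i(s, a, \pi)$, which a feature map $S \to \mathbb{R}^n$ turns into the feature vector $\mathbf{x}_i(s, a, \pi)$. Averaging over $m$ trials gives

$$\mathbf{x}(s, a, \pi) = \frac{1}{m} \sum_{i=1}^{m} \mathbf{x}_i(s, a, \pi) \qquad (1)$$

… $\mathbf{x}$ … 2 … 1 … $\pi$ … [7] … $\mathbf{x}(s, a, \pi)$ … $a_2$ … $a_1$ … $\pi'$ … $\mathbf{x}(s, a_1, \pi')$ … $\mathbf{x}(s, a_2, \pi)$ …

4.3 …
Notation:
$s \in S$: a state; $A_s \subseteq A$: the actions available in $s$; $a \in A_s$: an action
$\pi: S \to \mathbb{R}^{|A|}$: a policy; $\Pi$: the set of policies
$s_i(s, a, \pi)$: the state reached in the $i$-th trial
$\mathbf{x}_i(s, a, \pi) \in \mathbb{R}^n$: its feature vector; $\mathbf{x}(s, a, \pi) \in \mathbb{R}^n$: the average in (1)
$\mathbf{w}$: a weight vector; $u(\mathbf{x}, \mathbf{w}) \in \mathbb{R}$: the utility

… 1 … From $s$, after $a$ and following $\pi$, the trials yield the states $s_i(s, a, \pi)$ and the features $\{\mathbf{x}_i(s, a, \pi)\}_i$ … The utility $u: \mathbf{x} \mapsto \mathbb{R}$ is parameterized by $\mathbf{w}$ and is linear in the features:

$$u(\mathbf{x}(s, a, \pi), \mathbf{w}) = \mathbf{x}(s, a, \pi) \cdot \mathbf{w} \qquad (2)$$

A recorded action $a^*$ in state $s$ is consistent with $\mathbf{w}$ when, with the best policy, no available action yields a higher utility:

$$\max_{\pi \in \Pi} u(\mathbf{x}(s, a^*, \pi), \mathbf{w}) \;\ge\; \max_{\pi \in \Pi,\, a \in A_s} u(\mathbf{x}(s, a, \pi), \mathbf{w}) \qquad (3)$$

… $\Pi$ … $\pi$ … The weight vector $\mathbf{w}$ is chosen from a finite candidate set $W$ so that the recorded actions $a^*$ violate (3) as rarely as possible (Algorithm 1).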
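A minimal Python sketch of how Eq. (1), the linear utility of Eq. (2), the violation count of condition (3) used by Algorithm 1, and the action-selection rule of Section 4.4 fit together. It assumes the trial results are already available as lists of feature vectors per (state, action, policy); the function names, the toy state `s0`, the actions `heal` and `attack`, and the policies `greedy` and `cautious` are hypothetical illustrations, not the authors' implementation. Breaking ties toward the first candidate is also an added assumption.

```python
def mean_features(samples):
    """Eq. (1): x(s, a, pi) = (1/m) * sum_i x_i(s, a, pi) over m trial results."""
    m = len(samples)
    return tuple(sum(col) / m for col in zip(*samples))


def utility(x, w):
    """Eq. (2): linear utility u(x, w) = x . w."""
    return sum(xi * wi for xi, wi in zip(x, w))


def best_utility(features, s, a, w):
    """max over pi in Pi of u(x(s, a, pi), w); features[(s, a)] maps policy -> x(s, a, pi)."""
    return max(utility(x, w) for x in features[(s, a)].values())


def estimate_weight(records, actions, features, candidates):
    """Algorithm 1: return the w in W with the fewest violations of condition (3)."""
    penalty = {w: 0 for w in candidates}
    for s, a_star in records:
        for w in candidates:
            u_star = best_utility(features, s, a_star, w)
            for a in actions[s]:
                # another action strictly beats the recorded one under w
                if a != a_star and u_star < best_utility(features, s, a, w):
                    penalty[w] += 1
    # hypothetical tie-breaking: the first candidate with the minimum count wins
    return min(candidates, key=lambda w: penalty[w])


def select_action(features, actions, s, w):
    """Section 4.4: return the (a, pi) pair maximizing u(x(s, a, pi), w)."""
    pairs = [(a, pi) for a in actions[s] for pi in features[(s, a)]]
    return max(pairs, key=lambda p: utility(features[(s, p[0])][p[1]], w))


# Hypothetical toy data: each trial yields an (HP, MP, Turn)-like feature triple.
raw = {
    ("s0", "heal"): {"greedy": [[10, 2, 0], [12, 2, 0]], "cautious": [[9, 3, 0]]},
    ("s0", "attack"): {"greedy": [[6, 4, 1], [6, 6, 1]]},
}
features = {key: {pi: mean_features(xs) for pi, xs in pols.items()}
            for key, pols in raw.items()}
actions = {"s0": ["heal", "attack"]}
W = [(1.0, 0.1, 0.1), (1.0, 10.0, 0.1)]

w_hat = estimate_weight([("s0", "attack")], actions, features, W)
print(w_hat)                                          # (1.0, 10.0, 0.1)
print(select_action(features, actions, "s0", w_hat))  # ('attack', 'greedy')
```

With the single record ("s0", "attack"), the MP-heavy candidate is the only one under which `attack` is never beaten by `heal`, so it is returned. Under (1.0, 0.1, 0.1), the same data would make `heal` the preferred action.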
Algorithm 1
  for each w ∈ W do
    p_w = 0
  end for
  for each (s, a*) ∈ {(s_j, a_j)}_j do
    for each w ∈ W do
      u* = max_{π∈Π} u(x(s, a*, π), w)
      for each a ∈ A_s \ {a*} do
        if u* < max_{π∈Π} u(x(s, a, π), w) then
          p_w += 1
        end if
      end for
    end for
  end for
  return argmin_{w∈W} p_w

4.4 …
… $a$ … $\pi$ … $\mathbf{x}(s, a, \pi)$ … $u(\mathbf{x}, \mathbf{w})$ … The action is selected as

$$\arg\max_{a \in A_s,\, \pi \in \Pi} u(\mathbf{x}(s, a, \pi), \mathbf{w})$$

5. …
5.1 … 5 … HP … 0 … 0 … MP … 3 …
5.2 … -1 … 1 … 1 … 6 … 10 …
5.3 … 3 … 1 … RPG …

6. …
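Table 2 …
         HP    MP     …     …
1       134    30    60    28
2       108    60    44    34
1        52     0    40    26
2        82    32    38    32
3        70     0    50    30

6.1 … 5 … 2 … 2 … MP … MP … HP …

6.2 … 7 …
(1) …
(2) … HP … HP …
(3) … 5 …
(4) … 5 … MP …
(5) … 5 …
(6) … (2) … (4) …
(7) … (2) … (5) …

6.3 … 1000 … 1 … 2 … HP … MP … Turn … AI … 3 …
3 … HP … 99%, 99%, 96%, 96% … MP … Turn … 98%, 83%, 93%, 69% …
… AI … 8 … 10% …

7. … $\mathbf{w}$ …
7.1 … 3 … $\mathbf{x}$ … 4 …

$$\mathbf{x} = \{x_{\mathrm{HP}}, x_{\mathrm{MP}}, x_{\mathrm{Turn}}\} \qquad (4)$$

… 4 … 5 … 7 … $a$, $b$ …
$x_{\mathrm{HP}}$ … HP … = … HP … (5)
$x_{\mathrm{MP}}$ … MP … = … MP … (6)
$x_{\mathrm{Turn}}$ = … $b$ … $a$ … (7)

7.2 … $\mathbf{w}$ … $\mathbf{x}$ … 3 … $x_{\mathrm{HP}}$, $x_{\mathrm{MP}}$, $x_{\mathrm{Turn}}$ … 1 … 1 … [1, 10, 0.1] … MP … [1, 0.1, 0.1] … HP … $W$ … $x_{\mathrm{MP}}$, $x_{\mathrm{Turn}}$ … 2 … 31 … 31 … 32 … 1 … 32 …

7.3 … $\mathbf{w}$ …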
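Section 7.2 only partly survives, but every example weight vector in this report has the form [1, w_MP, w_Turn], i.e., the weight on x_HP is fixed at 1, and the numbers 31 and 32 appear in the description of W. The Python sketch below assumes, for illustration only, that w_MP and w_Turn each take 31 geometrically spaced values from 1/32 to 32. The exact grid, the helper names `candidate_weights` and `preferred`, and the feature values of outcomes `A` and `B` are hypothetical; the feature values are invented, not computed from Eqs. (5)-(7). The sketch shows how the two example weights [1, 10, 0.1] (MP-oriented) and [1, 0.1, 0.1] (HP-oriented) rank the same outcomes differently under the linear utility of Eq. (2).

```python
import math


def candidate_weights(steps=31, lo=1 / 32, hi=32.0):
    """Hypothetical candidate set W: w = (1, w_MP, w_Turn), geometric grid for the last two."""
    ratio = (hi / lo) ** (1 / (steps - 1))           # constant factor between neighbours
    scale = [lo * ratio ** k for k in range(steps)]  # lo, ..., hi
    return [(1.0, w_mp, w_turn) for w_mp in scale for w_turn in scale]


def utility(x, w):
    """Linear utility u(x, w) = x . w with x = (x_HP, x_MP, x_Turn)."""
    return sum(xi * wi for xi, wi in zip(x, w))


def preferred(outcomes, w):
    """Name of the outcome whose feature vector has the highest utility under w."""
    return max(outcomes, key=lambda name: utility(outcomes[name], w))


# Illustrative feature vectors (x_HP, x_MP, x_Turn); the values are made up.
outcomes = {"A": (0.9, 0.1, 0.5), "B": (0.5, 0.6, 0.5)}

W = candidate_weights()
print(len(W))                                 # 961
print(preferred(outcomes, (1.0, 10.0, 0.1)))  # B (MP-oriented weights)
print(preferred(outcomes, (1.0, 0.1, 0.1)))   # A (HP-oriented weights)
```

A geometric grid spreads the candidates evenly on a log scale, so emphasizing or de-emphasizing a sub-goal by the same factor costs the same number of grid steps.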
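4 … [1, 4, 8] …
5 … [1, 1/8, 1/16] …
… 2 … AI … $\mathbf{w} = [1, 4, 8]$ … MP … Turn … 4 … $\mathbf{w} = [1, 1/8, 1/16]$ … HP … 5 … [1, 4, 8] … 10% … [1, 1/8, 1/16] … [1, 4, 8] … 1 … 2 … 2 … 8 … 1 … 8 … 8 … 1 … 20 … 5 … [1, 1/8, 1/16] … [1, 1/16, 1/32] … 6 … 1 … 1 … 3 … 5 … 8 … 5 … [1, 12, 0.167] … MP … 4 … 15% … MP …

8. … AI …
8.1 … 4 … 4 … 5 … 20 … RPG … 6 … 7 … 5 … 4 …
8.2 …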
Table 4 …
…           HP    MP      Turn
HP          1     0.071   0.071
Turn        1     0.143   18
MP, Turn    1     10      10
MP          1     12      0.167

5 … AI … 4.1 … MP … Turn … AI … 3.1 … MP … AI … 3.4 … 4.3 … Turn … Turn … AI … 4.0 … MP … AI … 2.6 … 4.0 … MP … Turn … Turn … AI … 3.0 … MP … AI … 2.7 … AI … AI … 7 … 2 … 8 … 1 … 3 … MP … Turn … MP … Turn … 1 … 4 … 2 … 1 … 4 … 1 … AI … 2 … 1 … 5 … AI … 3 … AI … 2 … [1, 0.3, 3] … Turn … AI … [1, 4, 0.25] … MP … AI … AI … 7 … 5 … Turn … Turn … AI (4.0) … MP … AI (2.6) … (4.3) … [1, 0.5, 16] … [1, 0.3, 3] …

9. …

References
[1] Sander Bakkes, Pieter Spronck and Eric Postma: TEAM: The Team-Oriented Evolutionary Adaptability Mechanism, Entertainment Computing - ICEC 2004, pp.273-282, 2004.
[2] … 55(7), pp.1655-1664, 2014.
[3] Matteo Bernacchia and Hoshino Jun'ichi: AI platform for supporting believable combat in role-playing games, … 2014, pp.139-144, 2014.
[4] … 48 … 6 … (2), pp.123-124, 1994.
[5] … pp.1-28, 1997.
[6] … AI … 29 …, pp.1-8, 2013.
[7] Remi Coulom: Computing Elo ratings of move patterns in the game of Go, International Computer Games Association Journal, Vol. 30, pp.198-208, 2007.
[8] … 2011, pp.46-53, 2011.
[9] … 2001, pp.17-24, 2001.
[10] … 2006, pp.78-83, 2006.
[11] … AI …, Vol. 2010-GI-24, No. 3, pp.1-7, 2010.