Comparison of Methods for Choosing Actions in Werewolf Game Agents

Tianhe Wang 1,a)   Tomoyuki Kaneko 2,3,b)

Abstract: Werewolf, also known as Mafia, is an imperfect-information game that features information asymmetry. Recently, artificial intelligence has achieved excellent performance in perfect-information games; however, there is still room for development in imperfect-information games. In this paper, we implemented agents for the Villager, Seer, and Werewolf roles, to which we applied a model based on Q-learning, a reinforcement-learning technique, and compared the ϵ-greedy, softmax, and UCB methods for choosing actions. Each method was evaluated by the win ratio of the agents in 3- and 4-player games after learning over multiple cases. Evaluating the win ratio under a greedy policy after learning, the experimental results showed that the agent trained with ϵ-greedy performed best in 3-player games, while in 4-player games the best action-selection method during learning differed by role: the Werewolf agent performed best with ϵ-greedy, and the Villager and Seer agents each performed best with one of the other two methods.

1. Introduction

[Japanese text not preserved in this extraction; the introduction cites [2] and [6].]

1  Graduate School of Interdisciplinary Information Studies, The University of Tokyo
2  Interfaculty Initiative in Information Studies, The University of Tokyo
3  JST, PRESTO
a) wangtianhe@g.ecc.u-tokyo.ac.jp
b) kaneko@acm.org

2017 Information Processing Society of Japan   - 177 -
2. Related Work

[Japanese text not preserved; this section discusses prior Werewolf agents [3], [5], coming-out (CO, role-claiming) strategies, Q-learning [1], and UCB-based action selection [4].]

The UCB1 value used to choose an action i is

    X̄_i + C √( ln n / n_i )        (1)

where n_i is the number of times action i has been selected, n = Σ_i n_i is the total number of selections, X̄_i is the average reward obtained from action i, and the constant C controls the balance between exploration and exploitation [4].

3. Proposed Method

[Japanese text not preserved; this section describes the agent model based on Q-learning and the ϵ-greedy, softmax, and UCB action-selection methods, drawing on [3], [4], [5].]

3.1 Coming Out (CO)

[Japanese text not preserved; this subsection describes how CO (role-claiming) statements among players (e.g., A, B, C) are represented, and how, with N players, CO statements by players i and j are handled.]

3.2 State Representation for Q-learning
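As a concrete sketch of the three action-selection rules compared in this paper (ϵ-greedy, softmax, and UCB1 as in Eq. (1)), the following Python fragment may help. The function names and parameter values are illustrative assumptions, not the paper's implementation.

```python
import math
import random

def epsilon_greedy(q, epsilon):
    """With probability epsilon pick a random action; otherwise pick a max-Q action."""
    if random.random() < epsilon:
        return random.randrange(len(q))
    return max(range(len(q)), key=lambda a: q[a])

def softmax_select(q, temperature):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    m = max(q)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / temperature) for v in q]
    return random.choices(range(len(q)), weights=weights)[0]

def ucb1_select(avg_reward, counts, c):
    """Eq. (1): argmax_i  X̄_i + C * sqrt(ln n / n_i); untried actions are taken first."""
    for i, n_i in enumerate(counts):
        if n_i == 0:
            return i
    n = sum(counts)
    return max(range(len(counts)),
               key=lambda i: avg_reward[i] + c * math.sqrt(math.log(n) / counts[i]))
```

With ϵ = 0 the ϵ-greedy rule reduces to pure exploitation, which matches the greedy evaluation used in the experiments; larger ϵ, higher temperature, or larger C each increase exploration.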
[Table 1: encoding of the game state, including CO information; the exact contents are not recoverable from the extraction.]

The state used for Q-learning consists of six features, (1)–(6); their descriptions are not preserved in the extraction.

3.3 Rewards

[Japanese text not preserved; following [3], a reward of 100 is given for a win and 0 for a loss, and Q-values are updated from these terminal rewards.]

3.4 CO Actions

[Japanese text not preserved; this subsection concerns the agent's CO (role-claiming) actions.]

4. Experiments

4.1 Experimental Setup

[Japanese text not preserved; the agents were evaluated in 3- and 4-player games against the Sample Player of [5], with training runs of up to 100,000 games. Parameter values such as 0.9 and 0.8 and run lengths such as 20,000 games appear in the source, but their exact roles are not recoverable.]

[Figures 2 and 3: win ratio versus number of training games (0–100,000); only axis-tick residue remains.]
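The reward scheme above (100 for a win, 0 for a loss) feeds a standard tabular Q-learning update. The following is a minimal sketch, assuming a dictionary-backed Q-table; the learning-rate and discount values are illustrative, as the paper's own settings are not recoverable from the extraction.

```python
from collections import defaultdict

# Q-table mapping (state, action) -> value; states can be any hashable encoding.
Q = defaultdict(float)

def q_update(state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
        Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    For a terminal transition, pass next_actions=[] so the max term is 0.
    """
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```

For example, a win observed from state s with action a is applied as `q_update(s, a, 100, None, [])`, and a loss as `q_update(s, a, 0, None, [])`.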
[Figures: learning curves (win ratio versus number of training games, 0–100,000) for each role and action-selection method in 3- and 4-player games; only axis-tick and caption residue remains in the extraction.]
Table 2: Number of distinct states observed during learning in 3-player games (three columns, presumably one per role; row labels for the action-selection methods are not recoverable).

    44    15    110
    44    15    103
    44    15    111
    44    15    124

Total number of possible states (3-player): 2592, 3888, 18144.

Table 3: Number of distinct states observed during learning in 4-player games.

    227   202   1700
    227   202   1631
    227   202   1610
    227   202   1802

Total number of possible states (4-player): 2,519,424; 3,779,136; 37,791,360.

[Japanese text not preserved; the discussion compares the observed state counts with the totals and refers to the Sample Player and CO behavior.]

4.2 Results

[Japanese text not preserved; after learning, the agents were evaluated with a greedy policy. A figure of 10,000 appears in the source, likely the number of evaluation games, but the exact protocol is not recoverable.]

Table 4: Win ratios after learning in 3-player games (three columns, presumably one per role; row labels for the action-selection methods are not recoverable).

    70.0%   37.2%   39.6%
    60.8%   37.1%   36.0%
    66.7%   35.8%   38.7%
    66.4%   36.0%   39.6%

Table 5: Win ratios after learning in 4-player games (the leading digits of the last row's first entry are missing in the extraction).

    66.8%   52.7%   28.1%
    57.3%   50.1%   21.3%
    53.8%   51.8%   52.2%
       7%   52.2%   31.2%

5. Conclusion

[Japanese text not preserved; the conclusion restates that the agent trained with ϵ-greedy performed best in 3-player games, while in 4-player games the best action-selection method during learning differed by role.]
Acknowledgments  This work was supported by JSPS KAKENHI Grant Number 16H02927 and JST.

References
[1] Sutton, R. S. and Barto, A. G.: Reinforcement Learning: An Introduction, MIT Press, Cambridge (1998).
[2] DeepMind Technologies: DeepMind, https://deepmind.com/. Accessed July 24, 2017.
[3] [Japanese-language reference; authors and title not preserved], Vol. 29, pp. 1–3 (2015).
[4] [Japanese-language reference; authors and title not preserved], 30, pp. 174–179 (2014).
[5] [Japanese-language reference; authors and title not preserved], Vol. 2015 (2015).
[6] [Japanese-language reference; authors and title not preserved] (2016).