Comparation of Methods for Choosing Actions in Werewolf Game Agents

Tianhe Wang 1,a)  Tomoyuki Kaneko 2,3,b)

Abstract: Werewolf, also known as Mafia, is an imperfect-information game that features information asymmetry. Artificial intelligence has recently achieved excellent performance in perfect-information games; however, there is still room for development in imperfect-information games. In this paper, we implemented Villager, Seer, and Werewolf agents, to which we applied our model based on Q-learning, a reinforcement-learning technique, and compared the ϵ-greedy, […], and […] methods for choosing actions. We evaluated each method in 3-player and 4-player games by the win ratio of the agents after learning from multiple cases. The experimental results, obtained by evaluating the win ratio under the greedy method, showed that the agents that learned with ϵ-greedy performed best in 3-player games, while in 4-player games the best performers after learning were the Villager agent that used the […] method, the Werewolf agent that used the ϵ-greedy method, and the Seer agent that used the […] method during learning.

1 Graduate School of Interdisciplinary Information Studies, The University of Tokyo
2 Interfaculty Initiative in Information Studies, The University of Tokyo
3 JST, PRESTO
a) wangtianhe@g.ecc.u-tokyo.ac.jp
b) kaneko@acm.org

2017 Information Processing Society of Japan

1.

[2] [6]

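Q [4]

2.

[3] [5] ϵ-greedy Q Q CO CO [4] Q Q [1]

$$\mathrm{UCB1}_i = \bar{X}_i + C \sqrt{\frac{\ln n}{n_i}} \qquad (1)$$

Here $n_i$ is the number of times action $i$ has been selected, $n = \sum_i n_i$ is the total number of selections, $\bar{X}_i$ is the average reward obtained from action $i$, and $C$ is a constant that balances exploration and exploitation.

[4] Q Q Q ϵ 1 ϵ

3.

[3] [5] [4] Q [3] ϵ-greedy softmax Q Q

3.1

CO A B B C C A B B C CO CO 1 N 3 N N CO i j i j

3.2

Q 6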
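The selection rules named above (ϵ-greedy, softmax, and the score in Eq. (1), which has the form of the standard UCB1 rule) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `temperature` parameter, and the rule of trying untried actions first are assumptions made for illustration only.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action,
    otherwise pick the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature=1.0):
    """Boltzmann distribution over actions (temperature is an assumed parameter).
    The maximum is subtracted before exponentiating for numerical stability."""
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_select(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def ucb1_select(mean_rewards, counts, c):
    """Pick the action maximizing mean + c * sqrt(ln n / n_i), as in Eq. (1).
    Untried actions (n_i == 0) are chosen first; this is an assumption here."""
    for a, n_a in enumerate(counts):
        if n_a == 0:
            return a
    n = sum(counts)
    scores = [
        mean_rewards[a] + c * math.sqrt(math.log(n) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=lambda a: scores[a])
```

With `epsilon=0.0` the first function reduces to the greedy rule that the abstract says was used at evaluation time; a larger `c` in `ucb1_select` shifts the choice toward rarely tried actions.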

1 0 0 1 2 0 CO 1 2 1 1 0 CO 1 2 2 2 0 2 1 CO

( 1 ) ( 2 ) ( 3 ) ( 4 ) 8 ( 5 ) ( 6 )

3.3

Q [3] 100 0 Q

4.1

3-4 [5] 3 4 2 Q Sample Player 5 ϵ-greedy 100000 0.9 0.8 1 3 1 4 3 20000 ϵ-greedy 100000 ϵ-greedy ϵ-greedy 12

3.4

CO CO CO

4.

[3] [5]

[Figure: horizontal axis from 0 to 100000 in steps of 10000]

2 3
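Because the agents are built on Q-learning [1], the tabular update rule may help as a reference point. The sketch below shows the standard one-step Q-learning update, not the authors' code: the state and action labels, the table layout, and the values passed for `alpha`, `gamma`, and `reward` are all hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions, alpha, gamma):
    """One-step tabular Q-learning update:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    An empty next_actions list marks a terminal transition (future value 0)."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical usage: a terminal transition that ends in a win.
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
q_update(q_table, "s0", "vote_A", reward=100.0, next_state=None,
         next_actions=[], alpha=0.5, gamma=0.9)
```

Any of the selection rules compared in this paper can be used to pick `action` during learning, while evaluation uses the greedy choice over the learned table.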

2 4 3 10000 ϵ-greedy 2 0.8

[Figure: horizontal axis from 0 to 100000 in steps of 10000]

5 4 5 4 4 6000 ϵ-greedy 5 3 3 3 4 3 10000 ϵ-greedy 5 0.9 0.8 4 4 4 4 4 3 60000 ϵ-greedy 16 6 4 6 4 4 3 50000 ϵ-greedy 13 ϵ-greedy ϵ-greedy ϵ-greedy

Table 2 (3-player games)
ϵ-greedy    44      15      110
            44      15      103
            44      15      111
            44      15      124
3           2592    3888    18144

Table 3 (4-player games)
ϵ-greedy    227      202      1700
            227      202      1631
            227      202      1610
            227      202      1802
4           2519424  3779136  37791360

1 Sample Player Sample Player CO 2

4.2

Q ϵ-greedy Q 10000 4 5

Table 4 (3-player games)
ϵ-greedy    70.0%   37.2%   39.6%
            60.8%   37.1%   36.0%
            66.7%   35.8%   38.7%
            66.4%   36.0%   39.6%

ϵ-greedy ϵ-greedy

Table 5 (4-player games)
ϵ-greedy    66.8%   52.7%   28.1%
            57.3%   50.1%   21.3%
            53.8%   51.8%   52.2% 7% 52.2% 31.2%

ϵ-greedy Q Q Q

5.

Q ϵ-greedy 3 ϵ-greedy 4 ϵ-greedy 3

Acknowledgments

JSPS 16H02927, JST

References

[1] Sutton, R. S. and Barto, A. G.: Reinforcement Learning: An Introduction, Vol. 1, No. 1, MIT Press, Cambridge (1998).
[2] Technologies, D.: DeepMind, https://deepmind.com/. Accessed July 24, 2017.
[3] Vol. 29, pp. 1-3 (2015).
[4] 30 pp. 174-179 (2014).
[5] Vol. 2015 (2015).
[6] (2016).