1 4 1,a) 1,b) 1,c) 2,d) 1,e) 1,f) 4 1 1 4 1 4 4 1 4 Adapting One-Player Mahjong Players to Four-Player Mahjong by Recognizing Folding Situations Naoki Mizukami 1,a) Ryotaro Nakahari 1,b) Akira Ura 1,c) Makoto Miwa 2,d) Yoshimasa Tsuruoka 1,e) Takashi Chikayama 1,f) Abstract: There is not much research on computer mahjong and state-of-the-art computer players are not as strong as average human players. In this work, we first analyze the differences between four-player mahjong and one-player-mahjong, which does not have any difficulty of multi-player games, and then adapt a oneplayer mahjong program to four-player mahjong by filling the differences. We have found that the biggest difference lies in the necessity of recognizing folding situations and built a machine learning component to recognize such situations. Experimental results show that the playing strength of our mahjong program with the recognition component is comparable to that of average human players, despite the handicap of not being able to use any tiles discarded by other players. 1. 2 1 The University of Tokyo 2 School of Computer science,the University of Manchester a) mizukami@logos.t.u-tokyo.ac.jp b) nakahari@logos.t.u-tokyo.ac.jp c) ura@logos.t.u-tokyo.ac.jp d) makoto.miwa@manchester.ac.uk e) tsuruoka@logos.t.u-tokyo.ac.jp f) chikayama@logos.t.u-tokyo.ac.jp 1 4 1 1 4 4 1 1 4 1 4-1 -
2. 4 13 1 1 14 34 4 136 1 9 3 3 1 1 1 1 3 3 2 8 3. 1 3 [1] 3 2 [2] [3] [5], [6] 2 2 [4] 2 10 2 3 [5] [6] 4. 1 1 4 1 ( 1 ) 1 (5 ) ( 2 ) 1 4 (6 ) ( 3 ) (7 ) ( 4 ) 1 4 (8 ) 5. 1 4 5.1 1 4 1 4 4 () [7] - 2 -
1 1 27 1 1 (%) 51 1 48 36 Plain UCT 7 4 170 5.2 1 1 1 x w (1) n f(x, w) = x i w i (1) i=1 14 1 13 t 0 t 1 (2) w = w + x t0 x t1 (2) [1] 1 9 n-gram (n=1 6) 37,488 5.3 [8] *1 0.1% *1 2009 2 20 2010 1 31 5.4 1 1 1 100 1 1 [2] Plain UCT 1 1 1 50% 1 [8] 6. 1 4 1 4 4 1 3 3 1 4 4 [7] - 3 -
2 Rank 1 Rank 2 Rank 3 0.62 0.85 0.93 1 0.53 0.77 0.85 1 4 n Rank n [8] 27 1,342 1 1,342 1 1 4 1 1 2 Rank 1 62% 1 53% 3 [1] Rank 1 56% 1 Rank 3 15% 193 3 4 1 3 1 4 6% 7. 6 3 89 0.42 19 0.10 17 0.09 12 0.06 10 0.05 8 0.04 5 0.03 4 0.02 29 0.16 1 7.1 1 3 5,716 (A) (B) 1,053 4 0.75 2 8 1 3 2 2 6 4 4 A/B 83 23 25 922-4 -
2 7.1 7.1 5 1 19 3 34 7.3 7.2 2 Support Vector Machine (SVM) 5 SVM LIBSVM [9] (c) (g) 2 10 2 10 2 3 7.2 5 6 Precision 0.76 0.97 Recall 0.71 0.97 F1 value 0.73 0.97-5 -
7 1 2 3 4 0.237 0.240 0.259 0.264 2.54 834 0.181 0.216 0.252 0.351 2.77 504 8 0.181 0.144 0.188 0.190 4 4 () (R) (3) 531 5,185 c=2 8 g=2 7 F1 6 5 8. 1 4 1 4 1 4 1 7.2 7.2 R = R + (50 Rank 20 + AveR R ) 0.2 (3) 40 Rank AveR R R 1500 0.1 100 8.2 2 7 t 2 1% 8 [7] 1507 1 1 8.1 1 [8] 5 10 [8] 8.3 1-6 -
1 4 1 2 3.3% 1.7% [7] 1 [1] Proceedings of 12th Game Programming Workshop (2007). [2] (2010). [3] Kocsis, L. and Szepesvári, C.: Bandit based monte-carlo planning, Machine Learning: ECML 2006, pp. 282 293 (2006). [4] Bowling, M., Risk, N. A., Bard, N., Billings, D., Burch, N., Davidson, J., Hawkin, J., Holte, R., Johanson, M., Kan, M. et al.: A demonstration of the Polaris poker system, Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 2, pp. 1391 1392 (2009). [5] Risk, N. A. and Szafron, D.: Using counterfactual regret minimization to create competitive multiplayer poker agents, Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems: volume 1-Volume 1, International Foundation for Autonomous Agents and Multiagent Systems, pp. 159 166 (2010). [6] (2013). [7]! (2009). [8] http://tenhou.net/ (2013). [9] Chang, C.-C. and Lin, C.-J.: http://www.csie.ntu.edu.tw/ cjlin/libsvm/ (2001). 9. 1 1 4 1 [8] 834 1507 1 1 1-7 -