Future University Hakodate 2016 System Information Science Practice Group Report

Project Name: AI love Deep Learning
Group Name: TORCS Deep Learning
Project No.: 14-B
Project Leader: 1014041 Daichi Fukuda
Group Leader: 1014053 Masafumi Takahashi
Group Members:
1014018 Kaede Noto
1014023 Sora Ito
1014053 Masafumi Takahashi
1014066 Masataka Kato
1014094 Reina Saito
1014126 Tomoya Minamoto
Advisors: Takashi Takenouchi, Kiyohito Nagano, Kengo Terasawa, Yasuhiro Katagiri
Date of Submission: January 18, 2017
Abstract

Recently, AI has attracted attention because it can imitate humans in a variety of tasks. Machine Learning is one of the core technologies of AI: it is used to give computers a learning ability similar to the natural learning ability of humans. Deep Learning in particular has achieved good results in the field of image processing. The goal of this project is to imitate and surpass human thinking by using Machine Learning. After discussing this goal, we formed two groups, group A and group B. Group A aimed to develop a system that predicts combinations of pitches. Group B aimed to develop a car agent that can drive faster than humans. We belong to group B and use Deep Reinforcement Learning to develop such a car agent. Deep Reinforcement Learning is a technique that applies Deep Learning to function approximation in Reinforcement Learning. For learning to proceed well, the network, the rewards, and the environment must all be set up appropriately. In the first semester, we implemented Long Short Term Memory (LSTM) in Python and built the rewards and environment of a racing game in Unity, and we trained the car agent with them. In the second semester, we trained the car agent on TORCS, an open-source car racing simulator, using the Asynchronous method, Deep Q-Network (DQN), and LSTM. In the end, the car agent trained with the Asynchronous method was able to drive faster than humans on an oval track.

Keywords: Artificial Intelligence, Machine Learning, Deep Learning, Reinforcement Learning, Unity, TORCS
Contents
1
  1.1
  1.2
  1.3
  1.4
  1.5
2
  2.1
  2.2
  2.3
    2.3.1
    2.3.2
    2.3.3
    2.3.4
    2.3.5
  2.4
    2.4.1
    2.4.2
3
  3.1
  3.2
4
  4.1
    4.1.1 Python
    4.1.2 Unity
  4.2
    4.2.1 Asynchronous method
    4.2.2 Deep Q-Network
    4.2.3 Long Short Term Memory
  4.3
    4.3.1 (Python, Asynchronous method)
    4.3.2 Python, DQN
    4.3.3 (Python, LSTM)
    4.3.4 Unity, DQN
    4.3.5 Unity, LSTM
    4.3.6 Unity, Asynchronous method
5
  5.1 Asynchronous method
  5.2 Deep Q-Network
  5.3 Long Short Term Memory
6
7
8
  8.1
  8.2
    8.2.1 Unity
    8.2.2 Python
  8.3
    8.3.1 Asynchronous method
    8.3.2 Deep Q-Network
    8.3.3 Long Short Term Memory
  8.4
    8.4.1
    8.4.2
  8.5
    8.5.1
    8.5.2
  8.6
    8.6.1
    8.6.2
9
  9.1
    9.1.1
    9.1.2
  9.2
    9.2.1
    9.2.2
    9.2.3
1 1.1,., ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012), 10% [1].,,., Google Google Brain YouTube, [2]., DeepMind AlphaGo, 4 [3]. 1.2,.,,.,,.,,,. 1.3, DeepMind Deep Q-Network (DQN). DQN,. DQN,,. Atari2600 49,, 43. 29. DQN 1.1. Group Report of 2016 SISP - 1 - Group Number 14-B
1.1: DQN 1.4 DQN 2. 1,. DQN., DQN,,. 2,.,, 1. DQN 4. 4. 1.5, 2. 1,, 2,,. 2,.,,.
2 2.1. 1,. 2.2, DQN,., 1.4. (1). (2). 2.3 2. 2. 2.3.1,. Unity Python, Experience Replay, Fixed Target Q-Network, Long Short Term Memory(LSTM).
2.3.2. Unity (1), (2),,,. Experience Replay, Fixed Target Q-Network (1), (2) Deep Q-Network. (1),,. Long Short Term Memory(LSTM) (2), 2.3.3,.,,,., OpenAI OpenAI gym,,. 2.3.4,. TORCS Python, Asynchronous Method, Deep Q-Network, Long Short Term Memory(LSTM). OpenAI gym,.
2.3.5. TORCS (1), (2),,,. Asynchronous Method, Deep Q-Network, Long Short Term Memory (1), (2) Asynchronous Method, Deep Q-Network, Long Short Term Memory. (1),,. OpenAI gym OpenAI gym,,. 2.4 2.4.1 Python, Unity Python, Unity 2,,.. Python Python-Unity LSTM. Experience Replay. Fixed Target Q-Network. Unity,.. Python,.
2.4.2 Python, TORCS. Python Unity, 2 3. Python Unity 1,, Unity, Python.,. Asynchronous method : Asynchronous method,.. Deep Q-Network :. : Deep Q-Network. Long Short Term Memory : Long Short Term Memory,. : Long Short Term Memory,.
3,. 1, 2, Python, 3 Python., TORCS TORCS, 1 2., 3,.,,. 3.1, Python ( ), Unity 2. Python, Unity. Python, Unity.,,., Python ILSVRC2012 AlexNet. Unity,, 6., MessagePack WebSocket, AlexNet Unity,., Python, Experience Replay LSTM Fixed Target Q-Network 3, Unity,., 3.1,. 3.1:
3.2,. 2.3.3.,.,., gym-torcs TORCS, Unity TORCS.,,,,., 3, 1 2, 3.,, 2 1 3,.,,.
4, 3., 5. 4.1,. 4.1.1 Python Experience Replay, Fixed Target Q-Network, Long Short Term Memory,,. Python Unity, Life in Silico,. 4.1.2 Unity Unity, Unity Unity, C#.,, Unity, Blender 3D,., Python Unity Python, Life in Silico.
4.2,. 4.2.1 Asynchronous method Asynchronous method, TORCS, TORCS,., Asynchronous method, Asynchronous method., Asynchronous method. 5.1.,,, OpenAI gym CartPole.,,., CartPole.,., Asynchronous method, Asynchronous method, 2,., 2,,., Asynchronous method, CartPole.,.,.,,,,.,,,.,.
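The update cycle of the Asynchronous method that Section 5.1 outlines in three steps can be illustrated on a toy problem. The following is only a minimal sketch under several assumptions that do not come from this report: it follows the general scheme of [6] (each thread copies the shared parameters θ, accumulates an update dθ from its own copy, then applies dθ to the shared θ), it uses a hypothetical four-state chain environment instead of CartPole or TORCS, a tabular Q-function instead of a neural network, plain gradient steps instead of RMSprop, and a lock around the shared parameters for simplicity.

```python
import random
import threading

# Hypothetical toy chain: action 1 moves right, action 0 moves left.
N_STATES, N_ACTIONS = 4, 2
GAMMA, LEARNING_RATE, EPSILON = 0.9, 0.05, 0.3


def step(state, action):
    """Toy environment: reward 1.0 when the right end of the chain is reached."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def worker(theta, lock, n_updates, t_max, seed):
    """One learner thread running asynchronous one-step Q-learning."""
    rng = random.Random(seed)
    state = 0
    for _ in range(n_updates):
        # (1) Copy the shared parameters theta into thread-local parameters.
        with lock:
            local = [row[:] for row in theta]
        # (2) Accumulate the update d_theta using only the local copy.
        d_theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        for _ in range(t_max):
            if rng.random() < EPSILON:
                action = rng.randrange(N_ACTIONS)
            else:
                action = max(range(N_ACTIONS), key=lambda a: local[state][a])
            nxt, reward, done = step(state, action)
            target = reward if done else reward + GAMMA * max(local[nxt])
            d_theta[state][action] += target - local[state][action]
            state = 0 if done else nxt
        # (3) Apply d_theta to the shared parameters theta.
        with lock:
            for s in range(N_STATES):
                for a in range(N_ACTIONS):
                    theta[s][a] += LEARNING_RATE * d_theta[s][a]


def train(n_threads=4, n_updates=300, t_max=5):
    """Run several learner threads against one shared table of Q-values."""
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(theta, lock, n_updates, t_max, i))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return theta


if __name__ == "__main__":
    for s, row in enumerate(train()):
        print(s, [round(q, 2) for q in row])
```

Because every thread explores with its own random seed, the updates applied to the shared table come from differently distributed experience, which is the property the Asynchronous method relies on in place of a replay memory.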
4.2.2 Deep Q-Network Deep Q-Network, TORCS,., Deep Q-Network., 1 ATARI.,,., Deep Q-Network, OpenAI gym CartPole., TORCS,. CartPole. CartPole, TORCS, TORCS., Deep Q-Network, TORCS,,. 4.2.3 Long Short Term Memory Long Short Term Memory, CartPole, TORCS.,, LSTM. CartPole, (Cart) (Pole). CartPole,,.,,,.,,,., TORCS, TORCS.,.,,,., TORCS,. TORCS,,.
4.3,. 4.3.1 (, Python, Asynchronous method ) (1),,,. (2) Python Unity MessagePack, WebSocket,. (3), Python,,,. (4), OpenAI gym CartPole, 2, 3. (5) Asynchronous method,,. (6),,,.,.
4.3.2 Python, DQN (1), Python, Unity, Python Unity, Life in Silico( LIS ),,. (2) LIS,,,.. (3) Python 1 1, Fixed Target Q-network,. (4), CartPole Deep Q-Network Experience Replay Fixed Target Q-network. (5) DQN,. (6) CartPole,. 4.3.3 (Python, LSTM ) (1). (2) Chainer,. (3) Experience Replay,,. (4) gym-torcs Ubuntu. (5) RNN, LSTM. (6) RNN, LSTM, CartPole, TORCS. (7),,,,.
4.3.4 Unity, DQN (1) Unity, Unity [5]. (2). (3), Slack.,. (4) Blender,,. (5). (6),. (7) gym-torcs Ubuntu. (8), DQN. (9),. (10),,, DQN. 4.3.5 Unity, LSTM (1),.,. (2), 4. (3) RNN, LSTM,. (4),,.
4.3.6 Unity, Asynchronous method (1) Unity,,. (2), WebSocket Python, 6. (3),. (4) Python Unity Unity WebSocket. (5),. (6).
5,. 5.1 Asynchronous method,,.,,.,,., θ θ., (1)θ θ, (2)θ θ dθ, (3)dθ θ,.,, RMSprop 2,. 5.2 Deep Q-Network Deep Q-Network Experience Replay, Fixed Target Q-Network, Clipping 3. Experience Replay,,,,.,.,.. Fixed Target Q-Network, target. TD target θ, θ., target θ, θ. Clipping,, 1, -1.,.
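The three Deep Q-Network components named in Section 5.2 (Experience Replay, Fixed Target Q-Network, and Clipping) can be sketched in a few lines of Python. This is a minimal illustration, not the group's Chainer implementation. It assumes that Clipping means clamping each reward to the range [-1, 1] and that the TD target takes the standard form r + γ max Q(s', a'; θ⁻) computed with frozen target parameters. All names (`ReplayBuffer`, `clip_reward`, `td_target`, `sync_target`) and the dictionary-based Q-function are hypothetical.

```python
import copy
import random
from collections import deque


def clip_reward(reward):
    """Clipping: clamp a reward to [-1, 1] (assumed interpretation)."""
    return max(-1.0, min(1.0, reward))


class ReplayBuffer:
    """Experience Replay: store transitions and sample them at random."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest transitions drop out first

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, clip_reward(reward), next_state, done))

    def sample(self, batch_size):
        # Random sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def td_target(reward, next_state, done, target_q, gamma=0.99):
    """Fixed Target Q-Network: the target is computed with frozen parameters.

    `target_q(next_state)` returns the list of action values under the
    frozen target parameters.
    """
    if done:
        return reward
    return reward + gamma * max(target_q(next_state))


def sync_target(online_params):
    """Copy the online parameters into the target network (done every N steps)."""
    return copy.deepcopy(online_params)


if __name__ == "__main__":
    online = {"s0": [0.0, 0.0], "s1": [0.5, 2.0]}  # toy tabular Q-values
    target = sync_target(online)
    buffer = ReplayBuffer(capacity=100)
    buffer.push("s0", 1, 5.0, "s1", False)  # reward 5.0 is stored as 1.0
    state, action, reward, next_state, done = buffer.sample(1)[0]
    print(td_target(reward, next_state, done, lambda s: target[s]))
```

Keeping the target parameters in a separate copy means that changes to the online Q-values do not move the TD target until the next call to `sync_target`.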
5.3 Long Short Term Memory Long Short Term Memory. 3. (1),, (2), (3).,,., LSTM,.
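A single time step of an LSTM cell can be written out explicitly. The sketch below assumes that the three items listed in Section 5.3 correspond to the input, forget, and output gates of the commonly used LSTM formulation. The function name `lstm_step`, the gate keys `"i"`, `"f"`, `"o"`, `"g"`, and the weight layout are hypothetical and are not taken from the group's Chainer code.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, weights, biases):
    """Compute one LSTM time step with plain Python lists.

    weights[k] is a matrix with one row per hidden unit and
    len(x) + len(h_prev) columns; biases[k] has one entry per hidden unit.
    Keys: "i" input gate, "f" forget gate, "o" output gate, "g" candidate.
    """
    z = list(x) + list(h_prev)  # concatenate input and previous hidden state

    def affine(key):
        return [
            sum(w * v for w, v in zip(row, z)) + b
            for row, b in zip(weights[key], biases[key])
        ]

    i = [sigmoid(a) for a in affine("i")]    # how much new information to write
    f = [sigmoid(a) for a in affine("f")]    # how much old memory to keep
    o = [sigmoid(a) for a in affine("o")]    # how much memory to expose
    g = [math.tanh(a) for a in affine("g")]  # candidate memory content

    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


if __name__ == "__main__":
    n_in, n_hidden = 2, 1
    zeros = [[0.0] * (n_in + n_hidden) for _ in range(n_hidden)]
    weights = {k: zeros for k in "ifog"}
    biases = {k: [0.0] * n_hidden for k in "ifog"}
    print(lstm_step([1.0, -1.0], [0.0], [2.0], weights, biases))
```

The cell state `c` is what lets the network carry information across many time steps: the forget gate scales the previous memory and the input gate controls how much of the new candidate is added.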
6 B, 2.2. 6.1. 6.1:, Unity 3D Blender 3D GitHub, Python C# Unity Chainer
7, Unity. Unity C#. C# II. II, Java., C#.,, Python.,,. I II.,,.,.
8 8.1 3 TORCS,., Asynchronous method,.,, 1 1 37.36 1 37.27. 8.2 Unity, Python. 8.2.1 Unity Unity. Unity.,. Blender Unity (400m ). 8.1 Unity, 8.2 Blender.,. Python. Unity,. Unity Python. Python.
8.1: Unity 8.2: 8.2.2 Python Python.,. Unity.. LSTM.
8.3,, 3,. 8.3.1 Asynchronous method Asynchronous one step Q-Learning, [6]. TORCS. Asynchronous method, TORCS. CartPole Asynchronous method. OU process... 8.3.2 Deep Q-Network TORCS. GPU CPU. Excel. TORCS Experience Replay. TORCS Fixed Target Q-Network.. 8.3.3 Long Short Term Memory TORCS,,. LSTM, TORCS LSTM.,.,.
8.4
8.4.1
69 responses were collected. The distribution of ratings is shown in Table 8.1.

Table 8.1:
Rating      Item 1   Item 2
1           0        0
2           0        0
3           2        1
4           2        2
5           5        3
6           12       7
7           16       16
8           15       20
9           7        9
10          3        5
No answer   7        6
Mean        7.032    7.476
8.4.2
76 responses were collected. The distribution of ratings is shown in Table 8.2.

Table 8.2:
Rating                  Item 1   Item 2
1                       0        0
2                       0        0
3                       0        0
4                       1        0
5                       1        5
6                       12       7
7                       19       11
8                       26       25
9                       12       19
10                      5        9
No answer               0        0
Mean                    7.631    7.960
Change from Table 8.1   +0.599   +0.484
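The mean ratings reported in Sections 8.4.1 and 8.4.2 can be checked directly from the rating counts. The short Python sketch below recomputes them as count-weighted means over the ratings 1 to 10, excluding unanswered forms. The names `item1`/`item2` are placeholders because the original column labels are not available, and truncating to three decimals is an assumption chosen because it reproduces the reported figures.

```python
import math


def mean_rating(counts):
    """Weighted mean of ratings 1..10, where counts[i] is the tally for rating i + 1."""
    return sum((i + 1) * c for i, c in enumerate(counts)) / sum(counts)


def truncate3(value):
    """Truncate (not round) to three decimal places."""
    return math.floor(value * 1000) / 1000


# Rating tallies for ratings 1..10, copied from Sections 8.4.1 and 8.4.2.
first = {
    "item1": [0, 0, 2, 2, 5, 12, 16, 15, 7, 3],
    "item2": [0, 0, 1, 2, 3, 7, 16, 20, 9, 5],
}
second = {
    "item1": [0, 0, 0, 1, 1, 12, 19, 26, 12, 5],
    "item2": [0, 0, 0, 0, 5, 7, 11, 25, 19, 9],
}

if __name__ == "__main__":
    for key in ("item1", "item2"):
        before = truncate3(mean_rating(first[key]))
        after = truncate3(mean_rating(second[key]))
        print(key, before, after, round(after - before, 3))
```

The tallies also account for the respondent totals: the first survey has 62 and 63 rated answers plus 7 and 6 unanswered forms (69 each), and the second has 76 rated answers in both columns.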
,.,,.,,. 8.5 B,,,.. 8.5.1, 8.3. 1 5 5. 8.3: 4,. 3. 3,. 4.,. 3.
8.5.2, 8.4. 1 5 5. 8.4: 4,. 4. 5 3,,., 5. 5 AI TORCS, ( ).,,,.,,, 5. 5,,.,,,.,,.,, 5. 4, 4.
8.6,.. 8.6.1 8.5. 8.5: Python Unity,,.,. Unity,. Unity. Unity..,.,.,,.,.
8.6.2 8.6. 8.6:,.,. Deep Q-Network, TORCS, TORCS.,,.,,.,,, AI,.,,.,.,. CartPole DQN TORCS.,,.
9 9.1,. 9.1.1 Python, Unity,.,,. 9.1.2. (Python, ) Python-Unity,, Python LSTM DQN Clipping.,,., LSTM,,. (Python ) LIS,, Python DQN Fixed Target Q-Network., LIS., Fixed Target Q-Network,. (Python ) Python DQN Experience Replay.,,. (Unity ), 4.,
4.,. (Unity ) Unity,, Python.,,. (Unity ), Unity.,,,,. 9.2,. 9.2.1 3., TORCS. Asynchronous method, TORCS., 2.1.
9.2.2. (Asynchronous method, ) PC GPU, Asynchronous method,,.,. (Deep Q-Network ) CartPole Deep Q-Network Experience Replay, Fixed Target Q-network Clipping., TORCS Experience Replay Fixed Target Q-network., CartPole, TORCS. (Long Short Term Memory ) RNN, LSTM,. TORCS,,.. (Long Short Term Memory ), LSTM. LSTM,,,,,. (Asynchronous method ),,.,. (Deep Q-Network ) Ubuntu TORCS.
9.2.3,.,.,,, 3.,, TORCS. TORCS.,,..,.,,., TORCS,.,..,,,.,.
References
[1] All results, http://imagenet.org/challenges/lsvrc/2012/results.html (accessed 2016/07/15).
[2] Le Q, Ranzato M, Monga R, Devin M, Chen K, Corrado G, Dean J, and Ng A, Building high-level features using large scale unsupervised learning, In ICML, 2012.
[3] nikkei BP net, 2016/03/31, AI, http://www.nikkeibp.co.jp/atcl/matome/15/325410/032800202/ (accessed 2016/07/15).
[4],,, 2015.
[5], Unity5 3D/2D,, 2015.
[6] Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu, Asynchronous Methods for Deep Reinforcement Learning, In ICML, 2016.
[7] Simon O. Haykin, Neural Networks and Learning Machines, Pearson, 2008.
[8] Yann LeCun, Léon Bottou, Genevieve B. Orr, Klaus-Robert Müller, Efficient BackProp, Springer Berlin Heidelberg, 2002.
[9] Richard S. Sutton, Andrew G. Barto,,,,, 2000.
[10] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller, Playing Atari with Deep Reinforcement Learning, NIPS Deep Learning Workshop 2013, 2013.
[11] Daniele Loiacono, Luigi Cardamone, Pier Luca Lanzi, Simulated Car Racing Championship: Competition Software Manual, 2013.
[12] Ćirović Velimir, Braking torque control using recurrent neural networks, In Proceedings of the Institution of Mechanical Engineers, Part D: Journal of Automobile Engineering, 226(6), May 2012.
[13] Sepp Hochreiter, Jürgen Schmidhuber, Long short-term memory, In Neural Computation, 1997.