

Modeling the Function of the Ventral Striatum in Reinforcement Learning Based on the Analysis of Neuronal Activity

Masanari SHINOTSUKA a), Masahiko MORITA b), and Munetaka SHIDARA

Graduate School of Systems and Information Engineering, University of Tsukuba, 1-1-1 Tennodai, Tsukuba-shi, 305-8573 Japan
Faculty of Engineering, Information and Systems, University of Tsukuba, 1-1-1 Tennodai, Tsukuba-shi, 305-8573 Japan
Faculty of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba-shi, 305-8577 Japan
a) E-mail: m.shinotsuka2@gmail.com
b) E-mail: mor@bcl.esys.tsukuba.ac.jp
DOI: 10.14923/transinfj.2014JDP7137
D Vol. J98-D No. 9 pp. 1277-1287 © 2015

TD striosome striosome

1.

[1] Schultz TD [2] Barto [3] Doya [4] striosome V(s) striosome striosome [5], [6]

Shidara [7]

2.

2.1

2.1.1

For the state s_t at time t:

$$V(s_t) = E\left[ r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \cdots \right] \tag{1}$$

Here r_t is the reward at time t, E is the expectation, and γ is a constant between 0 and 1. V(s_t) is the state value.

2.1.2

TD

$$V_{\mathrm{new}}(s_{t-1}) \leftarrow V_{\mathrm{old}}(s_{t-1}) + \alpha \delta_{t-1} \tag{2}$$

where δ_{t-1} is

$$\delta_{t-1} = r_t + \gamma V(s_t) - V(s_{t-1}) \tag{3}$$

δ_{t-1} is the TD (temporal difference) error.

Fig. 1 Neural circuits of the basal ganglia.

(1) (3) 0 TD

2.1.3

basal ganglia cerebral cortex 1 (striatum) striosome matrix striosome (DA cell) matrix (internal segment of globus pallidus, GPi) (substantia nigra pars reticulata, SNr) GPi/SNr (thalamus)

2.1.4

Schultz [2]
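As a minimal sketch of the TD update in Eqs. (2) and (3), the following Python fragment applies the update to a tabular value function. The three-state chain, the state names, and the settings alpha = 0.1 and gamma = 0.9 are illustrative assumptions and do not come from the paper.

```python
def td_error(r_t, v_t, v_prev, gamma):
    """Eq. (3): delta_{t-1} = r_t + gamma * V(s_t) - V(s_{t-1})."""
    return r_t + gamma * v_t - v_prev


def td_update(V, s_prev, s_t, r_t, alpha=0.1, gamma=0.9):
    """Eq. (2): V(s_{t-1}) <- V(s_{t-1}) + alpha * delta_{t-1}.

    V is a dict mapping state -> value and is modified in place.
    Returns the TD error that was applied.
    """
    delta = td_error(r_t, V[s_t], V[s_prev], gamma)
    V[s_prev] += alpha * delta
    return delta


# Hypothetical chain A -> B -> end; a reward of 1 arrives on reaching "end".
V = {"A": 0.0, "B": 0.0, "end": 0.0}
for _ in range(2000):
    td_update(V, "A", "B", 0.0)    # no reward on the first transition
    td_update(V, "B", "end", 1.0)  # reward on the second transition
```

With these assumed settings the values approach the discounted returns of Eq. (1): V["B"] tends to 1 and V["A"] tends to gamma, and the TD error shrinks toward 0 as the estimates become consistent.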

TD

2.1.5

striosome matrix striosome matrix striosome [8] [9] Hebb TD

2.1.6

matrix striosome actor-critic Barto [3] matrix Q(s, a) Doya [4] striosome 2 striosome s_t V(s_t) striosome TD striosome TD striosome

Fig. 2 Structure common to conventional reinforcement learning models of the basal ganglia.

striosome striosome [10], [11] Cromwell [11] Shidara [7] [12] Goldstein [5] Kim [6]

striosome

2.2

Shidara [7]

2.2.1

3A Wait Go OK 1 1 3 3B 1 3 1 2 3 1/2 2 1 2/3 3 2 3B 3C 1/1, 1/2, 2/2, 1/3, 2/3, 3/3 1/1, 2/2, 3/3 1 1/2 1/2, 1/3, 2/3 1/6 1/2

2.2.2

100 200 100 200 1 5

Fig. 3 Multiple trial reward schedule task (adapted from Shidara et al. [7]).

Table 1 Response in the cue condition (adapted from Shidara et al. [7]).

       1/3   1/2   2/3   3/3   2/2   1/1     n
(1)                                         16
(2)                                         13
(3)                                          6
(4)                                          3
(5)                                          3
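To illustrate how Eq. (1) would assign values to the schedule states 1/1 through 3/3, the sketch below assumes that each trial counts as one time step, that a reward of 1 follows only the final trial of a schedule, and that every trial is completed correctly. These simplifications, the function name schedule_state_values, and the setting gamma = 0.3 are assumptions made for illustration, not a description of the recorded task.

```python
def schedule_state_values(max_len=3, gamma=0.3, reward=1.0):
    """Return {"k/n": V} for trial k of an n-trial schedule.

    Under the assumptions stated above, the only non-zero reward is the one
    that follows trial n, so Eq. (1) reduces to V = reward * gamma ** (n - k).
    """
    values = {}
    for n in range(1, max_len + 1):
        for k in range(1, n + 1):
            # n - k unrewarded trials remain before the rewarded one.
            values[f"{k}/{n}"] = reward * gamma ** (n - k)
    return values


values = schedule_state_values()
for state in ("1/3", "1/2", "2/3", "3/3", "2/2", "1/1"):
    print(state, round(values[state], 4))
```

Under these assumptions the final states 1/1, 2/2 and 3/3 share the highest value, 1/2 and 2/3 share the next value, and 1/3 has the lowest, so the state value alone does not separate states that are equally far from the reward.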

[12] 3. 1 (1) (2) (1) (2) 2/3, 3/3, 2/2 1/3, 1/2, 1/1 Shidara 3. 1 3.2.1 26 σ = 10 4 400 200 ms 200 ms 1000 ms 0 90%

3.2

3.2.1

5 26

Fig. 4 Response period.

Fig. 5 Histogram of the response onset time.

3. 1 0 14/26 [13], [14] 100 ms 0 100 ms 8

3.2.2

2 3 3 2 2 1

Fig. 6 Classification diagram of history dependence for the ventral striatum neurons.

1 1/2 1/3 2/3 2 26 22 5% 11 11 6 n = n 1 (1) (5) -

3.3

1 (1) (2) 15 12 (1) 8 (2) 7 (3) (5) 8 100 ms Shidara striosome

4.

4.1

2 1

Fig. 7 Structure of the proposed model.

Fig. 8 Network output to the test sequence.

Elman [15] 7 1 1

4.2

4.2.1

t cue t cue t r_{t+1} 1 1/2 1/3 2/3 1 0 50 1 Elman TD

$$\delta_{t-1} = r_t + \gamma O_t - O_{t-1} \tag{4}$$

0 O_t t r_t 0 1 TD r_t O_{t-1} 200 γ 0.3 2 200 10

4.2.2

8 10 1/2 2/2 1/3 2/3 3/3 3.2.2 2 3 3 6 9
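Eq. (4) replaces the state value in the TD error with the network output O_t. The sketch below only evaluates that error over a short sequence of outputs and rewards; it does not implement the Elman network [15] or its training. The function name td_errors, the example sequences, and the use of gamma = 0.3 (a value that appears beside γ in the text) are assumptions.

```python
def td_errors(outputs, rewards, gamma=0.3):
    """Return [delta_0, ..., delta_{T-2}] for sequences O_0..O_{T-1}, r_0..r_{T-1}.

    Each entry follows Eq. (4): delta_{t-1} = r_t + gamma * O_t - O_{t-1}.
    """
    if len(outputs) != len(rewards):
        raise ValueError("outputs and rewards must have the same length")
    return [
        rewards[t] + gamma * outputs[t] - outputs[t - 1]
        for t in range(1, len(outputs))
    ]


# Hypothetical sequence: a reward of 1 is delivered at t = 3.
rewards = [0.0, 0.0, 0.0, 1.0]

# Made-up outputs that already satisfy O_{t-1} = r_t + gamma * O_t.
consistent = td_errors([0.09, 0.3, 1.0, 0.0], rewards)

# If O_1 is too large, delta_1 becomes negative and delta_0 positive.
overestimate = td_errors([0.09, 0.6, 1.0, 0.0], rewards)
```

In this sketch, r_t + gamma * O_t plays the role of the target for the earlier output O_{t-1}, so a sequence of outputs that matches its own targets produces errors of approximately zero.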

10 10 4 10 (a) 2 F(1, 190) = 34.1, p < 0.01; 2 F(1, 190) = 19.1, p < 0.01

Fig. 9 Classification diagram of history dependence for the middle elements of the model.

2 2 10 (b) F(1, 190) = 4.99, p < 0.05 1 1 10 11 (a): F(1, 145) = 15.9, p < 0.01; 2 F(1, 145) = 4.21, p < 0.05, (b): F(1, 227) = 4.36, p < 0.05 3.2.2

4.3

Fig. 10 Example of the response of middle elements to a random sequence.

Fig. 11 Example of the response of ventral striatum neurons in the random condition.

Fig. 12 Correspondence of the proposed model to the brain structure.

Fig. 13 State values estimated from the internal state.

12 13 F(5, 193) = 3.53, p < 0.01 2/2 3/3 2/3 3/3 vs 2/3: t(70) = 4.41, p < 0.01; 2/2 vs 2/3: t(60) = 2.6, p < 0.01 1/1, 1/2, 1/3 [10], [11] 1 1 V V 12

[12] TD V V Q matrix [16]

5.

2 4.3 TD 1 1 TD

17022052 (B) 22300079,

22300138, 25282246

[1] R.S. Sutton and A.G. Barto, Reinforcement Learning, MIT Press, 1998.
[2] W. Schultz, P. Dayan, and P.R. Montague, "A neural substrate of prediction and reward," Science, vol.275, pp.1593-1599, 1997.
[3] A.G. Barto, "Adaptive critics and the basal ganglia," in Models of Information Processing in the Basal Ganglia, ed. J.C. Houk, J.L. Davis, and D.G. Beiser, pp.215-232, MIT Press, 1995.
[4] K. Doya, "Complementary roles of basal ganglia and cerebellum in learning and motor control," Current Opinion in Neurobiology, vol.10, no.6, pp.732-739, 2000.
[5] B.L. Goldstein, B.R. Barnett, G. Vasquez, S.C. Tobia, V. Kashtelyan, A.C. Burton, D.W. Bryden, and M.R. Roesch, "Ventral striatum encodes past and predicted value independent of motor contingencies," Journal of Neuroscience, vol.32, pp.2027-2036, 2012.
[6] Y.B. Kim, N. Huh, H. Lee, E.H. Baeg, D. Lee, and M.W. Jung, "Encoding of action history in the rat ventral striatum," J. Neurophysiology, vol.98, pp.3548-3556, 2007.
[7] M. Shidara, T.G. Aigner, and B.J. Richmond, "Neuronal signals in the monkey ventral striatum related to progress through a predictable series of trials," J. Neuroscience, vol.18, pp.2613-2625, 1998.
[8] C.R. Gerfen, "The neostriatal mosaic: Multiple levels of compartmental organization in the basal ganglia," Annual Review of Neuroscience, vol.15, pp.285-320, 1992.
[9] J.N.J. Reynolds, B.I. Hyland, and J.R. Wickens, "A cellular mechanism of reward-related learning," Nature, vol.413, pp.67-70, 2001.
[10] W. Schultz, P. Apicella, E. Scarnati, and T. Ljungberg, "Neuronal activity in monkey ventral striatum related to the expectation of reward," Journal of Neuroscience, vol.12, pp.4595-4610, 1992.
[11] H.C. Cromwell and W. Schultz, "Effects of expectations for different reward magnitudes on neuronal activity in primate striatum," J. Neurophysiology, vol.89, pp.2823-2838, 2003.
[12] vol.25, no.4, pp.167-171, 2001.
[13] Z. Liu and B.J. Richmond, "Response differences in monkey TE and perirhinal cortex: Stimulus association related to reward schedules," J. Neurophysiology, vol.83, pp.1677-1692, 2000.
[14] Y. Naya, M. Yoshida, and Y. Miyashita, "Forward processing of long-term associative memory in monkey inferotemporal cortex," J. Neuroscience, vol.23, pp.2861-2871, 2003.
[15] J.L. Elman, "Finding structure in time," Cognitive Science, vol.14, pp.179-211, 1990.
[16] Y. Sawatsubashi, M.F.B. Samsudin, and K. Shibata, "Emergence of discrete and abstract state representation in continuous input task through reinforcement learning," Advances in Intelligent Systems and Computing, vol.208, pp.13-22, 2013.