
This thesis studies per-layer reduction of the operation bit width for Deep Learning hardware. Compared with GPU-based floating-point processing, fixed-point FPGA hardware can be smaller and faster; reducing each layer's bit width to the necessary minimum, instead of using 16 bits in every layer, cuts the multiplier scale of a Convolutional Neural Network by 69%.

Abstract

Recognition by Deep Learning attracts attention because of its high accuracy. Deep Learning requires a large amount of training, so GPUs, which can process large amounts of data in parallel, are used to train networks quickly. However, GPUs handle only floating-point arithmetic and suffer from large power consumption and high latency. In recent years, therefore, dedicated hardware built on FPGAs has been studied, since FPGAs support fixed-point arithmetic, which enables lower power consumption and faster processing than floating point. In such fixed-point hardware, multipliers account for most of the gates, and the gate scale grows in proportion to the product of the bit widths of the multiplier and multiplicand. The hardware scale can therefore be reduced by making these bit widths as small as possible. Indeed, under the condition that all layers share one fixed bit width, an existing study succeeded in reducing the width to 16 bits. In this study, aiming at smaller and faster hardware, the operation bit width is reduced to the necessary minimum for every layer of the neural network, both statically and dynamically. For a Convolutional Neural Network, it is shown that the multiplier scale is reduced by 69% compared with the conventional technique that uses a 16-bit operation width in all layers.

Contents

1 [Introduction]  1
  1.1 [Background]  1
  1.2 [Purpose]  2
2 Deep Learning  3
  2.1 [Neuron model]  4
  2.2 Multi-Layer Perceptron  5
  2.3 Convolutional Neural Network  6
    2.3.1 [Convolution layer]  6
    2.3.2 [Pooling layer]  7
  2.4 [Training of Deep Learning]  8
    2.4.1 [Gradient descent]  9
    2.4.2 [Backpropagation]  10
3 [Hardware for] Deep Learning  13
  3.1 GPU  13
  3.2 FPGA  14
4 [Proposed method]  15
  4.1 [Static bit-width reduction]  15
  4.2 [Dynamic bit-width reduction]  16
  4.3 [Implementation]  17
5 [Evaluation]  19
  5.1 [Datasets]  19
  5.2 [Experimental setup]  19
  5.3 [Results]  19
6 [Conclusion]  22
  6.1 [Summary]  22
[Acknowledgments]  23
[References]  23

A [Source code]  25
  A.1 MNIST  25
  A.2 CIFAR-10  28
B […]  32

List of Figures

2.1 [Neuron model]  4
2.2 [Multi-Layer Perceptron]  5
2.3 [Convolution]  6
2.4 [Pooling]  7
2.5 […]  10
2.6 […]  10
4.7 [Error per epoch, static reduction]  15
4.8 […]  15
4.9 [Error per epoch, dynamic reduction]  16
4.10 […]  16

List of Tables

4.1 MNIST [network configuration]  18
4.2 CIFAR-10 [network configuration]  18
5.3 MNIST [bit widths and accuracy]  20
5.4 CIFAR-10 [bit widths and accuracy]  20

1 [Introduction]

1.1 [Background]

Neural networks (NN) have been studied since the 1940s. … Techniques such as pre-training with Restricted Boltzmann Machines (RBM) made deep NNs trainable, and such deep networks are now widely known as Deep Learning. … Training Deep Learning models requires large amounts of computation; GPUs are widely used for this purpose, and FPGA-based hardware is studied as a lower-power alternative.

1.2 [Purpose]

In place of GPU-based floating-point processing, this study targets fixed-point NN hardware on FPGAs. A previous study reduced the operation bit width to 16 bits under the condition that every layer uses the same width [1]. This study instead reduces the bit width to the necessary minimum for each layer of the NN, both statically and dynamically, and evaluates the approach on a Multi-Layer Perceptron (MLP) and a Convolutional Neural Network (CNN).

2 Deep Learning

Research on NNs began in the 1940s. … In the 1980s … and in the 90s three-layer NNs were widely studied. … In 2006, methods for training NNs with many layers were established, and such deep networks came to be called Deep Learning. In 2012, a Deep Learning system won the ILSVRC image recognition competition, and Deep Learning has attracted wide attention since. This chapter describes the NN models and training methods used in this study.

2.1 [Neuron model]

Fig. 2.1: [Neuron model]

As shown in Fig. 2.1, a neuron computes a weighted sum of its inputs x_i with weights w_i and a bias b, then applies an activation function f:

    y = f( Σ_i w_i x_i + b )    (1)

Typical activation functions are the sigmoid function

    f(x) = 1 / (1 + exp(−x))    (2)

the hyperbolic tangent

    f(x) = tanh(x)    (3)

and the rectified linear unit (ReLU)

    f(x) = max(0, x)    (4)
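For concreteness, a minimal NumPy sketch of Eqs. (1)–(4); the names (neuron, sigmoid, relu) are illustrative and not from the thesis:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # Eq. (2)

def relu(x):
    return np.maximum(0.0, x)         # Eq. (4)

def neuron(x, w, b, f=np.tanh):       # Eq. (3) is the default activation
    # Eq. (1): weighted sum of inputs plus bias, then activation
    return f(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(neuron(x, w, 0.3, sigmoid))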

2.2 Multi-Layer Perceptron

Fig. 2.2: [Multi-Layer Perceptron]

The MLP stacks layers of the neurons of Section 2.1. As shown in Fig. 2.2, the output h^k_i of unit i in layer k is computed from the outputs h^{k−1} of the previous layer:

    h^k_i = f( b^k_i + (w^k_i)^T h^{k−1} )    (5)

The output layer uses the softmax function, which converts the outputs into class probabilities p_i:

    p_i = softmax_i(w_i x_i + b_i) = exp(w_i x_i + b_i) / Σ_j exp(w_j x_j + b_j)    (6)
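A minimal NumPy sketch of Eqs. (5) and (6) (illustrative names; the max subtraction in softmax is the usual overflow guard and leaves the value of Eq. (6) unchanged):

import numpy as np

def mlp_layer(h_prev, W, b, f=np.tanh):
    # Eq. (5): h^k = f(b^k + W^k h^{k-1}), all units of the layer at once
    return f(b + W.dot(h_prev))

def softmax(z):
    # Eq. (6); subtracting max(z) avoids overflow in exp
    e = np.exp(z - np.max(z))
    return e / e.sum()

h = np.array([0.2, -0.5, 1.0])
W = np.random.randn(4, 3)
b = np.zeros(4)
print(softmax(W.dot(h) + b))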

2.3 Convolutional Neural Network

The CNN stacks convolution and pooling layers, followed by fully connected layers and a softmax output layer.

2.3.1 [Convolution layer]

Fig. 2.3: [Convolution]

As shown in Fig. 2.3, when an n_x × n_y input is convolved with an n_w × n_w filter, the output size is

    n'_x = n_x − n_w + 1,  n'_y = n_y − n_w + 1

2.3.2 [Pooling layer]

Fig. 2.4: [Pooling]

As shown in Fig. 2.4, a pooling layer summarizes each pooling region P_i into a single output h_i. Average pooling takes the mean over the region:

    h_i = (1/|P_i|) Σ_{j∈P_i} h_j    (7)
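The size rule can be checked against Table 4.1; a small sketch (illustrative only), assuming a stride-1 convolution with no padding:

# Output size of a "valid" convolution (no padding, stride 1), as in the
# equations above. Values from Table 4.1: a 28x28 MNIST image convolved
# with a 5x5 kernel gives a 24x24 feature map.
def conv_output_size(n_x, n_y, n_w):
    return n_x - n_w + 1, n_y - n_w + 1

print(conv_output_size(28, 28, 5))   # (24, 24)
print(conv_output_size(12, 12, 5))   # (8, 8), the second CONV layer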

Max pooling takes the maximum over the region:

    h_i = max_{j∈P_i} h_j    (8)

Both are special cases of Lp pooling:

    h_i = ( (1/|P_i|) Σ_{j∈P_i} h_j^P )^(1/P)    (9)

which reduces to average pooling (7) at P = 1 and approaches max pooling (8) as P → ∞.

2.4 [Training of Deep Learning]

To train a Deep Learning NN, a cost function is minimized. With the softmax outputs p_i and the teacher signal d_i, the cross-entropy cost is

    C = − Σ_i d_i log p_i    (10)

where d_i is 1 for the correct class and 0 otherwise. Training adjusts the weights so as to minimize C.
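A short sketch of Eqs. (7)–(9) on one pooling region (illustrative; the region values here are positive, as Lp pooling assumes):

import numpy as np

def avg_pool(region):                 # Eq. (7)
    return np.mean(region)

def max_pool(region):                 # Eq. (8)
    return np.max(region)

def lp_pool(region, p):               # Eq. (9)
    return np.mean(region ** p) ** (1.0 / p)

r = np.array([1.0, 2.0, 3.0, 4.0])
print(avg_pool(r), max_pool(r), lp_pool(r, 1.0), lp_pool(r, 64.0))
# lp_pool(r, 1.0) equals the average; a large p approaches the max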

2.4.1 [Gradient descent]

The cost C is minimized by repeatedly updating each weight w_ij in the direction that decreases C:

    w_ij ← w_ij + Δw_ij,  Δw_ij = −ϵ ∂C/∂w_ij    (11)

where ϵ is the learning rate.
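A minimal sketch of the update rule (11) on a toy cost C(w) = (w − 3)^2, whose gradient is 2(w − 3); purely illustrative, not from the thesis:

eps = 0.1          # learning rate
w = 0.0
for step in range(50):
    grad = 2.0 * (w - 3.0)   # dC/dw for the toy cost
    w = w - eps * grad       # w <- w + Δw, Δw = -eps * dC/dw
print(w)                     # converges toward the minimum at w = 3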

Fig. 2.5: […]

Fig. 2.6: […]

2.4.2 [Backpropagation]

To apply Eq. (11), the gradient ∂C/∂w_ij of the cost (10) with respect to every weight is needed. Backpropagation computes these gradients efficiently, propagating error signals from the output layer of the NN (Fig. 2.5) back toward the input.

For the softmax output layer, the gradient takes the simple form

    ∂C/∂w_ij = (p_i − d_i) h_j    (12)

For deeper layers, where the output is a nested composition f(f(f(…))), the chain rule is used. Consider three consecutive layers l, i, j as in Fig. 2.6, and let x_i denote the weighted input of unit i:

    x_i = Σ_j w_ij h_j    (13)

The gradient with respect to w_ij factors as

    ∂C/∂w_ij = (∂C/∂x_i)(∂x_i/∂w_ij)    (14)

The first factor is written δ_i = ∂C/∂x_i. The second factor follows from (13):

    ∂x_i/∂w_ij = h_j    (15)

For the layer above, x_l = Σ_i w_li h_i = Σ_i w_li f(x_i), and the derivatives ∂C/∂x_l (l = 1, 2, …) are taken as known.

The quantity δ_i is obtained from the δ_l of the layer above by the chain rule:

    δ_i = ∂C/∂x_i = Σ_l (∂C/∂x_l)(∂x_l/∂x_i)    (16)

The first factor is δ_l = ∂C/∂x_l. The second follows from x_l = Σ_i w_li f(x_i):

    ∂x_l/∂x_i = f′(x_i) w_li    (17)

so that

    δ_i = f′(x_i) Σ_l δ_l w_li    (18)

As in Fig. 2.6, the δ_l are propagated backward to give δ_i. At the softmax output layer, δ_l = p_l − d_l; applying (18) repeatedly yields δ_i for every layer, and with (14) and (15) the desired gradient is

    ∂C/∂w_ij = δ_i h_j    (19)
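Putting Eqs. (11)–(19) together, a sketch of one training step for a small network with a tanh hidden layer and a softmax output (illustrative names; not the thesis code, which appears in Appendix A):

import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def train_step(h0, d, W1, b1, W2, b2, eps=0.1):
    # forward pass
    x1 = W1.dot(h0) + b1                        # Eq. (13)
    h1 = np.tanh(x1)
    p = softmax(W2.dot(h1) + b2)
    # backward pass
    delta2 = p - d                              # output layer: p_l - d_l
    delta1 = (1 - h1 ** 2) * W2.T.dot(delta2)   # Eq. (18), f'(x) = 1 - tanh^2(x)
    # gradient-descent updates, Eqs. (11) and (19)
    W2 -= eps * np.outer(delta2, h1)
    b2 -= eps * delta2
    W1 -= eps * np.outer(delta1, h0)
    b1 -= eps * delta1
    return p

rng = np.random.RandomState(0)
W1, b1 = rng.randn(5, 3) * 0.1, np.zeros(5)
W2, b2 = rng.randn(2, 5) * 0.1, np.zeros(2)
for _ in range(100):
    train_step(np.array([1.0, 0.5, -0.5]), np.array([1.0, 0.0]), W1, b1, W2, b2)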

3 [Hardware for] Deep Learning

Deep Learning is processed on GPUs and, increasingly, on dedicated FPGA hardware. This chapter describes both platforms.

3.1 GPU

GPUs can process large amounts of data in parallel and are therefore widely used to speed up NN training. However, GPUs handle only floating-point arithmetic, and their power consumption and latency are large.

3.2 FPGA

Unlike GPUs, FPGAs can implement fixed-point arithmetic, which enables lower power consumption and faster processing than floating point. Dedicated NN hardware built on FPGAs has therefore been studied in recent years.

4 [Proposed method]

4.1 [Static bit-width reduction]

Fig. 4.7: [Training error per epoch under static bit-width reduction]

Fig. 4.8: [Bit widths under static reduction, compared with the 16-bit width of [1]]

A previous study reduced the operation bit width to 16 bits under the condition that all layers of the NN share one width [1]. In the static method of this study, the bit width of each layer of the NN is instead set in advance to the necessary minimum for that layer.

Fig. 4.7 shows the training error per epoch of the NN under the static method, and Fig. 4.8 the resulting bit widths.

4.2 [Dynamic bit-width reduction]

In the dynamic method, the bit width is not fixed in advance but is reduced step by step while the NN is being trained.

Fig. 4.9: [Training error per epoch under dynamic bit-width reduction]

Fig. 4.10: [Bit widths under dynamic reduction]
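The thesis emulates reduced precision in software by rounding (the Appendix A code rounds to a number of decimal digits with np.round). A sketch of the same idea using binary fractional bits; quantize and frac_bits are illustrative names:

import numpy as np

def quantize(w, frac_bits):
    # Keep only 'frac_bits' binary places, emulating a narrower
    # fixed-point representation. Illustrative, not the thesis code.
    scale = 2.0 ** frac_bits
    return np.round(w * scale) / scale

w = np.random.randn(4)
print(quantize(w, 12))   # static: one width for the whole run
print(quantize(w, 3))    # dynamic: width lowered later in training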

Fig. 4.9 shows the training error per epoch of the NN under the dynamic method: whenever the error has not improved for 5 epochs, the bit width is reduced. Fig. 4.10 shows the resulting widths.

4.3 [Implementation]

The methods were implemented in Python with Theano, based on the Deep Learning Tutorials code [2], and the NNs were trained on a GPU using MNIST and CIFAR-10. Tables 4.1 and 4.2 list the network configurations. CONV, POOL, FULLY, and SOFTMAX denote convolution, pooling, fully connected, and softmax layers; M_i, R_i, C_i are the number, rows, and columns of the input feature maps and M_o, R_o, C_o those of the output maps;

K_r and K_c are the rows and columns of the convolution (or pooling) kernel.

Table 4.1: [Network configuration for] MNIST

  Layer    M_i,R_i,C_i   K_r,K_c   M_o,R_o,C_o
  INPUT    -             -         1,28,28
  CONV     1,28,28       5,5       20,24,24
  POOL     20,24,24      2,2       20,12,12
  CONV     20,12,12      5,5       50,8,8
  POOL     50,8,8        2,2       50,4,4
  FULLY    800,-,-       -         500,-,-
  SOFTMAX  500,-,-       -         10,-,-

Table 4.2: [Network configuration for] CIFAR-10

  Layer    M_i,R_i,C_i   M_o,R_o,C_o
  INPUT    -             3,32,32
  FULLY    3072,-,-      1000,-,-
  FULLY    1000,-,-      500,-,-
  SOFTMAX  500,-,-       10,-,-
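As a sanity check on Table 4.1, the layer shapes can be recomputed from the size rules of Sections 2.3.1 and 2.3.2 (illustrative sketch; the map counts 20 and 50 are taken from the table):

# Recompute the MNIST layer shapes of Table 4.1 (valid convolution,
# non-overlapping 2x2 pooling). Illustrative sketch only.
def mnist_shapes():
    m, r, c = 1, 28, 28                 # INPUT
    for layer, k in [("CONV", 5), ("POOL", 2), ("CONV", 5), ("POOL", 2)]:
        if layer == "CONV":
            m, r, c = {1: 20, 20: 50}[m], r - k + 1, c - k + 1
        else:                           # POOL halves rows and columns
            r, c = r // k, c // k
        print(layer, (m, r, c))
    print("FULLY input size:", m * r * c)   # 50*4*4 = 800, as in the table

mnist_shapes()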

5 [Evaluation]

5.1 [Datasets]

Two datasets, MNIST and CIFAR-10, were used. MNIST [3] consists of 70,000 28×28 images of the handwritten digits 0–9. CIFAR-10 [4] consists of 60,000 32×32 images in 10 classes.

5.2 [Experimental setup]

The networks of Tables 4.1 and 4.2 were trained while the operation bit width of each layer was varied, with the 32-bit configuration as the baseline.

5.3 [Results]

The results are shown in Tables 5.3 and 5.4. Table 5.3 gives the results for the CNN on MNIST; the DATA row is the bit width of the layer inputs and outputs. Table 5.4 gives the results for the MLP on CIFAR-10.

Table 5.3: [Bit-width configurations and recognition accuracy for] MNIST

  CONV (bit):     32 16 12 16 12 12 16 12 16 12 16
  FULLY (bit):    32 16 12 16 12 16 12 12 16 12 16
  SOFTMAX (bit):  32 16 12 12 16 12 16 12 16 12
  DATA (bit):     32 16 16 16 16 16 6 6
  [average] (bit): 32 16 15.2 15.2 15.2 14 11 9.0
  [accuracy] (%):  99.0 98.9 98.8 98.7 98.4 98.4 98.9 98.7

Table 5.4: [Bit-width configurations and recognition accuracy for] CIFAR-10

  FULLY (bit):    32 16 12 16 12 16 12 16
  SOFTMAX (bit):  32 16 12 16 12 12 16 16
  DATA (bit):     32 16 16 16 16 6
  [average] (bit): 32 16 15.5 15.5 14.0 11
  [accuracy] (%):  56.1 55.6 54.6 54.2 46.8 22.4

For the CNN, the average operation bit width was reduced to 9 bits with recognition accuracy comparable to the 16-bit case. Since the gate scale of a multiplier is proportional to the product of the bit widths of multiplier and multiplicand, reducing the width from 16 to 9 bits reduces the multiplier scale by about 69% (1 − 9²/16² ≈ 0.68).

[… 4 … 16 … 3 … 12 … 1 … FPGA …]
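The 69% figure can be checked directly; a one-line computation under the thesis's stated assumption that multiplier gate count grows with the product of the operand bit widths:

# Multiplier scale relative to the 16-bit baseline, assuming gate count
# proportional to the product of the two operand bit widths.
baseline, reduced = 16, 9
reduction = 1.0 - float(reduced * reduced) / (baseline * baseline)
print("%.1f%%" % (reduction * 100))   # 68.4%, i.e. about 69%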

6 [Conclusion]

6.1 [Summary]

Aiming at small and fast Deep Learning hardware on FPGAs, this study reduced the operation bit width of each layer of the network to its necessary minimum, statically and dynamically. For the CNN on MNIST, the average bit width was reduced to 9 bits while maintaining recognition accuracy. Future work includes implementing the proposed design on an actual FPGA.

[1] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, "Deep Learning with Limited Numerical Precision," Proc. ICML-15, pp. 1737–1746, Feb. 2015.

[2] Deep Learning Tutorials, http://deeplearning.net/tutorial/ (accessed Jan. 29, 2016).

[3] THE MNIST DATABASE of handwritten digits, http://yann.lecun.com/exdb/mnist/ (accessed Feb. 29, 2016).

[4] CIFAR-10 and CIFAR-100 datasets, https://www.cs.toronto.edu/~kriz/cifar.html (accessed Feb. 29, 2016).

A [Source code]

A.1 MNIST

#####################
# layer-wise training functions
#####################
train_layer0 = theano.function(
    [index],
    layer0.output,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
    })

train_layer1 = theano.function(
    [layer1_input],
    layer1.output.flatten(2)
)

train_layer2 = theano.function(
    [layer2_input],
    layer2.output,
)

train_layer3 = theano.function(
    [index, layer3_input],
    layer3.negative_log_likelihood(y),
    updates=updates,
    givens={
        x: train_set_x[index * batch_size: (index + 1) * batch_size],
        y: train_set_y[index * batch_size: (index + 1) * batch_size]
    })

###################
# round each layer's parameters to its bit width (digits)
###################
for i in xrange(8):
    # softmax
    if i == 0:
        params[i].set_value(np.round(params[i].get_value(), softmax_digit))
    # ReLU
    elif i == 2:
        params[i].set_value(np.round(params[i].get_value(), relu_digit))
    # conv
    elif i == 4 or i == 6:
        params[i].set_value(np.round(params[i].get_value(), conv_digit))

####################
# dynamic reduction: lower the precision when the validation error
# has not improved for 5 consecutive checks
####################
if this_validation_loss >= best_validation_loss:
    count += 1
    print "best_error %.3f, this_error %.3f, count %d" % (
        best_validation_loss, this_validation_loss, count)
else:
    count = 0
    print "best_error %.3f, this_error %.3f, count %d, best_error updated" % (
        best_validation_loss, this_validation_loss, count)

if count == 5:
    conv_digit = 3
    relu_digit = 3
    softmax_digit = 3

print "digit updated" A.2 CIFAR-10 ####################### # ####################### this_error = 1 - erv if this_error >= best_error: count += 1 print "best_error %.3f, this_error %.3f, count %d"%(best_error, this_er else: count = 0 print "best_error %.3f, this_error %.3f, count %d,best_error update"%( best_error = this_error if count == 5: softmax_digit = 4 relu_digit = 4 print "softmax_digit,relu_digit = 4" 28

##################### # ##################### for i, layer in enumerate(mlp.layers): #softmax if i == 2: layer.w.set_value( np.round( layer.w.get_value(), softmax_digit ) ) layer.b.set_value( np.round( layer.b.get_value(), softmax_digit ) ) #ReLu else: layer.w.set_value( np.round( layer.w.get_value(), relu_digit ) ) layer.b.set_value( np.round( layer.b.get_value(), relu_digit ) ) ##################### # ##################### def training( self, XL, tl, data_digit ): 29

X = T.dmatrix( X ) t = T.dmatrix( t ) Y0, Z0 = self.layers[0].output( X ) Y1, Z1 = self.layers[1].output( Z0 ) Y2, Z2 = self.layers[2].output( Z1 ) cost = T.mean( _T_cost( Z2, t ) ) updateslist = [] for layer in self.layers: gradw = T.grad( cost, layer.w ) Wnew = layer.w - 0.1 * gradw updateslist.append( ( layer.w, Wnew ) ) if layer.withbias: gradb = T.grad( cost, layer.b ) bnew = layer.b - 0.1 * gradb 30

updateslist.append( ( layer.b, bnew ) ) train_layer0 = theano.function( [X], Z0 ) train_layer1 = theano.function( [Z0], Z1 ) train_layer2 = theano.function( [Z1, X, t], cost, updates = updateslist output_layer0 = train_layer0( XL ) output_layer0 = np.round( output_layer0, data_digit ) output_layer1 = train_layer1( output_layer0 ) output_layer1 = np.round( output_layer1, data_digit ) cost = train_layer2( output_layer1, XL, tl ) return cost 31

B [MNIST and CIFAR-10 …]