2017 (413812)
Abstract  Multi-layered neural networks (NNs) trained with Deep Learning, announced in 2012, can provide high recognition performance. In recent years, many companies and researchers have been actively applying them to image recognition tasks such as vehicle detection and pedestrian recognition, and to speech recognition. In particular, Google has developed the TPU (Tensor Processing Unit), a dedicated processor for Deep Learning built as an ASIC (Application Specific Integrated Circuit). The TPU's performance per power consumption is about ten times that of conventional processors. However, to further improve the cost-performance ratio of such dedicated hardware, it is essential to reduce the number of required transistors, while keeping the computation accuracy acceptable, so as to raise the performance per power consumption. In this study, we propose a method that switches the operation precision of the multipliers, which occupy the majority of the constituent gates, during learning, either for the whole network or for each layer of the NN. The evaluation results show that the performance of the configurations using 12-bit operations improves by 20 to 30% over the previous study.
Contents

1 Introduction 1
  1.1 Background 1
  1.2 Objective 2
2 Deep Learning 3
  2.1 The neuron model 4
  2.2 Neural networks 5
  2.3 Convolutional neural networks 7
    2.3.1 Convolution layers 8
    2.3.2 Pooling layers 9
  2.4 Training with Deep Learning 10
    2.4.1 Gradient descent 11
    2.4.2 Backpropagation 11
3 Hardware for Deep Learning 15
  3.1 GPU 15
  3.2 ASIC 16
4 Proposed Method 17
  4.1 Multiplier precision 17
  4.2 Switching the precision during training 18
  4.3 Implementation 19
  4.4 Evaluation method 20
5 Evaluation 21
  5.1 Experimental setup 21
  5.2 Results 22
6 Conclusion 24
Acknowledgments 25
References 25
A Source code 26
B MNIST 28
List of Figures

2.1 The neuron model 5
2.2 A layered neural network 6
2.3 Structure of a CNN 7
2.4 Layers of a CNN 7
2.5 Convolution 9
2.6 Pooling 10
2.7 Forward propagation 14
2.8 Backpropagation 14
4.9 Precision switching during training 18
4.10 Implementation overview 19

List of Tables

5.1 Network structure for MNIST 21
5.2 Precision per layer for MNIST 23
5.3 Epochs executed at each precision 23
5.4 Epochs executed at each precision 23
1 Introduction

1.1 Background

Multi-layered neural networks (NNs) trained with Deep Learning have attracted wide attention for their high recognition performance. NN research itself dates back to the 1940s and has repeatedly alternated between periods of intense activity and stagnation. In 2012, a Deep Learning method won ILSVRC2012 (ImageNet Large Scale Visual Recognition Challenge), beating conventional methods based on hand-crafted features such as Fisher Vectors by roughly 10 percentage points. In the same year, Google reported that a large-scale network trained on images taken from YouTube videos learned to respond to cats without any labeled data. Since then, Deep Learning has been actively applied to image recognition and speech recognition.
Companies such as Google and Facebook are investing heavily in Deep Learning. In particular, Google has developed the TPU (Tensor Processing Unit), a processor dedicated to Deep Learning [1]. The TPU is an ASIC (Application Specific Integrated Circuit) and achieves roughly ten times the performance per power consumption of contemporary GPUs and CPUs.

1.2 Objective

Training a Deep Learning NN requires an enormous number of multiply-accumulate operations, usually executed on GPUs. The objective of this study is to reduce the hardware cost of these operations by lowering the precision of the multipliers without degrading recognition accuracy.
Google's TPU performs inference with 8-bit arithmetic, but training is still typically carried out at higher precision on GPUs or FPGAs. Gupta et al. showed that training with 16-bit fixed-point arithmetic can reach accuracy comparable to floating point [2]. Building on that result, this study examines whether the multiplier precision can be lowered below 16 bits for part of the training by switching precisions as training proceeds.

2 Deep Learning

The neural network was first proposed in the 1940s, and a second research boom followed in the 1980s.
The second boom subsided in the 1990s, when the limits of the networks of the time became clear. The current third boom began around 2006, when it was shown that NNs with many layers can be trained effectively; training such deep NNs is what is now called Deep Learning. This chapter reviews the components of an NN and how it is trained.

2.1 The neuron model

The basic unit of an NN models a neuron: it takes inputs x_i, forms a weighted sum, and applies an activation function f:

    y = f\left( \sum_i w_i x_i + b \right)
Here the x_i are the inputs, the w_i the weights, and b the bias. Typical choices for the activation function f are the sigmoid, tanh, and ReLU functions:

    f(x) = \frac{1}{1 + \exp(-x)}    (1)

    f(x) = \tanh(x)    (2)

    f(x) = \max(0, x)    (3)

Figure 2.1: The neuron model.

2.2 Neural networks

A neural network (NN) is built by connecting many such neurons in layers: the outputs of one layer become the inputs of the next.
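As a minimal sketch, the neuron model and the activation functions (1)-(3) can be written in Python (function names are mine, not from the thesis):

```python
import math

def sigmoid(x):
    # Equation (1): logistic sigmoid, output in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    # Equation (3): rectified linear unit.
    return max(0.0, x)

def neuron(xs, ws, b, f):
    # One neuron: y = f(sum_i w_i * x_i + b).
    return f(sum(w * x for w, x in zip(ws, xs)) + b)

# Equation (2) is simply math.tanh.
print(neuron([1.0, 2.0], [0.5, 0.5], -1.5, relu))  # 0.0
```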
Signals entered at the input layer pass through one or more hidden layers and reach the output layer, which produces the recognition result.

Figure 2.2: A layered neural network.
Figure 2.3: Structure of a CNN.

2.3 Convolutional neural networks

A convolutional neural network (CNN) is an NN specialized for inputs with spatial structure, such as images. It alternates convolution layers and pooling layers, followed by fully connected layers.

Figure 2.4: Layers of a CNN.
The output layer typically applies the softmax function so that the outputs can be interpreted as class probabilities.

2.3.1 Convolution layers

A convolution layer slides a filter of size n_w \times n_w over an input of size n_x \times n_y and outputs the filter response at every position. With no padding, the output size (n'_x, n'_y) is

    n'_x = n_x - n_w + 1
    n'_y = n_y - n_w + 1
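As a quick check of the output-size formula above, a sketch in Python (function name is mine):

```python
def conv_output_size(n_x, n_y, n_w):
    # "Valid" convolution: an n_w x n_w filter slid over an n_x x n_y input
    # fits in n_x - n_w + 1 positions in one direction and
    # n_y - n_w + 1 in the other.
    return n_x - n_w + 1, n_y - n_w + 1

# Example matching the first layer of the MNIST network evaluated later:
print(conv_output_size(28, 28, 5))  # (24, 24)
```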
Figure 2.5: Convolution.

2.3.2 Pooling layers

A pooling layer condenses each local region P_i of the previous layer's outputs h_j into a single value h_i, giving invariance to small shifts of the input. Typical choices are average pooling, max pooling, and Lp pooling:

    h_i = \frac{1}{|P_i|} \sum_{j \in P_i} h_j    (4)

    h_i = \max_{j \in P_i} h_j    (5)

    h_i = \left( \frac{1}{|P_i|} \sum_{j \in P_i} h_j^P \right)^{1/P}    (6)

Lp pooling (6) reduces to average pooling for P = 1 and approaches max pooling as P → ∞.
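The three pooling rules (4)-(6) over one region can be sketched as follows (function names are mine):

```python
def avg_pool(region):
    # Equation (4): mean of the region.
    return sum(region) / len(region)

def max_pool(region):
    # Equation (5): maximum of the region.
    return max(region)

def lp_pool(region, p):
    # Equation (6): Lp pooling; p = 1 gives average pooling,
    # large p approaches max pooling.
    return (sum(h ** p for h in region) / len(region)) ** (1.0 / p)

region = [1.0, 2.0, 3.0, 4.0]
print(avg_pool(region), max_pool(region))  # 2.5 4.0
```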
Figure 2.6: Pooling.

2.4 Training with Deep Learning

Training a Deep Learning NN means adjusting its weights so that the outputs approach the desired outputs. For classification, the cross-entropy loss is used:

    C = -\sum_i d_i \log p_i    (7)

where p_i is the network output for class i and d_i is the teacher signal, which is 1 for the correct class and 0 otherwise. Training minimizes C.
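The cross-entropy loss (7) in a minimal sketch (names and numbers are illustrative):

```python
import math

def cross_entropy(p, d):
    # Equation (7): C = -sum_i d_i * log(p_i), with d a one-hot target.
    # Terms with d_i = 0 are skipped, which also avoids log(0).
    return -sum(di * math.log(pi) for pi, di in zip(p, d) if di > 0)

p = [0.7, 0.2, 0.1]   # softmax outputs
d = [1, 0, 0]         # correct class is class 0
# The loss shrinks as the probability of the correct class grows.
print(cross_entropy(p, d))
```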
2.4.1 Gradient descent

Each weight w_{ij} is repeatedly updated in the direction that decreases the loss C:

    w_{ij} \leftarrow w_{ij} + \Delta w_{ij}, \quad \Delta w_{ij} = -\epsilon \frac{\partial C}{\partial w_{ij}}    (8)

where \epsilon is the learning rate.

2.4.2 Backpropagation

The gradients in (8) are computed by backpropagation. For a softmax output layer trained with the cross-entropy loss (7), the gradient with respect to the weight w_{ij} from hidden output h_j to output unit i takes the simple form

    \frac{\partial C}{\partial w_{ij}} = (p_i - d_i) h_j    (9)
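The update rule (8) in a minimal sketch (the toy loss and function name are mine):

```python
def sgd_step(w, grad, eps):
    # Equation (8): w_ij <- w_ij - eps * dC/dw_ij.
    return [wi - eps * gi for wi, gi in zip(w, grad)]

# Minimize the toy loss C(w) = w^2 (gradient 2w) from w = 1.0:
# each step multiplies w by (1 - 2*eps), so it decays toward 0.
w = [1.0]
for _ in range(50):
    w = sgd_step(w, [2 * w[0]], eps=0.1)
```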
For hidden layers, the network output is a nested composition of activations, f(f(f(\cdots))), so the chain rule is applied layer by layer. Consider three consecutive layers whose units are indexed j, i, and l: the outputs h_j feed unit i, whose output feeds the units l. Let x_i denote the weighted input of unit i:

    x_i = \sum_j w_{ij} h_j    (10)

By the chain rule,

    \frac{\partial C}{\partial w_{ij}} = \frac{\partial C}{\partial x_i} \frac{\partial x_i}{\partial w_{ij}}    (11)

Write \delta_i \equiv \partial C / \partial x_i for the first factor. From (10), the second factor is

    \frac{\partial x_i}{\partial w_{ij}} = h_j    (12)

The loss depends on x_i only through the inputs x_l = \sum_i w_{li} h_i = \sum_i w_{li} f(x_i) of the next layer, so

    \delta_i = \frac{\partial C}{\partial x_i} = \sum_l \frac{\partial C}{\partial x_l} \frac{\partial x_l}{\partial x_i}    (13)
The first factor in (13) is \delta_l. From x_l = \sum_i w_{li} f(x_i), the second factor is

    \frac{\partial x_l}{\partial x_i} = f'(x_i) w_{li}    (14)

so

    \delta_i = f'(x_i) \sum_l \delta_l w_{li}    (15)

That is, the \delta_i of one layer are obtained from the \delta_l of the layer above it. At the softmax output layer, \delta_l = p_l - d_l, and (15) propagates the deltas backward layer by layer. Substituting (12) into (11) gives

    \frac{\partial C}{\partial w_{ij}} = \delta_i h_j    (16)

so once all \delta_i are known, every weight gradient follows immediately.
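The recursion (15) and gradient (16) can be checked numerically with a minimal sketch (ReLU is chosen as the example activation; all names and numbers are illustrative, not from the thesis):

```python
def f_prime_relu(x):
    # Derivative of the ReLU activation.
    return 1.0 if x > 0 else 0.0

def hidden_delta(x_i, deltas_next, w_next):
    # Equation (15): delta_i = f'(x_i) * sum_l delta_l * w_li.
    return f_prime_relu(x_i) * sum(d * w for d, w in zip(deltas_next, w_next))

def weight_grad(delta_i, h_j):
    # Equation (16): dC/dw_ij = delta_i * h_j.
    return delta_i * h_j

# Output layer (softmax + cross-entropy): delta_l = p_l - d_l.
p, d = [0.7, 0.3], [1.0, 0.0]
deltas_out = [pl - dl for pl, dl in zip(p, d)]        # approx [-0.3, 0.3]
delta_h = hidden_delta(0.5, deltas_out, [1.0, 2.0])   # f'(0.5)=1, -0.3+0.6
```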
Figure 2.7: Forward propagation.

Figure 2.8: Backpropagation.
3 Hardware for Deep Learning

Training with Deep Learning requires an enormous number of arithmetic operations, so the choice of hardware largely determines training time and power consumption. This chapter surveys the platforms in use.

3.1 GPU

GPUs were developed for graphics processing, but their many parallel arithmetic units fit the matrix computations of NNs well, and they are currently the standard platform for NN training [3].
3.2 ASIC

An ASIC (Application Specific Integrated Circuit) is a chip designed for one specific application. Unlike a GPU, an ASIC can devote all of its transistors to the target computation, so it can achieve much higher performance per power consumption, at the cost of flexibility and development effort. Google and Apple have both developed such dedicated processors: Google's TPU is used in AI services such as AlphaGo, and Apple ships dedicated processing hardware in its iOS devices.
4 Proposed Method

This study targets a CNN (Convolutional Neural Network) and reduces the operation precision of the multipliers used during training. The previous study trained with the precision fixed at 16 bits; here we use 12-bit precision where possible and fall back to 16 bits only when necessary.

4.1 Multiplier precision

In a dedicated NN processor the multipliers occupy the majority of the constituent gates, so narrowing them directly reduces circuit area and power. The previous study confirmed that NN training converges at 16-bit precision; the question examined here is how much of the training can instead run at 12 bits.
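One common way to emulate a reduced-width multiplier in software is to round values onto a fixed-point grid. The sketch below is my illustration of that idea, not the thesis implementation; the scale factor and function names are assumptions:

```python
def quantize(x, bits, frac_bits=8):
    # Emulate a signed fixed-point number with `bits` total bits,
    # `frac_bits` of them fractional: round to the nearest grid point
    # and saturate at the representable range.
    scale = 1 << frac_bits
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))
    return q / scale

def mul_lowprec(a, b, bits):
    # A reduced-precision multiply: quantize both inputs and the product.
    return quantize(quantize(a, bits) * quantize(b, bits), bits)
```

Narrower `bits` coarsens the grid and tightens the saturation range, which is exactly the error source a 12-bit multiplier introduces relative to a 16-bit one.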
Figure 4.9: Precision switching during training.

4.2 Switching the precision during training

Training starts with 12-bit multipliers. When the validation error fails to improve for 5 consecutive epochs, the precision is raised to 16 bits; when the error has then improved for 3 consecutive epochs, the precision is returned to 12 bits.
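The switching rule above can be sketched as a small controller. This is a restructured reading of the Appendix A logic (which rolls back only once); class and variable names are mine:

```python
class PrecisionSwitcher:
    # Start at 12 bits; raise to 16 bits after `up_after` epochs without
    # improvement; drop back to 12 bits after `down_after` consecutive
    # improvements (once only, mirroring the Appendix A code).
    def __init__(self, up_after=5, down_after=3):
        self.bits = 12
        self.best = float("inf")
        self.no_improve = 0
        self.improve = 0
        self.up_after = up_after
        self.down_after = down_after
        self.rolled_back = False

    def update(self, val_loss):
        # Call once per epoch with the validation loss; returns the
        # precision to use for the next epoch.
        if val_loss >= self.best:
            self.no_improve += 1
            self.improve = 0
        else:
            self.best = val_loss
            self.no_improve = 0
            self.improve += 1
        if self.bits == 12 and self.no_improve >= self.up_after:
            self.bits = 16
            self.no_improve = 0
        elif (self.bits == 16 and not self.rolled_back
              and self.improve >= self.down_after):
            self.bits = 12
            self.rolled_back = True
        return self.bits
```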
Figure 4.10: Implementation overview.

4.3 Implementation

The method was implemented in Python on a GPU, using the Theano framework and the Deep Learning Tutorials sample code as a starting point. Since the GPU computes in its native precision, the reduced precision of the multipliers is emulated by rounding intermediate values.
4.4 Evaluation method

The proposed method is evaluated by the fraction of training performed at each precision and by the recognition accuracy reached. Since a 12-bit multiplier requires far fewer gates than a 16-bit multiplier, the larger the fraction of epochs executed at 12 bits, the better the expected performance per cost of a dedicated Deep Learning processor.
5 Evaluation

5.1 Experimental setup

The evaluation uses MNIST, a dataset of handwritten digits 0-9 consisting of 70,000 images of 28 x 28 pixels. Table 5.1 shows the network used. It consists of CONV (convolution), POOL (pooling), FULLY (fully connected), and SOFTMAX layers, with ReLU activations. (M_i, R_i, C_i) denote the input channels, rows, and columns of each layer, K_r x K_c the kernel size, and (M_o, R_o, C_o) the output shape.

Table 5.1: Network structure for MNIST

Layer   | Input (M_i, R_i, C_i) | Kernel (K_r, K_c) | Output (M_o, R_o, C_o)
INPUT   | -                     | -                 | 1, 28, 28
CONV    | 1, 28, 28             | 5, 5              | 20, 24, 24
POOL    | 20, 24, 24            | 2, 2              | 20, 12, 12
CONV    | 20, 12, 12            | 5, 5              | 50, 8, 8
POOL    | 50, 8, 8              | 2, 2              | 50, 4, 4
FULLY   | 800, -, -             | -                 | 500, -, -
SOFTMAX | 500, -, -             | -                 | 10, -, -
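The shapes in Table 5.1 can be verified with a short sketch (helper names are mine; convolutions are "valid" as in 2.3.1, and pooling uses a non-overlapping 2 x 2 window):

```python
def conv_shape(m_in, r, c, filters, k):
    # Valid convolution with `filters` kernels of size k x k.
    return filters, r - k + 1, c - k + 1

def pool_shape(m, r, c, k):
    # Non-overlapping k x k pooling halves each spatial dimension for k=2.
    return m, r // k, c // k

shape = (1, 28, 28)                          # INPUT
shape = conv_shape(*shape, filters=20, k=5)  # CONV -> (20, 24, 24)
shape = pool_shape(*shape, k=2)              # POOL -> (20, 12, 12)
shape = conv_shape(*shape, filters=50, k=5)  # CONV -> (50, 8, 8)
shape = pool_shape(*shape, k=2)              # POOL -> (50, 4, 4)
flat = shape[0] * shape[1] * shape[2]        # 800 inputs to FULLY
```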
5.2 Results

Each configuration was trained 8 times. Table 5.2 lists the precision used by each layer type (CONV, RELU, SOFTMAX) in each configuration, together with the resulting performance ratio relative to the previous study. Tables 5.3 and 5.4 show, for each of the 8 runs, how many epochs were executed at 12 bits and at 16 bits. The relative cost of a 12-bit and a 16-bit multiplier is taken from the previous study [4]. Because the GPU cannot change its arithmetic width, the two precisions are emulated by rounding to 4 significant decimal digits (for 16 bits) and 3 digits (for 12 bits). Averaged over the runs, a substantial fraction of the epochs runs at 12 bits, which is what yields the improvement over the all-16-bit previous study.
Table 5.2: Precision per layer for MNIST and the resulting performance ratio

Config | CONV  | RELU  | SOFTMAX | Performance ratio
   1   | 12/16 | 12/16 | 12      | 1.13
   2   | 12/16 | 12/16 | 12/16   | 1.16
   3   | 12/16 | 16    | 12      | 1.26
   4   | 12/16 | 12    | 12      | 1.45
   5   | 12    | 12/16 | 12      | 1.25
   6   | 16    | 16    | 16      | 1.05

Here "12/16" means the layer switches between 12 and 16 bits during training, while a single value means a fixed precision.

Table 5.3: Epochs executed at each precision in each of 8 runs, and the average

       |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 | Avg
12 bit | 51 | 30 | 22 | 46 | 39 | 32 | 20 | 28 | 33.5
16 bit | 49 | 70 | 78 | 54 | 61 | 68 | 80 | 72 | 66.5

Table 5.4: Epochs executed at each precision in each of 8 runs, and the average

       |  1 |  2 |  3 |  4 |  5 |  6 |  7 |  8 | Avg
12 bit | 13 | 31 | 15 | 34 | 37 | 17 | 20 | 16 | 22.9
16 bit | 87 | 69 | 85 | 66 | 63 | 83 | 80 | 84 | 77.1
6 Conclusion

In this study, instead of fixing the multiplier precision at 16 bits as in the previous study, we switched between 12-bit and 16-bit precision during training, both for the whole network and per layer. The evaluation on MNIST showed that a large fraction of the training can run at 12 bits without losing recognition accuracy, improving the expected performance per cost over the previous study. Future work includes applying the method to NNs other than the CNN evaluated here and to larger Deep Learning workloads.
References

[1] "Google's TPU: 10x the performance per watt of GPUs" (in Japanese), Nikkei ITpro, http://itpro.nikkeibp.co.jp/atcl/ncd/14/457163/052001464/ (accessed 2016/12/10).
[2] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, "Deep Learning with Limited Numerical Precision," ICML-15, pp. 1737-1746, 2015.
[3] NVIDIA, "Introduction to Deep Learning" (in Japanese), https://images.nvidia.com/content/apac/events/deep-learning-day-2016-jp/nvidia-deeplearning-intro.pdf (accessed 2017/1/30).
[4] The previous study (graduation thesis, in Japanese), March 2016.
A ########################### #5 ######################### if this_validation_loss >= best_validation_loss: count += 1 count2 = 0 print "best_error %.3f, this_error %.3f, count %d count12 % if count3 == 1: count16 +=1 if count3 == 0: count12 +=1 else: count = 0 count2 +=1 print "best_error %.3f, this_error %.3f, count %d count12 % if count3 == 1: count16 +=1 26
if count3 == 0: count12 +=1 if count == 5: conv_digit = 4 relu_digit = 4 print "digit updated" count3 = 1 ########################### #3 ######################### if count2 == 3: if count3 == 1: if count4 == 0: conv_digit = 3 relu_digit = 3 print "digit rollback" count3 = 0 count4 = 1 27
############################# B MNIST 28