Investigation of Feature Extraction by Convolutional Neural Networks

A Graduation Thesis, College of Engineering, Chubu University
Fukui Hiroshi
March 2014
Abstract

Research on neural networks dates back to the 1940s. In the 1980s, multi-layer networks became trainable with the backpropagation algorithm [1], but interest declined through the 1990s. In 2012 the field was revived by Deep Learning [2] and techniques such as Dropout [3]. Conventional image recognition relies on hand-designed features such as Histograms of Oriented Gradients (HOG) [4] and the Scale Invariant Feature Transform (SIFT) [5], whereas Deep Learning learns the feature extraction itself from data. A representative Deep Learning model for images is the Convolutional Neural Network (CNN) [6], first proposed in 1989.

This thesis investigates the feature extraction performed by a CNN, comparing it with a conventional multi-layer perceptron and evaluating its robustness to geometric transformations of the input.
Contents

1 Neural Networks 1
  1.1 The neuron model 1
    1.1.1 Activation functions 2
  1.2 Perceptron 4
  1.3 Multi-layer perceptron 5
    1.3.1 Gradient descent learning 6
    1.3.2 Backpropagation 9
    1.3.3 Cross entropy 12
2 Deep Learning 14
  2.1 Convolutional Neural Network 15
    2.1.1 Convolution and pooling 16
    2.1.2 Classification layer 18
    2.1.3 Structure of the CNN 19
3 Experiments 21
  3.1 Dataset 21
  3.2 Experimental setup 23
  3.3 Evaluation results 23
    3.3.1 Comparison with the multi-layer perceptron 23
  3.4 Robustness evaluation of the CNN 28
    3.4.1 Overview 28
    3.4.2 Results 29
Conclusion 38
Acknowledgments 40
References 41
List of Figures

1.1 Neuron model 2
1.2 Sigmoid function 3
1.3 Structure of the perceptron 4
1.4 Linearly separable and non-separable class distributions 6
1.5 Structure of the multi-layer perceptron 7
1.6 Network used for backpropagation 9
2.1 Structure of a deep network 14
2.2 Origins of the CNN [7][8] 15
2.3 Structure of the CNN 16
2.4 Max pooling 18
2.5 Classification by the Full-Connect layers 19
2.6 Overall structure of the CNN 20
3.1 MNIST Dataset 22
3.2 Learned features of the CNN 24
3.3 Change of learned features with training epochs 25
3.4 Cross entropy and miss rate versus training epoch 26
3.5 Learned features at different epochs 27
3.6 Miss rate and cross entropy versus training epoch 28
3.7 Evaluation of robustness to input transformations 29
3.8 Recognition rate under translation, rotation, and scaling 31
3.9 Feature maps of the CNN for transformed inputs 32
3.10 Feature maps of the CNN under translation 33
3.11 Feature maps of the CNN under rotation 34
3.12 Training curves of the transformation-learning CNNs 36
3.13 Learned features per training epoch 37

List of Tables

3.1 Transformation parameters for the robustness evaluation 30
3.2 Error rate [%] on transformed test inputs 35
Chapter 1

Neural Networks

1.1 The Neuron Model

Figure 1.1 shows the neuron model, the basic unit of a neural network. The neuron receives d inputs x_i, each multiplied by a connection weight w_i, and the weighted sum is passed through an activation function f to produce the output y, as in eq. (1.1):

    y = f( Σ_{i=1}^{d} w_i x_i )        (1.1)

Figure 1.1: Neuron model (inputs x_i, weights w_i, activation f, output y).

1.1.1 Activation Functions

The activation function f determines the neuron output from the weighted input sum X. The step function of eq. (1.2) outputs 1 when X is positive and 0 otherwise, so the neuron output is binary (0-1).
    f(X) = { 1  (X > 0)
             0  (X ≤ 0) }        (1.2)

The sigmoid function of eq. (1.3) instead varies smoothly between 0 and 1, as shown in Figure 1.2:

    f(X) = 1 / (1 + exp(-gX))        (1.3)

Figure 1.2: Sigmoid function.

In eq. (1.3), g is the gain; as g grows, the sigmoid of Figure 1.2 approaches the step function.
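As a minimal sketch, the step function of eq. (1.2) and the sigmoid of eq. (1.3) can be written directly:

```python
import math

def step(x):
    # Step activation, eq. (1.2): 1 if the weighted input sum is positive, else 0
    return 1 if x > 0 else 0

def sigmoid(x, g=1.0):
    # Sigmoid activation with gain g, eq. (1.3)
    return 1.0 / (1.0 + math.exp(-g * x))
```

With a large gain (for example g = 20), the sigmoid becomes nearly indistinguishable from the step function.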
Because the sigmoid is differentiable, it can be used with the gradient-based learning of Section 1.3.2. Its derivative is given by eq. (1.4):

    f'(X) = g f(X) (1 - f(X))        (1.4)

1.2 Perceptron

The perceptron was proposed by Rosenblatt in 1957 [9]. As shown in Figure 1.3, it is a three-layer network: an input layer S with units s_1, ..., s_d, an association layer A with units a_1, ..., a_J, and a single output unit O.

Figure 1.3: Structure of the perceptron.

The weights w_ij between the input layer and the association layer are fixed at random values in [-1, 1]; only the weights w_j from the association layer to the output and the threshold θ are learned.
Writing the association-layer outputs as A = [a_1, a_2, ..., a_j, ..., a_J]^T and the learned weights as w = [w_1, w_2, ..., w_J], the output O is given by eq. (1.5):

    O = f( Σ_{j=1}^{J} w_j a_j + θ )        (1.5)

Given a teaching signal T, the weights and threshold are updated by eqs. (1.6) and (1.7):

    w^{t+1} = w^t + η (T - O) A        (1.6)
    θ^{t+1} = θ^t + η (T - O)        (1.7)

where t is the update step and η > 0 is the learning rate. The updates of eqs. (1.6) and (1.7) are repeated until the training error reaches 0%.

1.3 Multi-Layer Perceptron

Consider classifying two classes from two inputs x_1 and x_2. When the classes are linearly separable, as in Figure 1.4(a), a perceptron can solve the problem; when they are not, as in Figure 1.4(b), it cannot. The multi-layer perceptron of Figure 1.5 overcomes this limitation: a three-layer network with d inputs, J hidden units, and c outputs.
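To illustrate, here is a minimal sketch of the learning rule of eqs. (1.6) and (1.7) on a linearly separable problem (logical AND, used as hypothetical data; on non-separable data like Figure 1.4(b), the same loop would never reach zero error). The association-layer outputs are taken to be the raw inputs for simplicity:

```python
import random

def train_perceptron(samples, eta=0.1, epochs=200, seed=0):
    # Learning rule of eqs. (1.6)-(1.7): w <- w + eta*(T - O)*A
    random.seed(seed)
    d = len(samples[0][0])
    w = [random.uniform(-1.0, 1.0) for _ in range(d)]
    theta = random.uniform(-1.0, 1.0)
    for _ in range(epochs):
        for a, t in samples:
            # output of eq. (1.5) with a step activation
            o = 1 if sum(wj * aj for wj, aj in zip(w, a)) + theta > 0 else 0
            w = [wj + eta * (t - o) * aj for wj, aj in zip(w, a)]
            theta += eta * (t - o)
    return w, theta

def predict(w, theta, a):
    return 1 if sum(wj * aj for wj, aj in zip(w, a)) + theta > 0 else 0

# Linearly separable training data (logical AND)
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, theta = train_perceptron(data)
```

Because AND is linearly separable, the perceptron convergence theorem guarantees that this loop finds a separating weight vector.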
Figure 1.4: Linearly separable and non-separable class distributions.

1.3.1 Gradient Descent Learning

In batch learning, the error over all N training samples is accumulated before each weight update. The error function E_N is given by eq. (1.8):

    E_N = (1/2) Σ_{n=1}^{N} Σ_{k=1}^{c} (T_k - O_k)^2        (1.8)
Figure 1.5: Structure of the multi-layer perceptron.

The weights are updated in the direction that decreases E_N, as in eq. (1.9), where η is the learning rate:

    w^{t+1} = w^t - η ∂E_N/∂w^t        (1.9)

In online (stochastic) learning, by contrast, the weights are updated after every single training sample.
The error for one sample is given by eq. (1.10):

    E_n = (1/2) Σ_{k=1}^{c} (T_k - O_k)^2        (1.10)

and the weights are updated from E_n by eq. (1.11):

    w^{t+1} = w^t - η ∂E_n/∂w^t        (1.11)

Mini-batch learning lies between the two: the training set is split into mini-batches of M samples each, and the weights are updated once per mini-batch. The mini-batch error E_M is given by eq. (1.12):

    E_M = (1/2) Σ_{m=1}^{M} Σ_{k=1}^{c} (T_k - O_k)^2        (1.12)
and the corresponding update is eq. (1.13):

    w^{t+1} = w^t - η ∂E_M/∂w^t        (1.13)

1.3.2 Backpropagation

The multi-layer perceptron is trained by backpropagation, using the network of Figure 1.6: an input layer S with d units, a hidden layer A with J units, and an output layer O with c units, compared against the teaching signal T. The hidden layer has thresholds θ and the output layer thresholds γ.

Figure 1.6: Network used for backpropagation (inputs S_i, hidden units A_j, outputs O_k, teaching signal T_k).

The hidden-layer output A_j is computed from the inputs S_i, the weights w_ij, and the threshold θ_j by eq. (1.14):
    A_j = f( Σ_{i=1}^{d} w_ij S_i + θ_j )        (1.14)

Differentiating the error of eq. (1.10) with respect to the output O_k gives eq. (1.15), where T_k is the teaching signal and δ_k the output error:

    ∂E_n/∂O_k = (O_k - T_k) = δ_k        (1.15)

The output O_k is obtained from the weighted input sum P_k as in eq. (1.16):

    P_k = Σ_j w_jk A_j + γ_k,    O_k = f(P_k)        (1.16)

When there are more than two classes, the softmax function of eq. (1.17) is used instead of eq. (1.16), so that the outputs sum to 1:

    O_k = exp(P_k) / Σ_j exp(P_j)        (1.17)

The gradient of E_n with respect to an output-layer weight w_jk is written E_njk, as in eq. (1.18):

    E_njk = ∂E_n/∂w_jk        (1.18)
By the chain rule, E_njk expands as in eq. (1.19):

    E_njk = ∂E_n/∂w_jk = (∂E_n/∂O_k)(∂O_k/∂P_k)(∂P_k/∂w_jk)        (1.19)

With a sigmoid output, eq. (1.19) evaluates to eq. (1.20):

    E_njk = δ_k O_k (1 - O_k) A_j        (1.20)

Similarly, the gradient E_nij with respect to a hidden-layer weight w_ij is given by eq. (1.21), where the sum runs over the outputs k:

    E_nij = ∂E_n/∂w_ij
          = Σ_k (∂E_n/∂O_k)(∂O_k/∂P_k)(∂P_k/∂A_j) (∂A_j/∂P_j)(∂P_j/∂w_ij)
          = ( Σ_k δ_k O_k (1 - O_k) w_jk ) A_j (1 - A_j) S_i        (1.21)

Substituting eq. (1.20) into the update rule of eq. (1.11) gives the output-layer updates of eqs. (1.22) and (1.23):

    w_jk^{t+1} = w_jk^t - η δ_k O_k (1 - O_k) A_j        (1.22)
    γ_k^{t+1} = γ_k^t - η δ_k O_k (1 - O_k)        (1.23)
Likewise, the hidden-layer weights and thresholds are updated by eqs. (1.24) and (1.25):

    w_ij^{t+1} = w_ij^t - η ( Σ_k δ_k O_k (1 - O_k) w_jk ) A_j (1 - A_j) S_i        (1.24)
    θ_j^{t+1} = θ_j^t - η ( Σ_k δ_k O_k (1 - O_k) w_jk ) A_j (1 - A_j)        (1.25)

One pass of these updates over the whole training set is called an epoch.

1.3.3 Cross Entropy

For classification, the network outputs can be interpreted as probabilities in [0, 1]. Consider a two-class problem with target t, where t = 1 denotes class C_1 and t = 0 denotes class C_2. With a sigmoid output as in eq. (1.3), y(x, w) can be read as p(C_1 | x) and 1 - y(x, w) as p(C_2 | x), so the likelihood of t is given by eq. (1.26):

    p(t | x, w) = y(x, w)^t {1 - y(x, w)}^{1-t}        (1.26)

Taking the negative log-likelihood of eq. (1.26) over N samples gives the cross-entropy error of eq. (1.27):

    E = - Σ_{n=1}^{N} { t_n ln y_n + (1 - t_n) ln(1 - y_n) }        (1.27)
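Eq. (1.27) can be transcribed directly as a sketch (each y must lie strictly between 0 and 1, as a sigmoid output does):

```python
import math

def cross_entropy(ts, ys):
    # Eq. (1.27): E = -sum_n { t_n ln y_n + (1 - t_n) ln(1 - y_n) }
    return -sum(t * math.log(y) + (1 - t) * math.log(1 - y)
                for t, y in zip(ts, ys))
```

A confident correct output (y close to t) contributes almost no error, while a confident wrong output is penalized heavily.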
Following Simard et al. [10], the cross entropy generalizes to C classes as eq. (1.28):

    E = - Σ_{n=1}^{N} Σ_{c=1}^{C} t_nc ln y_nc        (1.28)
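Closing this section, the forward pass of eqs. (1.14) and (1.16) and the update rules of eqs. (1.22)-(1.25) can be put together in a self-contained sketch. The layer sizes, learning rate, and XOR training data are illustrative choices, not the settings used in the thesis:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_backprop(samples, J=3, eta=0.5, epochs=2000, seed=1):
    # Three-layer network: d inputs S, J hidden units A, c outputs O.
    random.seed(seed)
    d, c = len(samples[0][0]), len(samples[0][1])
    w1 = [[random.uniform(-1, 1) for _ in range(J)] for _ in range(d)]  # w_ij
    th = [random.uniform(-1, 1) for _ in range(J)]                      # theta_j
    w2 = [[random.uniform(-1, 1) for _ in range(c)] for _ in range(J)]  # w_jk
    ga = [random.uniform(-1, 1) for _ in range(c)]                      # gamma_k
    for _ in range(epochs):
        for S, T in samples:
            # forward pass, eqs. (1.14) and (1.16)
            A = [sigmoid(sum(w1[i][j] * S[i] for i in range(d)) + th[j])
                 for j in range(J)]
            O = [sigmoid(sum(w2[j][k] * A[j] for j in range(J)) + ga[k])
                 for k in range(c)]
            delta = [O[k] - T[k] for k in range(c)]                 # eq. (1.15)
            dk = [delta[k] * O[k] * (1 - O[k]) for k in range(c)]
            # hidden-layer error terms of eq. (1.21), using w_jk before it changes
            back = [sum(dk[k] * w2[j][k] for k in range(c)) for j in range(J)]
            # output-layer updates, eqs. (1.22)-(1.23)
            for j in range(J):
                for k in range(c):
                    w2[j][k] -= eta * dk[k] * A[j]
            for k in range(c):
                ga[k] -= eta * dk[k]
            # hidden-layer updates, eqs. (1.24)-(1.25)
            for i in range(d):
                for j in range(J):
                    w1[i][j] -= eta * back[j] * A[j] * (1 - A[j]) * S[i]
            for j in range(J):
                th[j] -= eta * back[j] * A[j] * (1 - A[j])
    return w1, th, w2, ga

def forward(params, S):
    w1, th, w2, ga = params
    d, J, c = len(w1), len(th), len(ga)
    A = [sigmoid(sum(w1[i][j] * S[i] for i in range(d)) + th[j]) for j in range(J)]
    return [sigmoid(sum(w2[j][k] * A[j] for j in range(J)) + ga[k]) for k in range(c)]

# XOR, the non-separable problem of Figure 1.4(b); with these hypothetical
# settings the hidden layer usually learns to separate it
xor = [([0, 0], [0]), ([0, 1], [1]), ([1, 0], [1]), ([1, 1], [0])]
params = train_backprop(xor)
```

Note that the hidden-layer terms `back` are computed before w_jk is updated; updating w_jk first is a classic backpropagation bug.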
Chapter 2

Deep Learning

Deep Learning refers to neural networks with many stacked layers, as illustrated in Figure 2.1.

Figure 2.1: Structure of a deep network (input layer S, hidden layers A, output layer O, teaching signal T).
2.1 Convolutional Neural Network

The Convolutional Neural Network (CNN) is a representative Deep Learning model for image recognition. Its origin lies in the physiological findings of Hubel and Wiesel [11] on the visual cortex (Figure 2.2(a)), which inspired Fukushima's Neocognitron [8] (Figure 2.2(b)). The CNN inherits the Neocognitron's alternation of feature-extracting cells and position-tolerant cells.

Figure 2.2: Origins of the CNN [7][8].
Figure 2.3 shows the overall structure of a CNN: convolution layers and pooling layers are stacked alternately, followed by a classification layer.

Figure 2.3: Structure of the CNN.

2.1.1 Convolution and Pooling
The convolution layer slides an n_w × n_w filter over an n_x × n_y input map; the resulting feature map has size n'_x × n'_y, given by eq. (2.1):

    n'_x = n_x - n_w + 1
    n'_y = n_y - n_w + 1        (2.1)

The pooling layer then gives tolerance to small positional shifts, corresponding to the complex cells of Hubel and Wiesel. As in Figure 2.4, max pooling takes the maximum value P_h over the units P_h^i of each pooling region h, as in eq. (2.2):

    P_h = max_i P_h^i        (2.2)

With a 2 × 2 pooling region, each dimension of the feature map is halved, as in eq. (2.3):

    n'_x = n_x / 2
    n'_y = n_y / 2        (2.3)
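Eqs. (2.1)-(2.3) can be sketched in a few lines (function names are illustrative):

```python
def conv_output_size(nx, ny, nw):
    # Eq. (2.1): valid convolution with an nw x nw filter
    return nx - nw + 1, ny - nw + 1

def max_pool_2x2(fmap):
    # Eqs. (2.2)-(2.3): 2x2 max pooling, halving each dimension
    h, w = len(fmap), len(fmap[0])
    return [[max(fmap[y][x], fmap[y][x + 1],
                 fmap[y + 1][x], fmap[y + 1][x + 1])
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]
```

For a 28 × 28 MNIST image and a 5 × 5 filter, eq. (2.1) gives a 24 × 24 feature map, which 2 × 2 pooling reduces to 12 × 12.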
Figure 2.4: Max pooling (the maximum of the region P_1, ..., P_4 becomes P).

2.1.2 Classification Layer

After the convolution and pooling layers, the CNN classifies the extracted features with fully connected (Full-Connect) layers, as in Figure 2.5. The Full-Connect part is an ordinary multi-layer perceptron.
Figure 2.5: Classification by the Full-Connect layers.

2.1.3 Structure of the CNN

The size of each layer follows from eqs. (2.1) and (2.3): each n_w × n_w convolution shrinks the map by n_w - 1 in each dimension, and each 2 × 2 pooling halves it. Convolution and pooling are repeated, and the final feature maps are fed to the Full-Connect layers, giving the overall CNN of Figure 2.6.

Figure 2.6: Overall structure of the CNN.
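The size bookkeeping of eqs. (2.1) and (2.3) through a whole stack can be sketched as follows (the layer-list format is a hypothetical convention, not from the thesis):

```python
def cnn_map_sizes(input_size, layers):
    # Propagate the feature-map size through convolution ('C', n_w)
    # and 2x2 pooling ('P',) layers using eqs. (2.1) and (2.3).
    nx, ny = input_size
    sizes = [(nx, ny)]
    for layer in layers:
        if layer[0] == 'C':
            nw = layer[1]
            nx, ny = nx - nw + 1, ny - nw + 1  # eq. (2.1)
        else:
            nx, ny = nx // 2, ny // 2          # eq. (2.3)
        sizes.append((nx, ny))
    return sizes
```

For example, two convolution-pooling stages with 5 × 5 filters take a 28 × 28 input through 24 × 24, 12 × 12, 8 × 8, and finally 4 × 4 maps.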
Chapter 3

Experiments

This chapter investigates the feature extraction of the CNN through two experiments on the MNIST Dataset: a comparison with the multi-layer perceptron, and an evaluation of robustness to geometric transformations of the input.

3.1 Dataset

The MNIST Dataset (Figure 3.1) consists of handwritten digit images of the classes 0 to 9: 50,000 training samples, 10,000 validation samples, and 10,000 test samples, each 28 × 28 pixels.

Figure 3.1: MNIST Dataset.
3.2 Experimental Setup

The CNN uses 5 × 5 convolution filters with 6 feature maps in the first stage, 2 × 2 max pooling, and a Full-Connect layer of 400 units leading to 10 class outputs; the learning rate is 0.1. For comparison, a multi-layer perceptron is trained on the same data.

3.3 Evaluation Results

3.3.1 Comparison with the Multi-Layer Perceptron

Figure 3.2 visualizes the features learned by the CNN.
Figure 3.2: Learned features of the CNN.
Figure 3.3 shows how the first- and second-stage features change over training epochs.

Figure 3.3: Change of learned features with training epochs.

Figures 3.4(a) and 3.4(b) compare the cross entropy and the miss rate of the CNN and the multi-layer perceptron over training.
Figure 3.4: Cross entropy (a) and miss rate [%] (b) versus training epoch for the Convolutional Neural Network and the Multi Layer Perceptron.
Figure 3.5 shows the learned features at several epochs.

Figure 3.5: Learned features at different epochs.

Figure 3.6 shows the corresponding errors over training epochs, as in Figure 3.4(a).
Figure 3.6: Miss rate [%] and cross entropy versus training epoch for the Convolutional Neural Network and the Multi Layer Perceptron.

3.4 Robustness Evaluation of the CNN

3.4.1 Overview

This experiment evaluates how robust the CNN is to geometric transformations of the input, compared with the multi-layer perceptron.
Figure 3.7 illustrates the evaluation procedure.

Figure 3.7: Evaluation of robustness to input transformations.

3.4.2 Results

The test images are transformed with the parameters of Table 3.1.
Table 3.1: Transformation parameters for the robustness evaluation.

    Translation: -5 to +5 pixels
    Rotation: 0, 5, 10, 15, 20, 25, 30 degrees
    Scaling: 0.5 to 0.9 and 1.1 to 1.5

Figures 3.8(a) to 3.8(c) compare the recognition rates of the CNN and the multi-layer perceptron (MLP) under each transformation. In all of Figures 3.8(a) to 3.8(c), the CNN degrades far more gracefully than the MLP. Figure 3.9 visualizes the first- and second-stage feature maps for transformed inputs.
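The translation of Table 3.1 can be sketched as a pure-Python image shift (rotation and scaling would normally use an image-processing library; the function name is illustrative):

```python
def shift_image(img, dx, dy, fill=0):
    # Translate a 2-D image by (dx, dy) pixels, filling the vacated cells.
    h, w = len(img), len(img[0])
    return [[img[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else fill
             for x in range(w)]
            for y in range(h)]
```

Applying such shifts to the 28 × 28 test images produces the translated inputs of the evaluation.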
Figure 3.8: Recognition rate [%] of the MLP and the CNN under translation [pixel], rotation [degree], and scaling.
Figure 3.9: Feature maps of the CNN for transformed inputs.
Figure 3.10 shows the first- and second-stage feature maps of the CNN under translation.

Figure 3.10: Feature maps of the CNN under translation.
Figure 3.11: Feature maps of the CNN under rotation.

Figures 3.13(a) and 3.13(b) compare a CNN trained without random transformations of the training samples with one trained with them: the transformation-trained CNN keeps its feature maps far more stable.
Table 3.2 summarizes the error rates on transformed test inputs.

Table 3.2: Error rate [%] on transformed test inputs.

                                      2 pixel    2 pixel    20 degrees
    MLP                                25.79      23.63       11.41
    CNN                                 3.98       4.80        6.27
    CNN (transformation learning)       2.11       1.80        4.38

The CNN trained with random transformations of the training samples achieves the lowest error in every condition. Figure 3.12 shows the training curves of the transformation-learning CNNs over epochs, and Figure 3.13 the corresponding features per epoch.
Figure 3.12: Cross entropy and miss rate [%] versus training epoch for the parallel-shift-learning CNN, the rotation-learning CNN, and the CNN without random learning.
Figure 3.13: Learned features per training epoch.
Conclusion

Chapter 1 reviewed neural networks and their learning algorithms, and Chapter 2 introduced Deep Learning and the Convolutional Neural Network. Chapter 3 investigated the feature extraction of the Convolutional Neural Network on handwritten digit recognition: it outperforms the multi-layer perceptron and is markedly more robust to translation, rotation, and scaling of the input, especially when trained with random transformations of the training samples.
References

[1] D. Rumelhart, G. Hinton, and R. Williams, Learning representations by back-propagating errors, Nature, vol.323, pp.533-536, 1986.

[2] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors, CoRR, vol.abs/1207.0580, 2012.

[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems 25, pp.1106-1114, 2012.

[4] N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, International Conference on Computer Vision & Pattern Recognition, vol.2, pp.886-893, 2005.

[5] D. G. Lowe, Object recognition from local scale-invariant features, Proceedings of the International Conference on Computer Vision, vol.2, p.1150, IEEE Computer Society, 1999.

[6] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Computation, vol.1, pp.541-551, 1989.

[7] J. W. Kimball, Kimball's biology pages, 2000, http://www.dls.ym.edu.tw/ol biology2/ultranet/visualprocessing.html.
[8] K. Fukushima and S. Miyake, Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position, Pattern Recognition, vol.15, no.6, pp.455-469, 1982.

[9] F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, vol.65, no.6, pp.386-408, 1958.

[10] P. Simard, B. Victorri, Y. LeCun, and J. S. Denker, Tangent Prop: a formalism for specifying selected invariances in adaptive networks, NIPS, pp.895-903, 1991.

[11] D. H. Hubel and T. N. Wiesel, Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, The Journal of Physiology, vol.160, pp.106-154, 1962.