Random Forests 2013 3 A Graduation Thesis of College of Engineering, Chubu University Proposal of an efficient feature selection using the contribution rate of Random Forests Katsuya Shimazaki
[1] SBS [2] SBS Random Forests[3] Random Forests ii
Random Forests Random Forests iii
ii 1 Random Forests 1 1.1....................................... 1 1.2....................................... 4 1.3...................................... 5 2 9 2.1................................... 9 2.1.1............................... 10 2.1.2.............................. 10 2.1.3 SFS.................................. 10 2.1.4 SBS.................................. 11 2.1.5 s r.......................... 12 2.1.6 SFFS................................. 13 3 14 3.1......................... 14 3.1.1............................ 14 3.2......................... 16 3.2.1............................. 16 3.2.2............................ 16 4 18 4.1..................................... 18 4.1.1.............................. 18 iv
4.1.2............................. 19 4.1.3............. 19 4.2..................................... 20 4.2.1............................. 20 4.2.2............................. 23 4.2.3.......................... 23 25 27 28 v
1.1 Random Forests............................. 2 1.2 Random Forests.......................... 4 1.3 Random Forests ( [5] )............ 5 1.4 Random Forests ( [9] )....... 6 1.5 Semantic Texton Forests............................ 6 1.6 Random Forests ( [8] )........... 7 1.7 Hough Forests ( [6] ).............. 7 2.1 SFS......................... 11 2.2 SBS......................... 12 3.1........................... 15 3.2.................................. 15 3.3......................... 16 3.4........................ 17 4.1 3-fold cross-validation.................. 20 4.2 Pendigits.......................... 21 4.3 Waveform.......................... 21 4.4 Spambase.......................... 22 4.5 Optdigits.......................... 22 4.6................... 23 vi
2.1........................... 10 4.1.............................. 19 4.2 Random Forests...................... 19 4.3....................... 23 vii
1 Random Forests Random Forests 1.1 Random Forests Random Forests Randomized Forests Randomized Trees, Randomized Deicision Forests Random Forests Random Forests Breiman [3] Bagging[4] Random Feature Selection [5] [6] [7] [8] Random Forests 1.1 1.2 1.3 1.1 Random Forests 1.1 (Split Node) (Leaf Node) 1
1.1. 1 t 1 t T 1.1: Random Forests Random Forests Require: : I Require: : T Require: : D Require: T : I = (I 1, I 2,..., I T ). Require: : F Require: : T H 1: For k = 1,..., T 2: I k 3: For l = 1,..., F 4: - f 5: - For m = 1,..., T H 6: 7: t f t I l I r 8: - I l = {i I n f(v i ) < t} 9: - I r = I n \ I l 10: E 11: - E = I l I n l) Ir I n r) 12: if E > E old f, t, I l, I r 13: - end for 14: end for 15: if gain = 0 D 16: - P (c l) 17: 18: else I l, I r end for 2
1.1. I T D I = (I 1, I 2,..., I T ) F T H F,T H f t f t Random Feature Selection F T H F T H E E n I n f t 1.1, (1.2) I l I r I l = {i I n f(v i ) < t} (1.1) I r = I n \ I l (1.2) I l I r (1.3) E E = E(I) I l I n E(I l) I r I n E(I r) (1.3) E(I) (1.4)( ) (1.5)( ) E(I) = n p(c i ) log p(c i ) (1.4) i=1 E(I) = n p(c i )(1 p(c i )) (1.5) i=1 p(c i ) c i ( ) 0 l P (c l) 3
1.2. 1.2 1.2 v P (c l) (P 1 (c l), P 2 (c l),..., P T (c l)) P (c v) P (c v) = 1 T T P t (c l) (1.6) t=1 C i = arg max c i P (c i v) (1.7) 1.2: Random Forests 4
1.3. 1.3 Random Forests Amit 1.3(a) Random Forests 1.3(b) [5] 1.3: Random Forests ( [5] ) Lepetit 1.4 Random Forests [9] 5
1.3. 応用例 (a) パッチを学習した RTs (b) 射影変化に頑健な結果 図 1.4: Random Forests を用いた特徴点マッチング (文献 [9] より引用) Shotton 等は図 1.5 に示すように画像パッチを Semantic Texton Forests Features 特徴 量を用いて Random Forests により学習する Semantic Texton Forests を提案した [7] Semantic Texton Forests は Moosmann 等の手法と同様に ノードを visual word とする ことにより 特徴表現を行うことができ これを用いてセマンティックセグメンテーショ ン 画像識別が可能であることを実験により示している 正解画像 A[g] -B[b] > 28 入力画像 A[b] -B[g] > 37 A[r] + B[r] > 363 A[b] + B[b] > 284 A[g] -B[b] > 13 画像パッチ A[b] > 98 A[r] -B[b] > 21 図 1.5: Semantic Texton Forests 6
1.3. Shotton 1.6 Random Forests [8] CG 31 2 Random Forests 3 1.6: Random Forests ( [8] ) Gall 1.7 Hough image Hough Forests [6] Hough image Hough Forests 1.7: Hough Forests ( [6] ) 7
1.3. [7] [5] [9] [8] [6] 8
2 2.1 n d SFS SBS s r SFFS 9
2.1. 2.1.1 n d n C d n d [10] 2.1.2 d [10] 2.1 v 1, v 2, v 3 2.1: [%] v 1 7 v 2 14 v 3 21 (v 1, v 3 ) 4 (v 1, v 2 ) 6 2.1.3 SFS Whiteney (Sequential Forward Selection)[11] SFS 2.1 1 1 1 10
2.1. 2 3 d v 1,v,...,v 2 n v 1 v 2 v n max v,v,...,v 1 3 n 2.1: SFS 2.1.4 SBS SFS Marill (Sequential Backward Selection)[2] SBS 2.2 11
2.1. v,v,...,v 1 2 n v,v,...,v v,v,...,v v,v,...,v 2 3 n 1 3 n 1 2 n-1 max v 1,v 3,...,vn v2 2.2: SBS 2.1.5 s r Stearns s r [12] SFS SBS k X k 1. X k+s X k SFS s 2. X k+s r X k+s SBS r 3. d = k + s r 1 2 s > r s = 2, r = 1 s r s r s = 1, r = 0 SFS s = 0, r = 1 SBS 12
2.1. 2.1.6 SFFS Pudil SFFS (Sequential Floating Forward Selection)[13] s r s, r SFFS k = 0 1. SFS k k + 1 2. k 2 1 3 3. SBS 1 4. 3 3 5 5. k = d 1 SFFS SFS SBS 13
3 Random Forests 3.1 Random Forests 3.1.1 Random Forests C(v d ) (3.1) C(v d ) j d v d S d S (3.1) d 14
3.1. v d j T S d S C(v d ) = T t=1 j f(v d ) S t,j j J S t,j 100 (3.1) 3.1 1 f(v d ) f (v ) d I n y f (v d ) x I l I r 3.1: 3.2 t 1 t 1 t 1 t 1 v 1 v 2 v 3 v 4 v 5 v 6 C(v 2 ) t 1 v 1 v 2 v 3 v 4 v 5 v 6 C(v 4 ) t 1 v 1 v 2 v 3 v 4 v 5 v 6 3.2: 15
3.2. 3.2 2 3.2.1 Random Forests 3.3 3.3: 3.2.2 Random Forests 3.4 16
3.2. 3.4: 17
4 4.1 3 2 SBS(Sequential Backward Selection) 10% 4.1.1 UCI Machine Learning Repository[14] UCI Machine Learning Repository UCI Machine Learning Repository Pendigits, Waveform, Spambase, Optdigits 4 4.1 18
4.1. 4.1: Pendigits 14988 10 16 Waveform 5000 3 21 Spambase 4601 2 57 Optdigits 5620 10 64 4.1.2 Random Forests 4.2 10 30 10 200 10 10 1.0 4.2: Random Forests Pendigits 20 80 Waveform 30 150 Spambase 30 90 Optdigits 20 110 4.1.3 cross-validation [15] Seymour Geisser K-fold cross-validation K-fold cross-validation N K K K 1 K-fold cross-validation K 19
4.2. K K N K 3 3-fold cross-validation 3-fold cross-validation 4.1 4.1: 3-fold cross-validation 4.2 4.2.1 SBS 10% 4.2, 4.3, 4.4, 4.5 SBS SBS 20
4.2. 4.2: Pendigits 4.3: Waveform 21
4.2. 4.4: Spambase 4.5: Optdigits 22
4.2. 4.2.2 10% 4.3 4.3: SBS [ ] [ ] [ ] Pendigits (16) 12 12 11 Waveform (21) 16 15 16 Spambase (57) 53 51 50 Optdigits (64) 55 53 53 4.2.3 SBS 3 10% 1 4.6 4.6: 23
4.2. Pendigits SBS 9 Pendigits 38 SBS SBS 24
Random Forests 1 Random Forests Random Forests 2 3 Random Forests 2 4 SBS Spambase Optdigits 25
SBS SFS 26
27
[1] vol. 48, no. 16, pp. 1-24, 2007. [2] Marill, T, D. M. Green, On the effectiveness of receptors in recognition system, IEEE Trans. Inform. Theory 9, pp. 11-17, 1963. [3] L. Breiman, Random Forests, Machine Learning, vol. 45, no. 1, pp. 5-32, 2001. [4] L. Breiman, Bagging Predictors, Machine Learning, vol. 24, no. 2, pp. 123-140, 1996. [5] Y. Amit, G. August and D. Geman: Shape quantization and recognition with randomized trees, Neural Computation, no. 9, pp. 1545-1588, 1996. [6] Gall, J. and Yao, A. and Razavi, N. and Van Gool, L. and Lempitsky, V., Hough forests for object detection, tracking, and action recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 11, pp. 2188-2202, 2011. [7] J. Shotton, M. Johnson and R. Cipolla, Semantic texton forests for image categorization and segmentation, Computer Vision and Pattern Recognition, 2008. [8] J. Shotton,and A. Fitzgibbon, and Cook, M. and Sharp, T. and Finocchio, M. and Moore, R. and Kipman, A. and Blake, A., Real-time human pose recognition in parts from single depth images, Computer Vision and Pattern Recognition, 2011. [9] V. Lepetit and p. Fua, Keypoint recognition using randomized trees, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 9, pp. 1465-1479, 2006. 28
[10],, pp51-55. [11] Whitney, A. W, A direct method of nonparametric measurement selection, IEEE Trans. Comput.20,pp. 1100-1103, 1997. [12] S. D. Stearns: On selecting features for pattern classifies, Proc. Third Internat. Conf. Pattern Recognition, pp. 71-75, 1976. [13] P. Pudil, J. Novovicora and J. Kittler: Floating search methods in feature selection, Pattern Recognition Letters, Vol. 15, No. 11, pp. 1119-1125, 1994. [14] UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/. [15] Kohavi, Ron: A study of cross-validation and bootstrap for accuracy estimation and model selection, 1995. 29
Random Forests ( ) 2013 3