E-Mail: matsu@nanzan-u.ac.jp [13] [13] 2 ( ) n-gram 1 100 ( ) (Google ) [13] (Breiman[3] ) [13] (Friedman[5, 6])
2 2.1 [13] 10 20 200 11 10 110 6 10 60 [13]
Table 1: (1892-1927) (1888-1948) (1867-1916) (1862-1922) (1872-1943) (1873-1939) (1872-1939) (1897-1949) (1896-1934) (1909-1948)
10 20 200 5 10 50 2.2 1. 2. 3. 2.3 MeCab R MeCab RMeCab ( [8] ) 2.4 [13] 0 [9, 10, 11, 12] n-gram
[9] [10] [11] n-gram n-gram n n-gram n-gram < > < > < > < > < > < > < > < > < > < > < > < > < > < > N = 2 2
Table 2: n-gram (n=2)
[ ] 1 0.077
[ ] 2 0.154
[ ] 2 0.154
[ ] 1 0.077
[ ] 1 0.077
[ ] 1 0.077
[ ] 1 0.077
[ ] 3 0.231
[ ] 1 0.077
2.5 [13] k Bagging Boosting RandomForest RandomForest Bagging [13]
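The relative frequencies in Table 2 are consistent with dividing each bigram count by the total number of bigrams (for example, 1/13 ≈ 0.077 and 3/13 ≈ 0.231). The following is a minimal Python sketch of that calculation for character n-grams. The function name `char_ngram_relative_freq` and the sample string are hypothetical and chosen only for illustration; the preprocessing actually applied to the texts in this study is not reproduced here.

```python
from collections import Counter


def char_ngram_relative_freq(text, n=2):
    """Return {n-gram: (count, relative frequency)} for character n-grams.

    Hypothetical helper: the relative frequency is the count of an n-gram
    divided by the total number of n-grams extracted from `text`.
    """
    # Slide a window of width n over the text, one character at a time.
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = len(grams)
    return {gram: (count, count / total) for gram, count in counts.items()}


if __name__ == "__main__":
    # Illustrative input only; it is not taken from the corpus of the paper.
    for gram, (count, rel) in char_ngram_relative_freq("abcabcab").items():
        print(gram, count, round(rel, 3))
```

A string of 14 characters yields 13 bigrams, which matches the denominator implied by the values in Table 2.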
Boosting (AdaBoost) Boosting Boosting CART 3 3.1 CART CART Breiman et al.[1] CART ( [14] ) 1. 2 2. 3. 3.2 Boosting Boosting 3.2.1 AdaBoost AdaBoost Freund and Schapire[4] Boosting 3.2.2 Friedman[5, 6] Boosting Boosting CART Boosting L(y, f(x)) 3 R
Table 3: L(y, f(x)) and the corresponding negative gradient
$\frac{1}{2}[y - f(x)]^2$ : $y - f(x)$
$|y - f(x)|$ : $\operatorname{sign}[y - f(x)]$
(2 classes) $\log(1 + \exp(-2yf(x)))$ : $\frac{2y}{1 + \exp(2yf(x))}$
(S classes) $-\sum_{s=1}^{S} y_s \log p_s(x)$ : $y_s - p_s(x)$
3.3 Bagging Bagging Breiman[2] Bagging 3.4 RandomForest RandomForest Breiman[3] RandomForest Bagging Bagging ( ) Bagging RandomForest 4 4.1 10 20 5 10 S (S 1,S 2,, 3) 100 4.2 (recall) (precision) F i (i = 1, 2, ..., n) A, B
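G_i A R_i A A P_i A A 4 F (2) a_i, b_i 0 (3)
Recall: $R_i = \frac{a_i}{a_i + c_i}$ (1)
Precision 1: $P_i = \frac{a_i}{a_i + b_i}$ (2)
Precision 2: $P_i = \frac{a_i + d_i}{a_i + b_i + c_i + d_i}$ (3)
Mean recall: $\hat{R} = \frac{1}{n} \sum_{i=1}^{n} \frac{a_i}{a_i + c_i}$, mean precision: $\hat{P} = \frac{1}{n} \sum_{i=1}^{n} \frac{a_i + d_i}{a_i + b_i + c_i + d_i}$ (4)
F-measure: $F = \frac{2 \hat{P} \hat{R}}{\hat{P} + \hat{R}}$ (5)
Table 4: G_i
      A   B
  A   a   c
  B   b   d
4.3 50 587 10 56 F 2 ( ) 2 1 2 1 2 AdaBoost [13] 0 0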
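Equations (1) to (5) can be illustrated with a short Python sketch that computes the F-measure from per-class cell counts (a, b, c, d) laid out as in Table 4. The function name `f_measure` and the example counts are hypothetical and are not results from this study. The sketch assumes every denominator is nonzero; by default it uses precision (3), as equation (4) does, and precision (2) can be selected instead.

```python
def f_measure(tables, use_precision_3=True):
    """Compute F from a list of (a, b, c, d) tuples, one per class i.

    Hypothetical helper following equations (1)-(5):
    recall (1) is a / (a + c); precision is either (2) a / (a + b)
    or (3) (a + d) / (a + b + c + d); F (5) is the harmonic mean of the
    class-averaged precision and recall (4).
    Assumes all denominators are nonzero.
    """
    n = len(tables)
    recalls = [a / (a + c) for a, b, c, d in tables]
    if use_precision_3:
        precisions = [(a + d) / (a + b + c + d) for a, b, c, d in tables]
    else:
        precisions = [a / (a + b) for a, b, c, d in tables]
    r_hat = sum(recalls) / n      # mean recall
    p_hat = sum(precisions) / n   # mean precision
    return 2 * p_hat * r_hat / (p_hat + r_hat)


if __name__ == "__main__":
    # Illustrative counts only; not taken from the experiments.
    example = [(8, 1, 2, 9), (9, 2, 1, 8)]
    print(f_measure(example))
    print(f_measure(example, use_precision_3=False))
```

Precision (2) is undefined when a_i + b_i = 0, which is one situation where falling back to precision (3) avoids a division by zero.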
Figure 1: (2 F )
2 F 1 AdaBoost Bagging RandomForest RandomForest RandomForest (sfchaos[7] ) 1:9 1:4
Figure 2: ( F )
F 2 RandomForest Bagging AdaBoost RandomForest Bagging AdaBoost RandomForest AdaBoost F 2 2 AdaBoost RandomForest 4.4 50 24 3 22
Figure 3: ( F )
3 2 AdaBoost F Bagging RandomForest RandomForest Bagging AdaBoost AdaBoost Bagging RandomForest F 0.944 3 0.853 0.892 3 0.844
4.5 8 6 2 F Bagging AdaBoost RandomForest Bagging RandomForest AdaBoost 2 RandomForest Bagging AdaBoost RandomForest F 0.65 [10] 3 21 10 200 5 50 4.6 50 38 5 28 2 Bagging AdaBoost RandomForest RandomForest Bagging AdaBoost RandomForest Bagging AdaBoost n-gram F 0.8 0.591 4.7 n-gram n-gram 151 116 4 2 AdaBoost Bagging RandomForest RandomForest Bagging AdaBoost F 0.941 6 0.9 3 0.845 RandomForest Bagging F 0.147 RandomForest F 0.916 4 0.8 n-gram RandomForest
Figure 4: n-gram ( F )
5 2 AdaBoost Boosting AdaBoost 2 RandomForest RandomForest n-gram n-gram 2 6 RandomForest AdaBoost
[1] Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984): Classification and Regression Trees, Wadsworth.
[2] Breiman, L. (1996): Bagging predictors, Machine Learning, 24(2), 123-140.
[3] Breiman, L. (2001): Random Forests, Machine Learning, 45(1), 5-32.
[4] Freund, Y. and Schapire, R. E. (1996): Experiments with a new boosting algorithm, Machine Learning: Proceedings of the Thirteenth International Conference, 148-156.
[5] Friedman, J. H. (2001): Greedy function approximation: a gradient boosting machine, The Annals of Statistics, 29(5), 1189-1232.
[6] Friedman, J. H. (2002): Stochastic gradient boosting: Nonlinear methods and data mining, Computational Statistics and Data Analysis, 38, 367-378.
[7] sfchaos (2012):, www.slideshare.net/sfchaos/ss-11307051.
[8] (2008): RMeCab, rmecab.jp/wiki/index.php?plugin=attach&refer=rmecab&openfile=manual.pdf.
[9] (1993): 46 5 (3), 131-132.
[10] (1996):,, 5(2), 13-21.
[11] (2002):,, 11(2), 15-2.
[12] (2004):,, 32, 384-385.
[13] (2007):, 55(2), 255-268.
[14] (2005):,, 18(2), 123-164.