DEIM Forum 2018 J7-3

Identifying Know-How Sites based on a Topic Model and Classifier Learning

Jiaqi LI, Chen ZHAO, Youchao LIN, Ding YI, Shuto KAWABATA, Takahide KASUGA, Takehito UTSURO, and Yasuhide KAWADA

Grad. Sch. of Systems and Information Engineering, University of Tsukuba, 1-1-1, Tsukuba 305-8573, Japan
College of Engineering Systems, School of Science and Engineering, University of Tsukuba, 1-1-1, Tsukuba 305-8573, Japan
Faculty of Engineering, Information and Systems, University of Tsukuba, 1-1-1, Tsukuba 305-8573, Japan
Logworks Co., Ltd., 1-3-15 (6F), Tokyo 151-0053, Japan

1. Introduction
… Yahoo! … [5] … AND … [3] … [5] …
1 … 1: evaluation grades A, B, C, D, E … 1 … [3] … URL
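Table 2 … (30 …)

             (a)       (b)    (c) (…/…)
  1          926    12,078     82 (51/31)
  2          959    13,256     85 (40/45)
  3          850     9,745    102 (45/57)
  4          958    13,742     89 (57/32)
  5          838     9,462     92 (45/47)
  6          815     7,573     98 (28/70)
  Total    5,346    65,856    548 (266/282)

… AND … For each query s ∈ S, the top N pages P(s, N) are collected (N = 20), and the page set is

$P_w = \bigcup_{s \in S} P(s, N)$

Search results are obtained with the Google Custom Search API (footnote 1). For each page p, S(p) denotes the set of queries that retrieved it:

$S(p) = \{\, s \in S \mid p \in P(s, N) \,\}$

… SVM …

2. … Google … 100 … 1,000 … 2 … S (s ∈ S) …

3. Topic Model
LDA (Latent Dirichlet Allocation) [1] is applied to the page set $P_w$. With vocabulary V (w ∈ V) and K topics (K = 50), each topic $z_n$ (n = 1, …, K) is a word distribution $P(w \mid z_n)$ (w ∈ V), and each page p, treated as a document d, has a topic distribution $P(z_n \mid d)$ (n = 1, …, K).

Footnote 1: https://developers.google.com/custom-search/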
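As a minimal sketch of the page-set construction above, assuming the search results have already been fetched into a dictionary that maps each query s to its ranked list of page URLs (this input format and the function name `build_page_sets` are hypothetical; the search API call itself is not shown), $P_w$ and $S(p)$ can be computed as follows.

```python
from collections import defaultdict

N = 20  # number of top-ranked pages kept per query, as stated in the text


def build_page_sets(results, n=N):
    """Return (P_w, S) where S maps each page p to S(p).

    results: hypothetical dict {query s: ranked list of page URLs}.
    """
    p_w = set()
    s_of_page = defaultdict(set)
    for s, ranked in results.items():
        top_n = ranked[:n]          # P(s, N): the top-n pages for query s
        p_w.update(top_n)           # P_w is the union of P(s, N) over all s
        for p in top_n:
            s_of_page[p].add(s)     # S(p) = {s in S | p in P(s, N)}
    return p_w, dict(s_of_page)


if __name__ == "__main__":
    demo = {
        "q1": ["a.com/1", "b.jp/x", "c.org/y"],
        "q2": ["b.jp/x", "d.net/z"],
    }
    pages, s_of = build_page_sets(demo, n=2)
    print(sorted(pages))            # ['a.com/1', 'b.jp/x', 'd.net/z']
    print(sorted(s_of["b.jp/x"]))   # ['q1', 'q2']
```

A page retrieved by several queries appears once in $P_w$, while $S(p)$ keeps track of every query that found it.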
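Table 3: … URL … features used for the SVM
f1: top-level domain of the URL: com, org, jp, net, or co.jp (5 features f11, …, f15)
f2: whether the URL is secure (HTTPS) or not (HTTP)
f3: …
f4: …
f5: …
f6: …
f7 = f3 … f4
f8 = f7 / f5: (…)/(…)
f9 = f5 … f6
f10 = f5 / f6: … / …

For each topic $z_n$ (n = 1, …, K), $P_w(z_n)$ is the set of pages in $P_w$ whose most probable topic is $z_n$:

$P_w(z_n) = \{\, d \in P_w \mid z_n = \arg\max_{z_u\,(u=1,\ldots,K)} P(z_u \mid d) \,\}$

4. … Following [3] … [3] … $P_w$ … 30 … 1 … grades A, B, C (… 2) … [3] … 30 … $P_w$ … 30 … 2 … 1 … grades A to E … A, B, C … A … 3, 4, 5 …

5. … Table 3 … Sections 5.1 to 5.4 (5.5 …)

5.1 URL features
The top-level domain of the URL (com, org, jp, net, or co.jp; 5 features f11, …, f15), and whether the URL is secure (HTTPS) or not (HTTP).

5.2 …
For each site t ∈ T, let P(t) be its set of pages. Each page p ∈ P(t) has the query set S(p), and the query set of site t is

$S(t) = \bigcup_{p \in P(t)} S(p)$

… t … S(t) … t … p ∈ P(t) … S(p) … p ∈ P(t) …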
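The URL features of Section 5.1 and the site query set $S(t)$ of Section 5.2 could be computed roughly as below. This is a sketch under stated assumptions: the order f11 to f15 is assumed to follow the listing com, org, jp, net, co.jp; a co.jp host is assumed to count only as co.jp and not also as jp; $S(t)$ is taken as the union of its pages' query sets; and the helper names are hypothetical.

```python
from urllib.parse import urlparse

# Assumed order of f11..f15, following the listing in the text.
TLD_CLASSES = ["com", "org", "jp", "net", "co.jp"]


def url_features(url):
    """Hypothetical encoding of f1 (as binary f11..f15) and f2 for one URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    # Assumption: co.jp is its own class, so example.co.jp is not also jp.
    if host.endswith(".co.jp"):
        tld = "co.jp"
    else:
        tld = host.rsplit(".", 1)[-1]
    # One binary feature per TLD class; all zero for any other TLD.
    feats = {f"f1{i + 1}": int(tld == name) for i, name in enumerate(TLD_CLASSES)}
    feats["f2"] = int(parsed.scheme == "https")  # 1 = secure (HTTPS), 0 = HTTP
    return feats


def site_query_set(pages_of_site, s_of_page):
    """S(t): union of S(p) over the pages p in P(t) (union is an assumption)."""
    result = set()
    for p in pages_of_site:
        result |= s_of_page.get(p, set())  # pages with no queries add nothing
    return result


if __name__ == "__main__":
    print(url_features("http://kenjasyukatsu.com/"))
    print(url_features("https://www.example.co.jp/page"))
```

Keeping f11 to f15 as separate binary features, rather than one categorical value, lets an RBF-kernel SVM treat the domain classes without an implied ordering.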
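PR …
Figure 3: grade A, http://kenjasyukatsu.com/
Figure 4: grade A, http://www.wedding-recipe.com/
… t … t …

5.3 …
… t … P(t) … P(t) …

5.4 …
For each page p ∈ P(t), its topic z(p) is the most probable topic:

$z(p) = \arg\max_{z_u\,(u=1,\ldots,K)} P(z_u \mid p)$

and the topic set of site t ∈ T collects the topics of its pages:

$z(t) = \bigcup_{p \in P(t)} \{ z(p) \}$

… t … z(t) …

5.5 …
1 … p ∈ P(t) … S(p) … S(t) …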
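The argmax topic assignment used for $P_w(z_n)$, $z(p)$ and $z(t)$ can be sketched as below, assuming each page's LDA topic distribution is available as a list `theta` with `theta[u]` $= P(z_u \mid p)$. In this sketch topics are indexed from 0 rather than from 1 as in the text, ties go to the lowest index, and the function names are hypothetical.

```python
from collections import defaultdict


def dominant_topic(theta):
    """z(p): the index u that maximizes theta[u] = P(z_u | p)."""
    # max() returns the first maximal index, so ties favor the lowest topic id.
    return max(range(len(theta)), key=lambda u: theta[u])


def site_topic_set(pages_of_site, theta_of_page):
    """z(t): the set of dominant topics z(p) over the pages p in P(t)."""
    return {dominant_topic(theta_of_page[p]) for p in pages_of_site}


def pages_by_topic(theta_of_page):
    """P_w(z_n): pages whose dominant topic is z_n, for each n that occurs."""
    groups = defaultdict(set)
    for p, theta in theta_of_page.items():
        groups[dominant_topic(theta)].add(p)
    return dict(groups)


if __name__ == "__main__":
    theta = {"p1": [0.7, 0.2, 0.1], "p2": [0.1, 0.3, 0.6], "p3": [0.2, 0.5, 0.3]}
    print(site_topic_set(["p1", "p2"], theta))
    print(pages_by_topic(theta))
```

With K = 50 as in the text, a site whose pages spread over many dominant topics gets a large $z(t)$, while a site concentrated on one subject gets a small one.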
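Figure 5: grade A, http://www.kafuntaisaku1.com/
2 … (…)/(…) … S(p) … S(t) … p ∈ P(t) … P(t) …
3 … P(t) … z(t) …
4 … / … P(t) … z(t) …

6. Evaluation
6.1 …
LIBSVM (footnote 2) is used with the RBF kernel. … 6 … 5 … 1 … 6 … SVM … features f7 to f10 … f1 to f6 … f1 to f4 and f7 to f10 … 8 … 3 …

Footnote 2: https://www.csie.ntu.edu.tw/~cjlin/libsvm/
Footnote 3: (…) Let R be the reference set (…) and S(conf ≥ c) the set of sites whose confidence conf is at least c. Then

$\mathrm{recall}(conf \geq c) = \frac{|R \cap S(conf \geq c)|}{|R|}, \qquad \mathrm{precision}(conf \geq c) = \frac{|R \cap S(conf \geq c)|}{|S(conf \geq c)|}$

6.2 … 6 …

7. … 10 … 50 … 32 … 4 … 32 … 40 … 100 … f3 89, f4 111, f7 35, f8 111, f9 47, f10 80 …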
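Precision and recall at a confidence threshold c, defined through R and S(conf ≥ c), can be computed with a minimal sketch like the following. It assumes a hypothetical mapping from each site to its classifier confidence conf (how conf is obtained from LIBSVM is not specified here); returning 0.0 when S(conf ≥ c) is empty is also an assumption, since the ratio is undefined there. Sweeping c over the observed confidences traces a precision-recall curve.

```python
def precision_recall_at(conf, reference, c):
    """Precision and recall of S(conf >= c) against the reference set R."""
    ref = set(reference)
    selected = {t for t, v in conf.items() if v >= c}   # S(conf >= c)
    hits = len(selected & ref)                          # |R ∩ S(conf >= c)|
    precision = hits / len(selected) if selected else 0.0  # assumed 0.0 if empty
    recall = hits / len(ref) if ref else 0.0
    return precision, recall


def precision_recall_curve(conf, reference):
    """(c, precision, recall) for each distinct confidence, highest c first."""
    return [(c, *precision_recall_at(conf, reference, c))
            for c in sorted(set(conf.values()), reverse=True)]


if __name__ == "__main__":
    conf = {"t1": 0.9, "t2": 0.8, "t3": 0.4, "t4": 0.2}  # hypothetical scores
    for row in precision_recall_curve(conf, {"t1", "t3"}):
        print(row)
```

Raising c shrinks S(conf ≥ c), which typically trades recall for precision.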
6 …

8. Related Work
… [6] … [2] … In December 2014, NTCIR-11 (footnote 4) … [2] … Task Mining Task … Task Mining Task [4] … Task Mining Task …

Footnote 4: http://research.nii.ac.jp/ntcir/ntcir-11/index-ja.html

9. …

References
[1] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, Vol. 3, pp. 993-1022, 2003.
[2] …, …Web…. In Proc. 6th DEIM Forum, 2014.
[3] …. In Proc. 9th DEIM Forum, 2017.
[4] Y. Liu, R. Song, M. Zhang, Z. Dou, T. Yamamoto, M. Kato, H. Ohshima, and K. Zhou. Overview of the NTCIR-11 IMine task. In Proc. 11th NTCIR Workshop Meeting, pp. 8-23, 2014.
[5] …. In Proc. 7th DEIM Forum, 2015.
[6] A. Sun, E.-P. Lim, and W.-K. Ng. Web classification using support vector machine. In Proc. 4th WIDM, pp. 96-99, 2002.

Table 4: … 1 to 10 … / … : A, B / …
ID:     1  2  3  4  5  6  7  8  9  10
Grade:  B  B  B  B  B  A  A  A  B  A

JOBRASS, Jobweb, Do … 1 … 2 … TOEIC … 3 … 19 … SPIWEB (TG-WEB) … 20 … 32