Extracting Difference Information from Multilingual Wikipedia

Yuya Fujiwara,†1 Yu Suzuki†2 and Akiyo Nadamoto†3

Wikipedia has articles in many languages, and articles on the same topic differ in content from one language edition to another. The differences are especially large for articles about culture. In this paper, we propose a system that extracts information that differs between Japan and other countries on Wikipedia. Specifically, the system compares a Japanese Wikipedia article about a foreign topic with the English Wikipedia and extracts the information that appears only in the English version. The granularity of information differs between the two languages, so a single Japanese article does not necessarily correspond to a single English article. We therefore also propose a method, based on the link graph, for extracting the multiple English articles that cover the same topic as one Japanese article.

1.
Wikipedia *1 has editions in 250 languages *2.

†1 Konan University
†2 Nagoya University
†3 Konan University
*1 Wikipedia http://www.wikipedia.org/
*2 Wikipedia: http://ja.wikipedia.org/wiki/wikipedia:
*3 asahi.com http://www.asahi.com/national/update/0303/tky201003030157.html

© 2011 Information Processing Society of Japan
Fig. 1 Example of multilingual Wikipedia pages
Fig. 2 System flow

2.
Prior work on Wikipedia is cited as 1)2), 3), 4) and 5). Nakayama et al. 6) proposed the pfibf measure for Wikipedia.

3.
3.1
The running example in this section is the English Wikipedia article Cricket.
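Fig. 3 Analysis of the link structure

3.2
The weight W_{kl} of a link from article k to article l is calculated by Equation (1) (see also 7)):

\[ W_{kl} = af \cdot \cos(k, l) + \sum_{i=1}^{af} \frac{1}{d_i} \cdot \frac{n_i - o_i + 1}{n_i} \tag{1} \]

where af is the number of links from k to l, d_i, n_i and o_i describe the section of k that contains the i-th link, and cos(k, l) is the cosine similarity between k and l.

For example, consider the link from Cricket to Batting. Batting is linked from two sections of Cricket, "Bat and ball" and "Pitch, wickets and creases", so af = 2. For "Bat and ball", d_1 = 3, n_1 = 25 and o_1 = 11. For "Pitch, wickets and creases", d_2 = 3, n_2 = 25 and o_2 = 3. Equation (1) then gives Batting a weight of 0.71, which exceeds the threshold of 0.6.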
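As a rough illustration of how Equation (1) might be evaluated, the Python sketch below computes the section-dependent sum for the Cricket to Batting example. It assumes the reading of Equation (1) in which each link contributes (1/d_i) * (n_i - o_i + 1) / n_i, which is the reading consistent with the reported weight of 0.71. The function names are hypothetical, and the cosine value is left as a parameter because the text does not state cos(k, l) for this pair.

```python
from typing import Iterable, List, Tuple

# One (d_i, n_i, o_i) triple per link from article k to article l.
Link = Tuple[int, int, int]


def link_position_term(links: Iterable[Link]) -> float:
    """Sum of (1/d_i) * (n_i - o_i + 1) / n_i over all links (assumed form)."""
    return sum((1.0 / d) * (n - o + 1) / n for d, n, o in links)


def link_weight(cosine: float, links: Iterable[Link]) -> float:
    """W_kl = af * cos(k, l) + position term, with af = number of links."""
    link_list: List[Link] = list(links)
    af = len(link_list)
    return af * cosine + link_position_term(link_list)


# Values quoted in the text for the two sections of Cricket that link to Batting.
batting_links = [(3, 25, 11), (3, 25, 3)]
position_term = link_position_term(batting_links)
print(round(position_term, 3))  # 0.507
```

Under this reading, the two sections contribute about 0.51 of the reported 0.71, and the remainder would come from the af * cos(k, l) term.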
Fig. 4 Segment contents structure of Wikipedia

4.

Fig. 5 Tree structure creation for a Wikipedia article

4.1
For translation, the system uses the GENE95 dictionary *1, the Google Ajax API *2 and the Microsoft Translator API *3. Example terms are Bowls and World Cricket League.

*1 GENE95 http://www.namazu.org/~tsuchiya/sdic/data/gene.html
*2 Google Ajax api http://code.google.com/apis/language/
*3 Microsoft Translator api http://www.microsofttranslator.com/dev/

Similarity is calculated by Equation (2), with a threshold of 0.3:

\[ \cos(x, y) = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2} \, \sqrt{\sum_i y_i^2}} \tag{2} \]

where x_i and y_i are the i-th elements of the vectors x and y.
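Equation (2) is the standard cosine similarity, and a minimal Python sketch of it follows. The sketch assumes that x and y are term-frequency vectors built from token lists and that a pair counts as similar when the score is at least 0.3. The vector construction, the function names, and the direction of the threshold test are assumptions made for illustration, not details confirmed by the text.

```python
import math
from collections import Counter
from typing import Iterable

THRESHOLD = 0.3  # threshold value quoted in the text


def cosine_similarity(x: Counter, y: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(x[t] * y[t] for t in x.keys() & y.keys())
    norm_x = math.sqrt(sum(v * v for v in x.values()))
    norm_y = math.sqrt(sum(v * v for v in y.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_x * norm_y)


def is_similar(a_tokens: Iterable[str], b_tokens: Iterable[str],
               threshold: float = THRESHOLD) -> bool:
    """Assumed decision rule: similar when the cosine reaches the threshold."""
    return cosine_similarity(Counter(a_tokens), Counter(b_tokens)) >= threshold


print(is_similar(["bat", "ball", "pitch"], ["bat", "ball", "wicket"]))
```

Counters keep the vectors sparse, so only terms that occur in a text are stored.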
Fig. 6 Table of contents structure and contents

Fig. 7 Output of the prototype system

5.
The prototype system is built with Ruby *1, MeCab *2, TreeTagger *3 and MySQL *4. Fig. 7 shows its output.

*1 Ruby http://www.ruby-lang.org/ja/
*2 Mecab http://mecab.sourceforge.net/
*3 Tree Tagger http://www.ims.uni-stuttgart.de/projekte/corplex/treetagger/
*4 Mysql http://www-jp.mysql.com/

6.
We conducted two experiments, experiment 1 and experiment 2.

6.1
Experiment 1 8) uses eight articles: Cricket, Warwick Castle, Snooker, Fish and chips, Goodwood Festival of Speed, Bowls, Polo and Association football. The experiment uses threshold values of 0.6 and 0.7. The counts for each article are: Cricket 56, Warwick Castle 2, Snooker 15, Fish and chips 5, Goodwood Festival of Speed 2, Bowls 1, Polo 4 and Association football 9.
Table 1 Result of experiment 1 (%)

Article                     Precision  Recall  F     Precision  Recall  F
Cricket                     100        40      50    97         55      70
Warwick Castle              0          0       0     50         33      40
Snooker                     100        7       13    63         33      43
Fish and chips              67         20      31    50         60      55
Goodwood Festival of Speed  0          0       0     100        100     100
Bowls                       0          0       0     0          0       0
Polo                        0          0       0     100        25      40
Association football        100        33      50    100        63      77
Average                     46         13      19    70         46      53

Table 1 shows the result of experiment 1. On average over the eight articles, precision is 46% and 70%, recall is 13% and 46%, and F is 19% and 53% for the first and second column groups of Table 1, respectively.

Table 2 Result of experiment 2 (%)

Precision  Recall  F
56         83      69
79         88      83
100        75      86
83         63      71
88         74      80
81         80      80   (overall)

6.2
Table 2 shows the result of experiment 2. The overall F is 80%.
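The precision, recall and F columns of Tables 1 and 2 can be related with a short Python sketch. It assumes that F is the balanced F-measure, the harmonic mean of precision and recall, which agrees with several rows of Table 1 (for example Snooker and Polo). The function names and the set-based interface are hypothetical.

```python
from typing import Set, Tuple


def precision_recall(extracted: Set[str], correct: Set[str]) -> Tuple[float, float]:
    """Precision and recall (as fractions) of extracted items against a gold set."""
    hits = len(extracted & correct)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure; defined as 0 when both inputs are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Snooker row of Table 1, in percent: (100, 7) and (63, 33).
print(round(f_measure(100, 7)), round(f_measure(63, 33)))  # 13 43
```

Because the formula is scale-free, it gives the same rounded values whether precision and recall are passed as percentages or as fractions scaled back to percent.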
7.
We ran two experiments on Wikipedia. In experiment 1, precision was 72%, recall was 49% and F was 57%. In experiment 2, precision was 81%, recall was 80% and F was 80%.

References
1) Wikipedia, 50 (72), No.5, pp.181-182 (2010).
2) Wikipedia, 21 (2009).
3) 71, No.2, pp.269-270 (2009).
4) DEIM Forum 2009.
5) Wikipedia, DEIM Forum 2009, pp.1-8.
6) Nakayama, K., Hara, T. and Nishio, S.: Wikipedia Mining for An Association Web Thesaurus Construction, WISE 2007, pp.1-11.
7) Wikipedia, 73, No.1, pp.1.575-1.576.
8) Wikipedia, No.2011.