DEIM Forum 2015 A3-1

1-5 Yamadaoka, Suita, Osaka 565-0871, Japan
E-mail: {nakamura.tatsuya,shirakawa.masumi,hara,nishio}@ist.osaka-u.ac.jp

1.

This section refers to Twitter, to Wikipedia, and to reference [13].

Footnote 1: January 2015 (Beta).
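2.

Mihalcea et al. proposed Wikify! [8], which annotates text in two steps: 1) keyword extraction and 2) disambiguation. For keyword extraction, Wikify! [8] uses a measure computed from Wikipedia (keyphraseness) and compares it with TF-IDF [11]. For disambiguation, it uses a method based on the Lesk algorithm [6] and a Naive Bayes classifier.

Cucerzan [2] disambiguates named entities using Wikipedia data. Milne et al. [10] developed Wikipedia Miner (footnote 2), which uses a relatedness measure between Wikipedia articles [9]. Kulkarni et al. [5] annotate Wikipedia entities in Web text collectively. Hoffart et al. [4] proposed AIDA (footnote 3), which builds a Mention-Entity Graph over mentions and entities. Ferragina et al. [3] proposed TAGME (footnote 4). Meij et al. [7] address Twitter posts and use keyphraseness.

Cornolti et al. [1] benchmarked entity-annotation systems including Wikipedia Miner, TAGME, and AIDA [4]. The method in this paper builds on TAGME [3].

Footnote 2: http://wikipedia-miner.cms.waikato.ac.nz/
Footnote 3: http://www.mpi-inf.mpg.de/yago-naga/aida/
Footnote 4: http://tagme.di.unipi.it/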
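3.

The proposed method builds on TAGME [3].

3.1 TAGME

TAGME [3] annotates a text with Wikipedia articles in three steps: 1) keyword extraction, 2) disambiguation, and 3) pruning.

3.1.1

TAGME extracts as keyword candidates the phrases that are used as anchor texts in Wikipedia. When an anchor $a_1$ is contained in another anchor $a_2$, their link probabilities $lp(a_1)$ and $lp(a_2)$ are compared: if $lp(a_1) < lp(a_2)$, only $a_2$ is kept; if $lp(a_1) \geq lp(a_2)$, both $a_1$ and $a_2$ are kept. The link probability $lp(a)$ of an anchor $a$ is

$$lp(a) = \frac{link(a)}{freq(a)} \qquad (1)$$

where $link(a)$ is the number of times $a$ occurs as an anchor text in Wikipedia and $freq(a)$ is the number of times $a$ occurs in Wikipedia at all.

Eq. (1) is similar to keyphraseness [8]. Whereas $lp(a)$ counts the occurrences $link(a)$ and $freq(a)$ of $a$, keyphraseness counts articles: with $lf(a)$ the number of articles in which $a$ occurs as a link and $df(a)$ the number of articles in which $a$ occurs,

$$keyphraseness(a) = \frac{lf(a)}{df(a)} \qquad (2)$$

3.1.2

For each anchor $a \in A$, where $A$ is the set of extracted anchors, let $Pg(a)$ be the set of candidate pages of $a$. Each candidate $p_a \in Pg(a)$ is scored by a voting scheme:

$$rel_a(p_a) = \sum_{b \in A \setminus \{a\}} \frac{\sum_{p_b \in Pg(b)} rel(p_b, p_a) \cdot Pr(p_b \mid b)}{|Pg(b)|} \qquad (3)$$

Here $rel(p_b, p_a)$ is the relatedness between pages [9], and $Pr(p_b \mid b)$ is the probability that anchor $b$ links to page $p_b$ (commonness). In Eq. (3), only pages with $Pr(p_b \mid b) > \tau$ are considered. Among the candidates of $a$ ranked in the top $\epsilon$% by Eq. (3), the page with the highest commonness is selected as $p_a$. Following TAGME [3], $\tau = 0.02$ and $\epsilon = 30\%$.

3.1.3

For each pair $(a, p_a)$ obtained in Section 3.1.2, the following score is computed:

$$\rho(a, p_a) = \frac{1}{2}\left(lp(a) + coherence(a, p_a)\right) \qquad (4)$$

The annotation of $a$ with $p_a$ is kept when $\rho(a, p_a) > \rho_{NA}$. The coherence is

$$coherence(a, p_a) = \frac{1}{|S| - 1} \sum_{p_b \in S \setminus \{p_a\}} rel(p_b, p_a) \qquad (5)$$

where $S$ is the set of pages assigned to the anchors. $\rho_{NA}$ is the threshold applied to the score of Eq. (4) and Eq. (5).

3.2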
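As a recap of Section 3.1, the TAGME scoring of Eqs. (1) to (5) can be sketched as follows. This is a minimal illustration and not the TAGME implementation: the function names, the dictionary layout for commonness values, and the reading of "top $\epsilon$%" as a fraction of the ranked candidates (rounded up) are assumptions, and `rel` stands in for any relatedness function such as the one in [9].

```python
import math
from typing import Callable, Dict, List, Optional, Set

# rel(p_b, p_a): relatedness between two pages (assumed to be supplied by the caller).
Relatedness = Callable[[str, str], float]


def link_probability(link_count: int, freq_count: int) -> float:
    """Eq. (1): lp(a) = link(a) / freq(a)."""
    return link_count / freq_count if freq_count else 0.0


def keyphraseness(lf: int, df: int) -> float:
    """Eq. (2): keyphraseness(a) = lf(a) / df(a), counted over articles."""
    return lf / df if df else 0.0


def vote_score(p_a: str, other_senses: List[Dict[str, float]], rel: Relatedness) -> float:
    """Eq. (3). Each dict holds the candidate pages of one other anchor b,
    mapped to their commonness Pr(p_b | b)."""
    total = 0.0
    for senses in other_senses:
        if senses:
            total += sum(rel(p_b, p_a) * pr for p_b, pr in senses.items()) / len(senses)
    return total


def disambiguate(senses_a: Dict[str, float], other_senses: List[Dict[str, float]],
                 rel: Relatedness, tau: float = 0.02, epsilon: float = 0.30) -> Optional[str]:
    """Drop candidates with commonness <= tau, rank the rest by Eq. (3), and
    return the most common page among the top epsilon fraction (assumed reading)."""
    kept = {p: pr for p, pr in senses_a.items() if pr > tau}
    if not kept:
        return None
    ranked = sorted(kept, key=lambda p: vote_score(p, other_senses, rel), reverse=True)
    top_n = max(1, math.ceil(len(ranked) * epsilon - 1e-9))
    return max(ranked[:top_n], key=lambda p: kept[p])


def coherence(p_a: str, assigned: Set[str], rel: Relatedness) -> float:
    """Eq. (5): average relatedness of p_a to the other pages in S (p_a is assumed to be in S)."""
    others = assigned - {p_a}
    if not others:
        return 0.0
    return sum(rel(p_b, p_a) for p_b in others) / len(others)


def rho(lp: float, coh: float) -> float:
    """Eq. (4): pruning score; the annotation is kept when rho > rho_NA."""
    return 0.5 * (lp + coh)
```

An annotation $(a, p_a)$ would then be kept when `rho(link_probability(...), coherence(...))` exceeds the chosen $\rho_{NA}$.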
[Figure 1. Labels: input text set; surrounding information of anchor texts, P(w) and P(w|a); TAGME framework; (1) keyword extraction; (2) keyword disambiguation; (3) article aggregation. Example text (keywords in bold and underlined): "W杯得点王のハメスがレアルに移籍" ("World Cup top scorer James moves to Real"); extracted phrases (anchor texts): ハメス (James), レアル (Real), W杯 (World Cup); assigned articles; articles after aggregation.]

The discussion in this section uses the Wikipedia articles FIFA, FIFA World Cup, and 2014 FIFA World Cup as examples.

3.3

This section refers to TAGME (Section 3.1), Figure 1, and Section 3.2.

3.4

This section introduces w (footnote 5).
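For a word $w$ and an anchor $a$, $con(w, a)$ is defined from the probability $Pr(w \mid a)$ of $w$ given $a$ and the probability $Pr(w)$ of $w$:

$$con(w, a) = \frac{Pr(w \mid a)}{Pr(w)} \qquad (6)$$

As a special case for $w$, $con(w, a) = 1$ is used. $Pr(w \mid a)$ and $Pr(w)$ are estimated as

$$Pr(w \mid a) = \frac{count(w, a)}{\sum_{w'} count(w', a)} \qquad (7)$$

$$Pr(w) = \frac{\sum_{a'} count(w, a')}{\sum_{w', a'} count(w', a')} \qquad (8)$$

where $count(w, a)$ is the number of times $w$ and $a$ occur together. Combining Eq. (4) with Eq. (6), which is computed through Eq. (7) and Eq. (8), the score of an anchor $a$ with word $w$ is

$$\rho(w, a, p_a) = \frac{lp(a) + coherence(a, p_a) + (1 - con(w, a))}{3} \qquad (9)$$

The annotation of $a$ with $p_a$ is kept when $\rho(w, a, p_a) > \rho_{NA}$. A threshold of 20 is used with Eq. (6) and Eq. (9), alongside the TAGME score of Eq. (4).

3.5

Examples used in this section: FIFA World Cup and 2014 FIFA World Cup; Microsoft Windows and iPhone. The aggregation of Wikipedia articles (Figure 2) follows Section 3.4.

[Figure 2. (a) Articles titled "(number)_topic name" are aggregated into "topic name". (b) Articles are aggregated into the article that has the same name as their category. Labels: ワールドカップ (World Cup); keyword (anchor text); set of candidate articles; category iphone; articles belonging to category iphone; articles after aggregation.]

3.5.1

This aggregation corresponds to Figure 2(a), for example 2014 FIFA World Cup. Let $p_x$ be an article and $title_x$ its title, let $re_{withyear}$ be a regular expression for titles that contain a year, and let $extract(title_x)$ be the topic name extracted from $title_x$.

1. For article $p_x$, check whether $title_x$ matches $re_{withyear}$.
2. If an article $p_y$ with $extract(title_x) = title_y$ exists, aggregate $p_x$ into $p_y$.

With steps (1) and (2), the counts reported for articles $p_x$ and $p_y$ are 35,847 and 8,240 (footnote 6).

3.5.2

This aggregation uses Wikipedia categories (Figure 2(b)).

Footnote 6: Wikipedia, 2014-11-06.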
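The context score of Section 3.4 can be illustrated with a small sketch that estimates Eqs. (6) to (8) from co-occurrence counts and then combines them as in Eq. (9). Several points are assumptions rather than statements from the paper: the ratio form of Eq. (6) and the subtraction $1 - con(w, a)$ in Eq. (9) follow the reconstruction of the garbled source, returning 1 when no counts are available is a neutral default chosen here, and the toy context words are hypothetical.

```python
from typing import Dict, Tuple

# counts[(w, a)] = count(w, a): how often word w occurs together with anchor a.
Counts = Dict[Tuple[str, str], int]


def pr_w_given_a(w: str, a: str, counts: Counts) -> float:
    """Eq. (7): count(w, a) divided by the total count of all words seen with a."""
    anchor_total = sum(c for (_, a2), c in counts.items() if a2 == a)
    return counts.get((w, a), 0) / anchor_total if anchor_total else 0.0


def pr_w(w: str, counts: Counts) -> float:
    """Eq. (8): total count of w over all anchors, divided by the grand total."""
    grand_total = sum(counts.values())
    word_total = sum(c for (w2, _), c in counts.items() if w2 == w)
    return word_total / grand_total if grand_total else 0.0


def con(w: str, a: str, counts: Counts) -> float:
    """Eq. (6): Pr(w | a) / Pr(w).
    Assumption: fall back to the neutral value 1 when w or a was never observed."""
    anchor_seen = any(a2 == a for (_, a2) in counts)
    denom = pr_w(w, counts)
    if not anchor_seen or denom == 0.0:
        return 1.0
    return pr_w_given_a(w, a, counts) / denom


def rho_with_context(lp: float, coh: float, con_value: float) -> float:
    """Eq. (9) as reconstructed: (lp + coherence + (1 - con)) / 3."""
    return (lp + coh + (1.0 - con_value)) / 3.0


# Toy counts (hypothetical): "free mobile" almost always appears next to "app".
toy_counts: Counts = {
    ("app", "free mobile"): 8,
    ("game", "free mobile"): 2,
    ("app", "iphone"): 2,
    ("game", "iphone"): 8,
}
```

Under this reading, an anchor whose surrounding word is much more frequent than expected (con above 1) receives a lower score than under Eq. (4) alone.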
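Table 1

k    (count)      (count)
1    439,955      73,478
2    815,458      83,518
3    1,076,751    84,587

Let $categories(p_x)$ be the set of categories of article $p_x$, let $cattitle_x$ be the title of a category $c_x$, and let $c_{parent}(c_x)$ be a parent category of $c_x$. Article $p_x$ is aggregated as follows.

1. For article $p_x$ with title $title_x$, obtain the categories $c_x \in categories(p_x)$.
2. If an article $p_y$ with $cattitle_x = title_y$ exists, aggregate $p_x$ into $p_y$.
3. Otherwise, repeat with the parent categories $c_{parent}(c_x)$, up to the $k$-th level.

The procedure also refers to the Wikipedia ID of articles. Table 1 lists the counts for each $k$.

4.

4.1

The experiments use Twitter data and Wikipedia. The Twitter data is based on Shirakawa et al. [12] and covers November 1, 2014 to January 15, 2015. Figures reported for the data: 300; 647,937; 13.0; 91,219; 55.1; 100; 5.

Example: 2014 FIFA; FIFA World Cup and 2014 FIFA World Cup.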
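The two aggregation procedures of Section 3.5 (by year in the title, and by category up to $k$ levels) can be sketched as below. The concrete regular expression, the dictionary-based category graph, the first-match tie-breaking, and the toy category names are assumptions for illustration only; the paper's $re_{withyear}$ pattern and its handling of multiple candidates are not recoverable from the text.

```python
import re
from typing import Dict, List, Optional, Set

# Assumed pattern for re_withyear: a four-digit year, then a space or underscore, then the topic name.
RE_WITHYEAR = re.compile(r"^\d{4}[ _](.+)$")


def extract(title: str) -> Optional[str]:
    """Return the topic name of a year-prefixed title, or None if the title has no year prefix."""
    match = RE_WITHYEAR.match(title)
    return match.group(1) if match else None


def aggregate_by_year(title: str, all_titles: Set[str]) -> str:
    """Section 3.5.1: map p_x to p_y when extract(title_x) equals an existing title_y."""
    topic = extract(title)
    return topic if topic is not None and topic in all_titles else title


def aggregate_by_category(title: str, categories: Dict[str, List[str]],
                          parents: Dict[str, List[str]], all_titles: Set[str], k: int) -> str:
    """Section 3.5.2: look for an article named like one of the categories of p_x,
    climbing to parent categories for at most k levels (k = 0 leaves p_x unchanged).
    Assumption: the first matching category in list order wins."""
    frontier = list(categories.get(title, []))
    seen: Set[str] = set()
    for _ in range(k):
        next_frontier: List[str] = []
        for cat in frontier:
            if cat in seen:
                continue
            seen.add(cat)
            if cat in all_titles and cat != title:
                return cat
            next_frontier.extend(parents.get(cat, []))
        frontier = next_frontier
    return title


# Toy data (hypothetical category structure).
titles = {"FIFA World Cup", "2014 FIFA World Cup", "IPhone", "IPhone 6", "Xperia Z3", "Sony Xperia"}
cats = {"IPhone 6": ["IPhone"], "Xperia Z3": ["Sony Mobile smartphones"]}
cat_parents = {"Sony Mobile smartphones": ["Sony Xperia"]}
```

With this toy data, `aggregate_by_category("Xperia Z3", cats, cat_parents, titles, k)` leaves the article unchanged for `k = 1` and maps it to `"Sony Xperia"` for `k = 2`, which mirrors the hop1 and hop2 settings.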
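The proposed method is evaluated without category aggregation (hop0) and with category aggregation at $k = 1$ and $k = 2$ (hop1, hop2), using the context score of Eq. (6). Two baselines are compared: TAGME [3] with link probability (TAGME(LP)) and TAGME with keyphraseness (TAGME(KP)). The evaluation measures are micro-averaged precision and recall, obtained by varying $\rho_{NA}$ in Eq. (9) and Eq. (4) from 0 to 1.

4.2

Figures 3 and 4 show the results. Figure 3 shows only hop1 for the proposed method, whereas Figure 4 shows hop0, hop1, and hop2 together with the TAGME baselines. Figure 5 compares the proposed method (hop1) with TAGME(KP); the thresholds mentioned are $\rho_{NA} = 0.2$ and $\rho_{NA} = 0.3$. Example phrases discussed in connection with Web text and Wikipedia: free mobile, pissed, descend, and free mobile app.

[Figure 3. Precision (vertical axis, 0.6 to 0.95) versus recall (horizontal axis, 0 to 1). Series: TAGME(LP), TAGME(KP), proposed method (hop1).]

[Figure 4. Precision (vertical axis, 0.3 to 0.8) versus recall (horizontal axis, 0 to 0.7). Series: TAGME(LP), TAGME(KP), proposed method (hop0), proposed method (hop1), proposed method (hop2).]

The discussion of Figure 4 compares TAGME(KP) with TAGME(LP), that is, keyphraseness versus link probability within TAGME, and compares the proposed method (hop0, hop1) with TAGME(KP). The discussion of Figure 5 relates the differences between TAGME(KP) and the proposed method to $\rho$ in Eq. (9) and $con$ in Eq. (6). Finally, hop0, hop1, and hop2 are compared with one another.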
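The evaluation protocol of Section 4.1 (micro-averaged precision and recall while $\rho_{NA}$ is varied from 0 to 1) can be sketched as follows. The data layout, one dictionary of scored articles per evaluation unit and one gold set per unit, is an assumption for illustration and not the authors' evaluation code.

```python
from typing import Dict, List, Set, Tuple


def micro_precision_recall(predicted: List[Set[str]], gold: List[Set[str]]) -> Tuple[float, float]:
    """Micro-averaging: pool true positives, predictions and gold labels over all units."""
    true_pos = sum(len(p & g) for p, g in zip(predicted, gold))
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


def sweep_threshold(scored: List[Dict[str, float]], gold: List[Set[str]],
                    thresholds: List[float]) -> List[Tuple[float, float, float]]:
    """For each rho_NA, keep the articles whose score exceeds it and evaluate the result."""
    curve = []
    for rho_na in thresholds:
        predicted = [{page for page, score in unit.items() if score > rho_na} for unit in scored]
        curve.append((rho_na, *micro_precision_recall(predicted, gold)))
    return curve
```

Plotting the resulting (recall, precision) pairs for each method gives curves of the kind shown in Figures 3 and 4.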
[Figure 5. Examples at threshold $\rho_{NA}$: (a) proposed method (hop1); (b) TAGME(KP).]

Examples discussed: ice rink and Ice Hockey (in connection with the Wikipedia ID); Xperia Z3 and Sony Xperia (hop2).

Table 2

hop0    163,564
hop1    152,288
hop2    139,680

5.

This section concerns Wikipedia.

Acknowledgment fragments: Twitter; (A) (26240013); IT (2012 to 2016).

References

[1] M. Cornolti, P. Ferragina, and M. Ciaramita, A Framework for Benchmarking Entity-annotation Systems, In WWW, pp.249-260, 2013.
[2] S. Cucerzan, Large-Scale Named Entity Disambiguation Based on Wikipedia Data, In EMNLP-CoNLL, pp.708-716, 2007.
[3] P. Ferragina and U. Scaiella, Fast and Accurate Annotation of Short Texts with Wikipedia Pages, IEEE Software, vol.29, no.1, pp.70-75, 2011.
[4] J. Hoffart, M.A. Yosef, I. Bordino, H. Fürstenau, M. Pinkal, M. Spaniol, B. Taneva, S. Thater, and G. Weikum, Robust Disambiguation of Named Entities in Text, In EMNLP, pp.782-792, 2011.
[5] S. Kulkarni, A. Singh, G. Ramakrishnan, and S. Chakrabarti, Collective Annotation of Wikipedia Entities in Web Text, In KDD, pp.457-466, 2009.
[6] M. Lesk, Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone, In SIGDOC, pp.24-26, 1986.
[7] E. Meij, W. Weerkamp, and M. de Rijke, Adding Semantics to Microblog Posts, In WSDM, pp.563-572, Feb. 2012.
[8] R. Mihalcea and A. Csomai, Wikify!: Linking Documents to Encyclopedic Knowledge, In CIKM, pp.233-242, 2007.
[9] D. Milne and I.H. Witten, An Effective, Low-cost Measure of Semantic Relatedness Obtained from Wikipedia Links, In AAAI Workshop on Wikipedia and Artificial Intelligence, pp.25-30, July 2008.
[10] D. Milne and I.H. Witten, Learning to Link with Wikipedia, In CIKM, pp.509-518, 2008.
[11] G. Salton and C. Buckley, Term-weighting Approaches in Automatic Text Retrieval, Information Processing & Management, vol.24, no.5, pp.513-523, 1988.
[12] M. Shirakawa, T. Hara, and S. Nishio, MLJ: Language-Independent Real-Time Search of Tweets Reported by Media Outlets and Journalists, In VLDB, vol.7, no.13, pp.1605-1608, 2014.
[13] (authors and title in Japanese; the title includes "Wikipedia"), vol.2014-DBS-160, no.11, pp.1-9, 2014.