2011 2012 3 26 ( : A8TB2114)
1
2
2.1 Espresso
2.2 CPL
2.3
2.4
2.5
2.6
2.7
3
3.1
3.2
4
4.1
4.2
4.3
4.4
5
A
1 [10] [8, 2] [1, 16, 6] [1, 16, 6] Espresso [11] [7] 3.1 3.1 X X
5 2 3 Espresso CPL 4 5
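2

Pantel and Pennacchiotti Espresso [11] Carlson CPL [3]

2.1 Espresso

Espresso [11]... X X N

In Espresso, the reliability $r_\pi(p)$ of a pattern $p$ and the reliability $r_\iota(i)$ of an instance $i$ are defined as

\[ r_\pi(p) = \frac{1}{|I|} \sum_{i \in I} \frac{\mathrm{pmi}(i, p)}{\max \mathrm{pmi}} \, r_\iota(i) \tag{2.1} \]

\[ r_\iota(i) = \frac{1}{|P|} \sum_{p \in P} \frac{\mathrm{pmi}(i, p)}{\max \mathrm{pmi}} \, r_\pi(p) \tag{2.2} \]

\[ \mathrm{pmi}(i, p) = \log_2 \frac{|i, p|}{|i, *| \, |*, p|} \tag{2.3} \]

Here $P$ and $I$ are the sets of patterns and instances, $|P|$ and $|I|$ are their sizes, $|i, *|$ and $|*, p|$ are the frequencies of $i$ and $p$, $|i, p|$ is the co-occurrence frequency of $i$ and $p$, and $\max \mathrm{pmi}$ is the maximum value of pmi.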
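As a rough illustration of how (2.1) and (2.2) feed each other, the sketch below computes one round of pattern and instance reliabilities from a precomputed pmi table. The function names, the toy instances and patterns, and the choice to treat a missing (instance, pattern) pair as pmi 0 are all assumptions made for the example, not details taken from Espresso itself.

```python
def pattern_reliability(pmi, inst_rel, patterns, instances):
    """Eq. (2.1): average over all instances of pmi(i, p) / max_pmi * r_iota(i)."""
    max_pmi = max(pmi.values())
    return {
        # Assumption: a pair absent from the table contributes pmi = 0.
        p: sum(pmi.get((i, p), 0.0) / max_pmi * inst_rel[i] for i in instances)
        / len(instances)
        for p in patterns
    }


def instance_reliability(pmi, pat_rel, patterns, instances):
    """Eq. (2.2): average over all patterns of pmi(i, p) / max_pmi * r_pi(p)."""
    max_pmi = max(pmi.values())
    return {
        i: sum(pmi.get((i, p), 0.0) / max_pmi * pat_rel[p] for p in patterns)
        / len(patterns)
        for i in instances
    }


# Hypothetical toy data: "tokyo" is the only seed (reliability 1).
instances = ["tokyo", "osaka"]
patterns = ["city of X", "X said"]
pmi = {
    ("tokyo", "city of X"): 2.0,
    ("osaka", "city of X"): 1.0,
    ("tokyo", "X said"): 1.0,
}
seed_rel = {"tokyo": 1.0, "osaka": 0.0}

pat_rel = pattern_reliability(pmi, seed_rel, patterns, instances)
inst_rel = instance_reliability(pmi, pat_rel, patterns, instances)
```

In this toy run the pattern that co-occurs more strongly with the seed gets the higher reliability, and the non-seed instance "osaka" picks up a nonzero reliability through that pattern.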
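The pmi of (2.3), $\mathrm{pmi}(i, p)$, is multiplied by the discounting factor

\[ \frac{|i, p|}{|i, p| + 1} \cdot \frac{\min(|i, *|, |*, p|)}{\min(|i, *|, |*, p|) + 1} \tag{2.4} \]

(2.1) (2.2) Espresso Espresso N N

2.2 CPL

CPL [3] 3 1 x y x y 3 x 7000 3 7000

Carlson et al. score a pattern $p$ with the precision of (2.5), computed over the instances $i$ of $c$:

\[ \mathrm{Precision}(p) = \frac{\sum_{i \in c} \mathrm{count}(i, p)}{\mathrm{count}(p)} \tag{2.5} \]

Here $\mathrm{count}(i, p)$ is the number of times instance $i$ co-occurs with pattern $p$, and $\mathrm{count}(p)$ is the number of occurrences of $p$.

c N 2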
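The discounting factor (2.4) and the pattern precision (2.5) are each only a few lines of arithmetic. The sketch below is one possible reading of them; the function names, argument layout, and the toy counts are hypothetical and chosen only for illustration.

```python
def discount(n_ip, n_i, n_p):
    """Eq. (2.4): factor that shrinks pmi for low-frequency pairs.

    n_ip is |i, p|, n_i is |i, *|, n_p is |*, p|.
    """
    m = min(n_i, n_p)
    return (n_ip / (n_ip + 1)) * (m / (m + 1))


def pattern_precision(count_ip, count_p, category_instances, p):
    """Eq. (2.5): co-occurrences of p with the instances in c, over count(p)."""
    hits = sum(count_ip.get((i, p), 0) for i in category_instances)
    return hits / count_p[p]


# Hypothetical counts for a single pattern.
count_ip = {
    ("tokyo", "cities such as X"): 3,
    ("osaka", "cities such as X"): 1,
    ("apple", "cities such as X"): 1,
}
count_p = {"cities such as X": 10}
city_instances = {"tokyo", "osaka"}

prec = pattern_precision(count_ip, count_p, city_instances, "cities such as X")
rare_pair = discount(1, 1, 1)          # a pair seen once is discounted heavily
frequent_pair = discount(99, 200, 300)  # a frequent pair is barely discounted
```

The factor approaches 1 as the counts grow, so it mainly penalizes pairs whose pmi rests on very few observations.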
2.3 XX Espresso CPL [7] 2.4 Curran 1 Mutual Exclusion Bootstrapping [4] 2.2 Carlson CPL [3] Curran 2.5 Girju [5]
Pennacchiotti and Pantel [12] Wikipedia IMDB Sadamitsu LDA [13] LDA Sadamitsu 2.6 Min [9] Vyas [14] 2.7 Vyas [15]
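3 2 3.1 2 3.1 2 3.1 X X 3.2 3.2

[Figure 3.1]

[Figure 3.2]

2 * * 2 4.2 4.2 2 3.3 X X

[Figure 3.3]

3.3 X X X 3.1 3.1

For a set of categories $C$ and a pattern $p$, the score is defined as

\[ \mathrm{Score}(C, p) = \mathrm{Entropy}(C, p) \times \mathrm{Recall}(C, p) \tag{3.1} \]

\[ \mathrm{Entropy}(C, p) = - \sum_{c \in C} P_c(p) \log_{|C|} P_c(p) \tag{3.2} \]

\[ \mathrm{Recall}(C, p) = \frac{\sum_{c \in C} \mathrm{cooccur}(p, c)}{\sum_{c \in C} |I_{sc}|} \tag{3.3} \]

\[ P_c(p) = \frac{\mathrm{cooccur}(p, c)}{\sum_{c \in C} \mathrm{cooccur}(p, c)} \tag{3.4} \]

\[ \mathrm{cooccur}(p, c) = |I_{sc} \cap I_{pc}| \tag{3.5} \]

Here $I_{sc}$ is an instance set indexed by category $c$, and $I_{pc}$ is an instance set indexed by pattern $p$ and category $c$. $|C|$ is the number of categories in $C$; it is the base of the logarithm in $\mathrm{Entropy}(C, p)$.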
Recall(C, p) Entropy(C, p) Recall(C, p) Score(C, p) p Score(p) N N 100 15
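A minimal sketch of (3.1) to (3.5) follows. It assumes that Score is the product of Entropy and Recall, that the entropy carries the usual leading minus sign, and that $I_{sc}$ can be read as the seed instances of category $c$ and $I_{pc}$ as the instances a pattern yields for $c$. These readings, along with every name and toy value in the code, are assumptions for illustration only.

```python
import math


def cooccur(i_sc, i_pc):
    """Eq. (3.5): size of the intersection of I_sc and I_pc."""
    return len(set(i_sc) & set(i_pc))


def score(i_s, i_p):
    """Return (Score, Entropy, Recall) of one pattern, eqs. (3.1) to (3.4).

    i_s maps each category c to I_sc; i_p maps c to I_pc for this pattern.
    """
    co = {c: cooccur(i_s[c], i_p.get(c, ())) for c in i_s}
    total = sum(co.values())
    recall = total / sum(len(s) for s in i_s.values())  # eq. (3.3)
    if total == 0 or len(i_s) < 2:
        # No overlap, or a single category: entropy is taken as 0 here.
        return 0.0, 0.0, recall
    entropy = 0.0
    for c in i_s:
        p_c = co[c] / total  # eq. (3.4)
        if p_c > 0:
            entropy -= p_c * math.log(p_c, len(i_s))  # eq. (3.2), base |C|
    return entropy * recall, entropy, recall  # eq. (3.1), assumed product


# Hypothetical categories and instance sets.
i_s = {"city": {"tokyo", "osaka"}, "fruit": {"apple", "pear"}}
even_pattern = {"city": {"tokyo", "osaka", "kyoto"}, "fruit": {"apple", "pear"}}
narrow_pattern = {"city": {"tokyo"}}

even_score, even_entropy, even_recall = score(i_s, even_pattern)
narrow_score, narrow_entropy, narrow_recall = score(i_s, narrow_pattern)
```

With the logarithm taken to base $|C|$, the entropy lies between 0 and 1, so a pattern that overlaps all categories evenly and covers many of their instances scores close to 1, while a pattern tied to a single category scores 0.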
4 4.1 1 1 KNP 2 41 4.2 1 2 4.2 15 A [17] Wikipedia Carlson CPL Pantel and Pennacchiotti Espresso Espresso X 13 4 CPL 15 Espresso 15 15 4.2 (Precision) (4.1) = (4.1)
[Figure 4.1: Espresso(), Espresso()+, CPL, CPL+]

4.3 8 4.1 Espresso() CPL Espresso()+CPL+ 4.1 Espresso()+CPL+ Espresso() CPL CPL Espresso() 4.4 4.1 Espresso() CPL Espresso() Espresso( )+Espresso() Espresso()+ 90 4 15 4.1 4.1 4.1 4.1 15 Espresso()
[Table 4.1: Espresso 90 15]

Espresso()+ 4.1 90 Espresso() Espresso()+ 4.2 4.2 3 4.2 Espresso() Espresso()+ 4.4 4.4 4.1 Espresso() Espresso()+ 4.1
4.1 4.2 2 1. 2. 1. 1. 2. Wikipedia 4.1 CPL Espresso() CPL Espresso() Espresso() CPL Espresso()
CPL CPL CPL Espresso() Espresso() CPL
[Table 4.2: 90 Espresso]
5 Pantel and Pennacchiotti Espresso Carlson CPL 4.4
NHK
[1] S. Abney. Understanding the Yarowsky Algorithm. Computational Linguistics, Vol. 30, No. 3, 2004.
[2] Razvan Bunescu and Raymond Mooney. Collective information extraction with relational Markov networks. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL 04), Main Volume, 2004.
[3] Andrew Carlson, Justin Betteridge, Richard C. Wang, Estevam R. Hruschka Jr., and Tom M. Mitchell. Coupled semi-supervised learning for information extraction. In Proceedings of the Third ACM International Conference on Web Search and Data Mining (WSDM 2010), 2010.
[4] James R. Curran, Tara Murphy, and Bernhard Scholz. Minimising semantic drift with mutual exclusion bootstrapping. In Pacific Association for Computational Linguistics, 2007.
[5] Roxana Girju, Adriana Badulescu, and Dan Moldovan. Automatic discovery of part-whole relations. Computational Linguistics, 2006.
[6] Marti A. Hearst. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th Conference on Computational Linguistics, Volume 2. Association for Computational Linguistics, 1992.
[7] Mamoru Komachi, Taku Kudo, Masashi Shimbo, and Yuji Matsumoto. Graph-based analysis of semantic drift in Espresso-like bootstrapping algorithms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2008.
[8] Andrew McCallum and Wei Li. Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons. In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003, 2003.
[9] Bonan Min and Ralph Grishman. Fine-grained entity set refinement with user feedback. In Proceedings of the RANLP 2011 Workshop on Information Extraction and Knowledge Acquisition, 2011.
[10] Patrick Pantel, Eric Crestan, Arkady Borkovsky, Ana-Maria Popescu, and Vishnu Vyas. Web-scale distributional similarity and entity set expansion. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Volume 2, 2009.
[11] Patrick Pantel and Marco Pennacchiotti. Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, 2006.
[12] Marco Pennacchiotti and Patrick Pantel. Automatically building training examples for entity extraction. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning, 2011.
[13] Kugatsu Sadamitsu, Kuniko Saito, Kenji Imamura, and Genichiro Kikui. Entity set expansion using topic information. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Short Papers, Volume 2. Association for Computational Linguistics, 2011.
[14] Vishnu Vyas and Patrick Pantel. Semi-automatic entity set refinement. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, 2009.
[15] Vishnu Vyas, Patrick Pantel, and Eric Crestan. Helping editors choose better seed sets for entity set expansion. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009.
[16] David Yarowsky. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, 1995.
[17] ,,. Wikipedia. Journal of Natural Language Processing, Vol. 16, No. 3, 2009.
A