Annotation of Focus for Negation in Japanese Text

Suguru Matsuyoshi

This paper proposes an annotation scheme for the focus of negation in Japanese text. Negation has a scope, and its focus falls within that scope. The scope of negation is the part of the sentence that is negated; the focus of negation is the part of the scope that is most prominently negated. In natural language processing, correctly interpreting a negated statement requires precisely detecting the focus of its negation. As a foundation for developing a focus detector, we annotated part of the Rakuten Travel: User Review Data and part of the newspaper subcorpus of the Balanced Corpus of Contemporary Written Japanese (BCCWJ) with this scheme. In the scheme, each negation cue in the text is linked to its focus, and the clues that identify that focus are annotated as well. These clues include focus particles such as wa and shika, as well as other expressions in the context. We report 1,327 negation cues and their foci annotated in the two corpora.

Key Words: Negation, Focus of Negation, Corpus Annotation, Modality

Interdisciplinary Graduate School of Medicine and Engineering, University of Yamanashi
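The scheme links each negation cue to its focus through clues such as the focus particles wa and shika. The toy sketch below illustrates that idea. It is not the paper's method, which is manual annotation.

Several details are hypothetical and chosen only for illustration:
- the token format;
- the romanized example sentences;
- the coarse POS tags;
- the rule that shika takes priority over wa;
- the rule that the focus is the token directly before the particle.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical clue inventory in priority order: "shika" ("only", which always
# co-occurs with a negation) is treated as a stronger clue than contrastive "wa".
FOCUS_PARTICLES = ("shika", "wa")


@dataclass
class Token:
    surface: str  # romanized surface form (illustrative only)
    pos: str      # hypothetical coarse tag: "noun", "particle", "verb", "neg", ...


@dataclass
class FocusLink:
    cue_index: int    # position of the negation cue
    focus_index: int  # position of the token guessed to be the focus
    clue_index: int   # position of the focus particle used as the clue
    clue: str         # the particle itself


def link_focus(tokens: List[Token], cue_index: int) -> Optional[FocusLink]:
    """Toy heuristic: scan leftward from the cue for a focus particle and take
    the token immediately before that particle as the focus. Returns None when
    no focus particle precedes the cue."""
    for particle in FOCUS_PARTICLES:
        # Stop at index 1 so that the particle's host (index - 1) always exists.
        for i in range(cue_index - 1, 0, -1):
            tok = tokens[i]
            if tok.pos == "particle" and tok.surface == particle:
                return FocusLink(cue_index, i - 1, i, particle)
    return None


# "kare shika ko-nakat-ta" ("no one but him came")
SHIKA_EXAMPLE = [Token("kare", "noun"), Token("shika", "particle"),
                 Token("ko", "verb"), Token("nakat", "neg"), Token("ta", "aux")]
# "kyou wa ame ga fura-nai" ("it will not rain today", contrastive wa on "today")
WA_EXAMPLE = [Token("kyou", "noun"), Token("wa", "particle"), Token("ame", "noun"),
              Token("ga", "particle"), Token("fura", "verb"), Token("nai", "neg")]

if __name__ == "__main__":
    for toks, cue in ((SHIKA_EXAMPLE, 3), (WA_EXAMPLE, 5)):
        link = link_focus(toks, cue)
        print(toks[cue].surface, "->", toks[link.focus_index].surface, "via", link.clue)
```

Run as a script, this prints `nakat -> kare via shika` and `nai -> kyou via wa`. A real detector would work on phrases (bunsetsu) rather than single tokens, and it would use the wider contextual clues the scheme records.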
Vol. 21 No. 2 April 2014

1 MeCab 1 JUMAN 2 CaboCha 3 KNP 4 5 KNP SynCha 6 ( 2007) (1) [ ] (2) [ ] (1) (1) (2) (1) (2) (1) (2)

1 http://mecab.googlecode.com/svn/trunk/mecab/doc/index.html
2 http://nlp.ist.i.kyoto-u.ac.jp/index.php?juman
3 http://code.google.com/p/cabocha/
4 http://nlp.ist.i.kyoto-u.ac.jp/index.php?knp
5
6 https://www.cl.cs.titech.ac.jp/~ryu-i/syncha/
( 2007; Blanco and Moldovan 2011a) 2 ( 2009) 2 3 4 5 2 6 2 (Huddleston and Pullum 2002; 2010; 2007) ( 2007, 2009) ( 1986; 1999; 2009; 2009) BioScope (Vincze, Szarvas, Farkas, Móra, and Csirik 2008) not without Morante
(Morante, Liekens, and Daelemans 2008) Li BioScope (Li, Zhou, Wang, and Zhu 2010) *SEM 2012 7 Shared task 1 Conan Doyle 8 ( 2011) Blanco PropBank (Babko-Malaya 2005) (Blanco and Moldovan 2011a) (1) not MNEG (2) MNEG (3) A0, A1, A2, TMP, LOC 9 Blanco (Blanco and Moldovan 2011a, 2011b) *SEM 2012 Shared task 1 10 Rosenberg 4 (Rosenberg and Bergler 2012) 1 ( 2010) 3

7 http://ixa2.si.ehu.es/starsem/
8 http://www.clips.ua.ac.be/sem2012-st-neg/
9 MNEG
10 http://www.clips.ua.ac.be/sem2012-st-neg/
3.1 (BCCWJ) 11 12
(3) [PN1a 00002]
(4) [PN2f 00002]
(5) [PN2f 00003]
(6) [PN4g 00001]
1 ( 2007) 13 (3) (4) (5) (6) WHO

11 http://www.ninjal.ac.jp/corpus_center/bccwj/
12 PN BCCWJ
13 c- ( 2010; 2006)
( 2007; Blanco and Moldovan 2011a) (5) (4) 1

3.2 3 ( 1989; 2007) 1
(7) [PN1b 00004]

3.3 2 1 2
2 ( 1998; 2000) ( 1989; 1998)

3.4
(8) [PN2f 00002]
(9) [PN2g 00004]
(10) [PN3b 00004]
(8) (9) (10) ( 2007) 14 (8) (10)

14 ( 2007)
15 1 3.1 ( 2007) (11) (12)
(11) [PN1e 00004]
(12) [PN1b 00002]

3.5 ( 2009)
(13) [PN3d 00003]
(14) [PN3b 00004]
(13) (14) 16

15 3.1 1
16 (14)
(14) 3.1 (13) 3.1 (13 ) (13 ) [ ] ( 2009) (15) [PN1e 00003] 3.1 2 ( 1986; 1999; 2009; 2009) 2 2 ( 1999) (16) [( 1999) p. 29] 4.1 3.1
3.6 2 ( 2007) i j
(17) 1 j i j [ ]
(18) j i j [ ]
(19) k j i i j [ ]
(20) j i j [ ]
(17) 2 1 (18) (19) 3 2 (20) 17 (17) (18) 3.1 (17) 3.1 1 (19) 3.1

17 5
3.5 3.1
(21) i i j [ ]

4 4.1 1 3.1 (1) (2) (3) A B (3)

4.2 5
ID ID 3.2 YYYYMMDD UniDic 18 7 ID ID - - - - - - 1 1 1 1 - -

18 http://sourceforge.jp/projects/unidic/
19 1 20 1 1 1 4.3 4.4 1 XML 3.1 (3) 19 20 1 2
1 XML [PN1a 00002] 1 <sentence> <SUW> <tok> 1 ID BCCWJ XML -f 3 CaboCha <sentence> ID XML

<wsb:negation> <wsb:focus> <wsb:description> <wsb:clue> 21

<wsb:negation> 1 <sentence> 4.2
@wsb:orthtoken :
@wsb:morphid : ID
@wsb:pos :
@wsb:doublenegative :
@wsb:lastupdate :

<wsb:focus> <wsb:negation> 1 @wsb:scope <wsb:description> <wsb:clue> <wsb:focus> 22
@wsb:orthtoken :
@wsb:morphid : ID
@wsb:argtypes :
@wsb:toritate :
@wsb:class :

<wsb:description> 1

<wsb:clue>
@wsb:sid : ID
@wsb:orthtokens :
@wsb:morphids : ID

<wsb:clue> 2 <sentence> 1 BCCWJ XML XML CaboCha ( 2010)

5 2 (1) 23 : (2) BCCWJ (PN)

21 wsb
22 @wsb:numofcandidates 4.2 1 pl
23 http://travel.rakuten.co.jp/
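The element and attribute names listed above suggest how the annotation might be read programmatically. The sketch below does this with Python's standard `xml.etree.ElementTree`.

It rests on these assumptions:
- `<wsb:focus>` is nested inside `<wsb:negation>`, and `<wsb:description>` and `<wsb:clue>` are nested inside `<wsb:focus>`; the surviving text only partly indicates this.
- The namespace URI, the sample fragment, and every attribute value are hypothetical.
- Only a subset of the attributes is read. @wsb:argtypes, @wsb:class, @wsb:scope, @wsb:doublenegative, and @wsb:lastupdate are ignored.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical namespace URI: the real URI bound to the "wsb" prefix is not
# given in this text; substitute the one declared in the actual files.
WSB = "http://example.org/wsb"
NS = {"wsb": WSB}


def attr(elem: ET.Element, name: str) -> Optional[str]:
    """Read a wsb-namespaced attribute such as @wsb:orthtoken (Clark notation)."""
    return elem.get(f"{{{WSB}}}{name}")


@dataclass
class Clue:
    sid: Optional[str]
    orthtokens: Optional[str]
    morphids: Optional[str]


@dataclass
class Focus:
    orthtoken: Optional[str]
    morphid: Optional[str]
    toritate: Optional[str]
    description: Optional[str]
    clues: List[Clue] = field(default_factory=list)


@dataclass
class Negation:
    orthtoken: Optional[str]
    morphid: Optional[str]
    pos: Optional[str]
    foci: List[Focus] = field(default_factory=list)


def read_negations(xml_text: str) -> List[Negation]:
    """Collect every <wsb:negation> and its (assumed) nested focus annotations."""
    root = ET.fromstring(xml_text)
    result = []
    for neg in root.findall(".//wsb:negation", NS):
        foci = []
        for f in neg.findall("wsb:focus", NS):
            desc = f.find("wsb:description", NS)
            clues = [Clue(attr(c, "sid"), attr(c, "orthtokens"), attr(c, "morphids"))
                     for c in f.findall("wsb:clue", NS)]
            foci.append(Focus(attr(f, "orthtoken"), attr(f, "morphid"),
                              attr(f, "toritate"),
                              desc.text if desc is not None else None, clues))
        result.append(Negation(attr(neg, "orthtoken"), attr(neg, "morphid"),
                               attr(neg, "pos"), foci))
    return result


# Illustrative fragment; every attribute value here is invented for the example.
SAMPLE = """<doc xmlns:wsb="http://example.org/wsb">
  <sentence>
    <wsb:negation wsb:orthtoken="nakat" wsb:morphid="m4" wsb:pos="aux">
      <wsb:focus wsb:orthtoken="kare" wsb:morphid="m1" wsb:toritate="shika">
        <wsb:description>focus marked by shika</wsb:description>
        <wsb:clue wsb:sid="s1" wsb:orthtokens="shika" wsb:morphids="m2"/>
      </wsb:focus>
    </wsb:negation>
  </sentence>
</doc>"""

if __name__ == "__main__":
    for n in read_negations(SAMPLE):
        print(n.orthtoken, [(f.orthtoken, f.toritate) for f in n.foci])
```

Namespaced attributes are looked up by their expanded `{uri}name` form. This is why the helper `attr` builds that string instead of using the `wsb:` prefix directly.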
5.1 : : ( 2012) 24 90% 1 58 10 58 40 5,178 1,246

5.2 BCCWJ BCCWJ 1/100 25 340 1 54 A 1 XML <sentence> 2,708 406

5.3 4.4 XML HTML HTML 2 HTML XML 100 3 XML 2 2 304 103 2 2 3

24
25 http://d.hatena.ne.jp/masayua/20120807/1344313720
2 HTML 1 1 1

5.4 2 1 2 1,023 304 2 301 72 29% (301/1,023) 24% (72/304) 30%
2 1 2 3 2 35% (129/373) 3.5 4 - - 26 1 2

Rakuten Travel  BCCWJ  Total
637  173  810
116  33  149
19  34  53
211  53  264
28  6  34
12  5  17
(1,023)  (304)  (1,327)
94  30  124
121  72  193
8  0  8
(223)  (102)  (325)
1,246  406  1,652

Rakuten Travel  BCCWJ  Total
141  18  159
30  5  35
7  6  13
49  11  60
17  6  23
5  4  9
3  2  5
3  1  4
1  2  3
20  7  27
8  8  16
1  0  1
1  2  3
1  0  1
14  0  14
301  72  373

26 1
86 20 8 1 160 4.2 5 2 2 373 375 87% (327/375) 3 4

Rakuten Travel  BCCWJ  Total
66  13  79
34  7  41
7  1  8
0  1  1
107  22  129

Rakuten Travel  BCCWJ  Total
-  13  5  18
-  27  12  39
-  10  9  19
-  40  3  43
-  10  5  15
-  43  12  55
-  125  15  140
-  19  11  30
-  14  0  14
301  72  373

5
Rakuten Travel  BCCWJ  Total
271  56  327
32  16  48
303  72  375
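The percentages reported in Section 5 and the totals of the count tables can be re-derived from the counts themselves. This short check assumes that the first column is Rakuten Travel, the second is BCCWJ, and the third is their sum, as the corpus totals of 1,246 and 406 indicate. What each ratio measures is not recoverable from this text, so the ratios are listed only as numerator/denominator pairs.

```python
# Percentages as printed in the text, paired with the counts they are based on.
REPORTED_PERCENTAGES = [
    (301, 1023, 29),
    (72, 304, 24),
    (129, 373, 35),
    (327, 375, 87),
]

# (Rakuten Travel, BCCWJ, Total) rows taken from the count tables.
TOTAL_ROWS = [
    (1023, 304, 1327),
    (223, 102, 325),
    (1246, 406, 1652),
    (301, 72, 373),
    (107, 22, 129),
    (303, 72, 375),
]


def percent(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * numerator / denominator)


def percentages_match() -> bool:
    return all(percent(n, d) == p for n, d, p in REPORTED_PERCENTAGES)


def totals_match() -> bool:
    # Each Total entry should equal Rakuten Travel + BCCWJ.
    return all(a + b == t for a, b, t in TOTAL_ROWS)


def subtotals_match() -> bool:
    # The two parenthesized subtotals of each column add up to the bottom row.
    return 1023 + 223 == 1246 and 304 + 102 == 406 and 1327 + 325 == 1652


if __name__ == "__main__":
    print(percentages_match(), totals_match(), subtotals_match())  # True True True
```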
6 2 3 1 2 BCCWJ 3 ( 2013) BCCWJ (B) 25870278
Babko-Malaya, O. (2005). PropBank Annotation Guidelines. ACE (Automatic Content Extraction) Program. http://verbs.colorado.edu/~mpalmer/projects/ace/pbguidelines.pdf.
Blanco, E. and Moldovan, D. (2011a). Semantic Representation of Negation Using Focus Detection. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 581–589.
Blanco, E. and Moldovan, D. (2011b). Some Issues on Detecting Negation from Text. In Proceedings of the 24th International Florida Artificial Intelligence Research Society Conference, pp. 228–233.
(1998).
Huddleston, R. and Pullum, G. K. (Eds.) (2002). The Cambridge Grammar of the English Language. Cambridge University Press.
(2006).
(2010).
(2011). Ver.2.4. Technical Report of Department of Information Science, Ochanomizu University.
(2009). 136, pp. 121–151.
(2012). 18, pp. 1188–1191.
Li, J., Zhou, G., Wang, H., and Zhu, Q. (2010). Learning the Scope of Negation via Shallow Semantic Parsing. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pp. 671–679.
(1998).
(2010). D, 93 (6), pp. 705–713.
(1999). 28, pp. 27–36.
Morante, R., Liekens, A., and Daelemans, W. (2008). Learning the Scope of Negation in Biomedical Texts. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 715–724.
(1989).
(2007). 3.
(2009). 5.
(2000).
(2009).
(1986).
(2013). 19, pp. 936–939.
Rosenberg, S. and Bergler, S. (2012). UConcordia: CLaC Negation Focus Detection at *Sem 2012. In Proceedings of the 1st Joint Conference on Lexical and Computational Semantics: SemEval '12, pp. 294–300.
Vincze, V., Szarvas, G., Farkas, R., Móra, G., and Csirik, J. (2008). The BioScope Corpus: Biomedical Texts Annotated for Uncertainty, Negation and their Scopes. In BMC Bioinformatics, pp. 1–9.

2003 2008

(Received September 20, 2013; revised November 28, 2013; accepted December 13, 2013)