Vol. 3 No. 2 91-101 (June 2010)

Automatic Evaluation of Text Summaries by Using Paraphrase

Kazuho Hirahara (1), Hidetsugu Nanba (1), Toshiyuki Takezawa (1) and Manabu Okumura (2)

The evaluation of computer-produced summaries has been recognized as an important research problem in automatic text summarization. Traditionally, computer-produced summaries have been evaluated automatically by their n-gram overlap with human-produced summaries. However, these methods cannot evaluate a summary correctly when its n-grams do not overlap with those of the human-produced summary, even though the two summaries convey the same meaning. We explored the use of paraphrases to refine traditional automatic methods for summary evaluation. To confirm the effectiveness of our method, we conducted experiments using the data from the Text Summarization Challenge 2 (TSC2). We found that paraphrases created using a statistical machine translation technique improved the traditional evaluation methods.

1.

13) 4),5) 18) 3 4 5 6 7

(1) Graduate School of Information Sciences, Hiroshima City University
(2) Precision and Intelligence Laboratory, Tokyo Institute of Technology

(c) 2010 Information Processing Society of Japan
2.

2.1 2.2

2.1 BLEU ROUGE

BLEU 14) BLEU 1 N 9) BLEU BLEU ROUGE 10) Lin ROUGE 1 ROUGE-N ROUGE-N N

$$
\text{ROUGE-N} = \frac{\sum_{S \in \{\text{Reference Summaries}\}} \sum_{gram_n \in S} Count_{match}(gram_n)}{\sum_{S \in \{\text{Reference Summaries}\}} \sum_{gram_n \in S} Count(gram_n)} \qquad (1)
$$

n N gram_n Count_match(gram_n) N Lin N 1 4 N=1 2 N=1

2.2

6) 16) 1) 16) 2 carbon dioxide carbon dioxide 1 1) 8),11) 2 2 7),18)

Zhou ParaEval 18) (1) (2) 2 1 2 Greedy 3 1 2 ROUGE Zhou ParaEval ROUGE Zhou 3 1 2 Zhou ParaEval
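Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the ROUGE package itself: the function names `ngrams` and `rouge_n` are hypothetical, summaries are assumed to be pre-tokenized lists of words, and Count_match is assumed to be clipped by the candidate's n-gram counts, which is the usual reading of Lin's definition.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n(candidate, references, n=1):
    """Recall-oriented n-gram overlap in the spirit of Eq. (1).

    candidate:  list of tokens (computer-produced summary)
    references: list of token lists (human-produced summaries)
    """
    cand_counts = Counter(ngrams(candidate, n))
    matched = 0  # numerator: sum of Count_match(gram_n)
    total = 0    # denominator: sum of Count(gram_n)
    for ref in references:
        ref_counts = Counter(ngrams(ref, n))
        total += sum(ref_counts.values())
        # Assumption: a reference n-gram is matched at most as often
        # as it occurs in the candidate (clipped counts).
        matched += sum(min(c, cand_counts[g]) for g, c in ref_counts.items())
    return matched / total if total else 0.0


candidate = "the cat sat on the mat".split()
reference = "the cat was on the mat".split()
print(rouge_n(candidate, [reference], n=1))
print(rouge_n(candidate, [reference], n=2))
```

Because the denominator counts only reference n-grams, the score is a recall measure: a candidate that expresses the same content with different words gets no credit, which is the weakness the paraphrase-based methods address.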
Kauchak 7) Kauchak

3.

TSC2 3) 30 10 20% 10 100 1 318 318 5

A   64   20.1%
B   78   24.5%
C   38   11.9%
D   39   12.3%
E   99   31.2%

1 5 1 10 100

A 3 B C D X Y X Y E A D
4.

4.1 4.2 A D

4.1

2 ParaEval ParaEval ParaEval ParaEval

ParaEval
(1) C D
(2) (1) B A
(3) (1) (2)
(4) (1) (2) (3)

(1) (2) Greedy (4) Lin 10)

ParaEval ParaEval
(1)
(2) (1) C D
(3) (1) (2) B A
(4) (1) (2) (3)

4.2

ParaEval 4 SMT

SMT DS WN WordNet NTT NTT

SMT Zhou X Y X Y The Daily Yomiuri 150,000 17) GIZA++ (footnote 1) 2 85,858 3

DS 1) (1) CaboCha (footnote 3) 56

Footnote 1: http://www.fjoch.com/giza++.html
Footnote 2:
Footnote 3: http://chasen.org/~taku/software/cabocha/
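The tiered matching that ParaEval (Zhou et al. 18)) is known for can be sketched as below: multi-word paraphrases are matched first, then single-word synonyms greedily, then exact words on whatever remains. This is only an illustration under stated assumptions, not the procedure of Section 4.1. The function names, the two lookup tables and the English example are all hypothetical, and only the first occurrence of each phrase is considered.

```python
def find_span(tokens, phrase, used):
    """Index of the first occurrence of `phrase` whose tokens are all unused, or -1."""
    k = len(phrase)
    for i in range(len(tokens) - k + 1):
        if tokens[i:i + k] == phrase and not any(used[i:i + k]):
            return i
    return -1


def tiered_recall(candidate, reference, phrase_table, synonym_table):
    """Fraction of reference tokens covered by the candidate after three tiers."""
    cand_used = [False] * len(candidate)
    ref_covered = [False] * len(reference)

    # Tier 1: multi-word paraphrases (reference phrase -> candidate phrases).
    for ref_phrase, cand_phrases in phrase_table.items():
        ref_words = ref_phrase.split()
        i = find_span(reference, ref_words, ref_covered)
        if i < 0:
            continue
        for cand_phrase in cand_phrases:
            cand_words = cand_phrase.split()
            j = find_span(candidate, cand_words, cand_used)
            if j >= 0:
                ref_covered[i:i + len(ref_words)] = [True] * len(ref_words)
                cand_used[j:j + len(cand_words)] = [True] * len(cand_words)
                break

    # Tier 2 (greedy synonym match), then Tier 3 (exact word match),
    # each applied only to tokens left uncovered by the earlier tiers.
    for tier in ("synonym", "exact"):
        for i, ref_word in enumerate(reference):
            if ref_covered[i]:
                continue
            if tier == "synonym":
                accepted = synonym_table.get(ref_word, set())
            else:
                accepted = {ref_word}
            for j, cand_word in enumerate(candidate):
                if not cand_used[j] and cand_word in accepted:
                    ref_covered[i] = cand_used[j] = True
                    break

    return sum(ref_covered) / len(reference) if reference else 0.0


# Hypothetical example data (not taken from TSC2).
reference = "emissions of carbon dioxide went up quickly".split()
candidate = "CO2 emissions rose rapidly".split()
phrase_table = {"carbon dioxide": ["CO2"], "went up": ["rose"]}
synonym_table = {"quickly": {"rapidly"}}
print(tiered_recall(candidate, reference, phrase_table, synonym_table))
```

In this example plain unigram overlap would credit only "emissions", while the tiered match also covers "carbon dioxide", "went up" and "quickly".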
Table 1 Paraphrasing methods for text evaluation.

SMT
DS
WordNet (WN)
NTT

(2) (1) (3) (4) SMART 15) (2) (3)

WordNet WN WordNet 2009 3 2) (footnote 1) WordNet synset WordNet synset

NTT NTT NTT 4 1 2 3 4 4

Footnote 1: http://nlpwww.nict.go.jp/wn-ja/

Table 2 Correspondence of the classification of paraphrases and paraphrasing methods.

Category             SMT   DS   WordNet   NTT
A  20.1% (64/318)
B  24.5% (78/318)
C  11.9% (38/318)
D  12.3% (39/318)
E  31.2% (99/318)

5.

4

5.1

TSC2 3) 1,150 30 20 20% 600 20 10 10 2 600 3 1 10 10 8 1 1 2
3 3 4

Table 3 The distribution of evaluation results by three human subjects.

Score     A     B     C
4         0     0    29
5        53     0    67
6        89    38   180
7       143   372   200
8       165   170    93
9        89    20    27
10       61     0     4

1 3 10 A D 4 2, 3 SMT DS NTT NTT WordNet WN 4 15 ROUGE-1 4 4 5 EX-1 1 3

A 7.55 B 7.29 C 6.58

3 2 3 10 1 4 3 + A B C D 5.5 1 3 6 4 B 6.5 5.5 3 4 D 4.5 3 1 4

2.1 ROUGE Lin 10) ROUGE-1 ROUGE-2 TSC2 Nanba 12) ROUGE-1 ROUGE-1 5 2 1 1

Table 4 List of 15 proposed methods and a baseline method.

Method    SMT   DS   WordNet   NTT
S          o
D                o
W                      o
N                                o
SD         o     o
SW         o           o
SN         o                     o
DW               o     o
DN               o               o
WN                     o         o
SDW        o     o     o
SDN        o     o               o
SWN        o           o         o
DWN              o     o         o
SDWN       o     o     o         o
ROUGE-1

EX-4 4.1 ParaEval EX-5 EX-8 ParaEval

ParaEval
EX-1 9
EX-2 9
EX-3 9
EX-4 9

ParaEval
EX-5 9
EX-6 9
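The 15 method labels of Table 4 are exactly the non-empty subsets of the four resources. A short sketch that enumerates them follows; the function name `method_labels` is hypothetical, and the one-letter codes follow the row labels of the result tables (S = SMT, D = DS, W = WordNet, N = NTT).

```python
from itertools import combinations

# One-letter codes for the four paraphrase resources.
RESOURCES = "SDWN"


def method_labels(resources=RESOURCES):
    """All non-empty combinations of the resources, smallest subsets first."""
    return [
        "".join(combo)
        for size in range(1, len(resources) + 1)
        for combo in combinations(resources, size)
    ]


labels = method_labels()
print(len(labels), labels)
```

With four resources there are 2^4 - 1 = 15 combinations, and ROUGE-1 is added as the baseline that uses none of them.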
EX-7 9
EX-8 9

3 3 1.5 39 4 1 30 1 4

5.2

ParaEval ParaEval 5 6 30 5 6

In Table 5 (EX-1), the best of the 15 methods, DW, scored 0.027 (8.1%) higher than ROUGE-1. In Table 6 (EX-3), the best of the 15 methods, DN, scored 0.020 (5.9%) higher than ROUGE-1.

1 9 B 4 9 5 4 4

Table 5 Evaluation results using an extract-type reference summary (ParaEval method).

Method         EX-1    EX-2
S (SMT)        0.280   0.326
D (DS)         0.338   0.379
W (WordNet)    0.332   0.376
N (NTT)        0.332   0.367
SD             0.340   0.369
SW             0.358   0.336
SN             0.276   0.338
DW             0.359   0.326
DN             0.343   0.374
WN             0.332   0.376
SDW            0.339   0.331
SDN            0.348   0.356
SWN            0.346   0.350
DWN            0.358   0.327
SDWN           0.340   0.326
ROUGE-1        0.332   0.376

Table 6 Evaluation results using an abstract-type reference summary (ParaEval method).

Method         EX-3    EX-4
S (SMT)        0.334   0.364
D (DS)         0.349   0.421
W (WordNet)    0.337   0.448
N (NTT)        0.337   0.428
SD             0.348   0.374
SW             0.337   0.435
SN             0.294   0.352
DW             0.334   0.420
DN             0.357   0.403
WN             0.337   0.448
SDW            0.325   0.412
SDN            0.345   0.374
SWN            0.341   0.424
DWN            0.326   0.416
SDWN           0.329   0.400
ROUGE-1        0.337   0.448
7 ParaEval 8 ParaEval

Table 7 Evaluation results using an extract-type reference summary (Reverse ParaEval method).

Method         EX-5    EX-6
S (SMT)        0.265   0.373
D (DS)         0.377   0.409
W (WordNet)    0.346   0.398
N (NTT)        0.350   0.398
SD             0.343   0.390
SW             0.337   0.382
SN             0.270   0.384
DW             0.348   0.381
DN             0.373   0.409
WN             0.346   0.398
SDW            0.340   0.380
SDN            0.335   0.389
SWN            0.342   0.383
DWN            0.345   0.383
SDWN           0.334   0.382
ROUGE-1        0.332   0.376

Table 8 Evaluation results using an abstract-type reference summary (Reverse ParaEval method).

Method         EX-7    EX-8
S (SMT)        0.308   0.352
D (DS)         0.337   0.420
W (WordNet)    0.336   0.440
N (NTT)        0.335   0.437
SD             0.347   0.389
SW             0.349   0.377
SN             0.310   0.349
DW             0.349   0.375
DN             0.339   0.424
WN             0.336   0.440
SDW            0.350   0.380
SDN            0.342   0.394
SWN            0.368   0.383
DWN            0.351   0.367
SDWN           0.359   0.373
ROUGE-1        0.337   0.448

ParaEval ParaEval 7 8 7 8

In Table 7 (EX-5), D scored 0.045 (13.6%) higher than ROUGE-1. In Table 8 (EX-7), SWN scored 0.031 (9.2%) higher than ROUGE-1.

ParaEval

6.

EX-1 ROUGE-1 EX-3 EX-1 EX-4 1 ParaEval ParaEval EX-1 EX-4 ParaEval EX-5 EX-8 ParaEval EX-2 EX-6

4.1 ParaEval ParaEval ParaEval (1) ParaEval (2) ParaEval (2) ParaEval (3) 1 ParaEval ParaEval

ParaEval EX-1 EX-4 ParaEval EX-5 EX-8 SDWN 1 ParaEval 5.2 49.1 ParaEval 0 4.6 ParaEval ParaEval ParaEval ParaEval

4.2 4 SMT DS ParaEval 1 ParaEval EX-1 EX-5 EX-3 EX-7 EX-8 EX-4 1

Kauchak 7) 2 place Kauchak ParaEval

4 D W WordNet S N NTT S N 30
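The gains over ROUGE-1 quoted with Tables 5 to 8 can be checked with simple arithmetic: the absolute gain is the score difference, and the relative gain is that difference divided by the ROUGE-1 score. The sketch below uses the values printed in the tables; the names `REPORTED` and `gain` are hypothetical.

```python
# (experiment) -> (best method, its score, ROUGE-1 score), copied from Tables 5 to 8.
REPORTED = {
    "EX-1": ("DW", 0.359, 0.332),   # Table 5
    "EX-3": ("DN", 0.357, 0.337),   # Table 6
    "EX-5": ("D", 0.377, 0.332),    # Table 7
    "EX-7": ("SWN", 0.368, 0.337),  # Table 8
}


def gain(score, baseline):
    """Absolute gain (3 decimals) and relative gain in percent (1 decimal)."""
    diff = score - baseline
    return round(diff, 3), round(100 * diff / baseline, 1)


for experiment, (method, score, baseline) in REPORTED.items():
    absolute, relative = gain(score, baseline)
    print(f"{experiment}: {method} vs ROUGE-1: +{absolute:.3f} ({relative:.1f}%)")
```

The four pairs reproduce the figures stated in the text: 0.027 (8.1%), 0.020 (5.9%), 0.045 (13.6%) and 0.031 (9.2%).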
7.

Zhou ParaEval Zhou WordNet NTT TSC2 0.045 ParaEval ParaEval

References

1) Vol.49, No.3, pp.1426-1436 (2008).
2) Bond, F., Isahara, H., Uchimoto, K., Kuribayashi, T. and Kanzaki, K.: Extending the Japanese WordNet, 15, pp.80-83 (2009).
3) Fukushima, T., Okumura, M. and Nanba, H.: Text Summarization Challenge 2/Text Summarization Evaluation at NTCIR Workshop 3, Working Notes of the 3rd NTCIR Workshop Meeting, PART V, pp.1-7 (2002).
4) Vol.47, No.6, pp.1753-1766 (2006).
5) Hovy, E., Lin, C.-Y., Zhou, L. and Fukumoto, J.: Automated Summarization Evaluation with Basic Elements, Proc. 5th Conference on Language Resources and Evaluation (2006).
6) Vol.11, No.5, pp.151-198 (2004).
7) Kauchak, D. and Barzilay, R.: Paraphrasing for Automatic Evaluation, Proc. 2006 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), pp.455-462 (2006).
8) Lee, L.: Measures of Distributional Similarity, Proc. 37th Annual Meeting of the Association for Computational Linguistics, pp.25-32 (1999).
9) Lin, C.-Y. and Hovy, E.: Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics, Proc. 4th Meeting of the North American Chapter of the Association for Computational Linguistics and Human Language Technology, pp.150-157 (2003).
10) Lin, C.-Y.: ROUGE: A Package for Automatic Evaluation of Summaries, Proc. ACL-04 Workshop Text Summarization Branches Out, pp.74-81 (2004).
11) Lin, D.: Automatic Retrieval and Clustering of Similar Words, Proc. 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics, pp.768-774 (1998).
12) Nanba, H. and Okumura, M.: An Automatic Method for Summary Evaluation Using Multiple Evaluation Results by a Manual Method, Proc. COLING/ACL 2006 Main Conference Poster Sessions, pp.603-610 (2006).
13) Vol.23, No.1, pp.10-16 (2008).
14) Papineni, K., Roukos, S., Ward, T. and Zhu, W.-J.: BLEU: A Method for Automatic Evaluation of Machine Translation, IBM Research Report, RC22176 (W0109-0220) (2001).
15) Salton, G.: The SMART Retrieval System. Experiments in Automatic Document Processing, Prentice-Hall, Inc., Upper Saddle River, NJ (1971).
16) 14, pp.123-126 (2008).
17) Utiyama, M. and Isahara, H.: Reliable Measures for Aligning Japanese-English News Articles and Sentences, Proc. 41st Annual Meeting of the Association for Computational Linguistics, pp.72-79 (2003).
18) Zhou, L., Lin, C.-Y., Munteanu, D.S. and Hovy, E.: ParaEval: Using Paraphrases to Evaluate Summaries Automatically, Proc. 2006 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), pp.447-454 (2006).

(Received December 20, 2009)
(Accepted April 7, 2010)