Vol. 44 No. 11 Nov. 2003

Exploring Transfer Errors in Lexical and Structural Paraphrasing

Atsushi Fujita and Kentaro Inui
Graduate School of Information Science, Nara Institute of Science and Technology

In lexical and structural paraphrasing, meaning-preserving linguistic transformations are performed, such as lexical or phrasal replacements and alternations in case or voice. In this paper, we report the results of our investigation into transfer errors, which reveals (1) what types of errors tend to occur in generating lexical and structural paraphrases of Japanese sentences, and (2) which of them should be given priority as subjects of further research. We found that most types of errors occurred irrespective of the type of transfer. Lexical errors and syntactic (collocative) errors should be tackled first, since they not only occurred more frequently but also seem to be solvable by maintaining revision patterns or utilizing statistical language models.

1. 37),40) 24),25) 5),15) 28) 1),11),16),30) (1) (2) (1) s. t. (2) s. t. (1) (2) (3) s t t t NV
(3) a. N=>N (N N ) b. N V (V ) => N V (V V ) (4.s) (4.r) (4.t) (4) r. N V => N V s. t. 1. (4.t) (5) (5)? 2. (5) (6) (6) 12),18),21),22),35),41) 2 1. 2. 2 1 3 2 4 5 6 7 2. 1 (i) (ii) (iii) (iii) (ii) (i)

Fig. 1 A generic model of paraphrasing.
Fig. 2 A paraphrasing model based on syntactic transfer and revision.

(iii) 6) 2 2 3. 1 Kura 39) 1. 2 2.
3.1 1 21) (7) (8) + 21) (7) s. t. (8) + + + 0 1. + 2. 3. 4. 5. 22) (9) (10) 22) (9) s. t. (10) N1 N2 V => N2 N1 V a N2 b V c V d V (8)(10) (8) 2 (11) 1 (4.t) (5) (11) t. t. (8) 3 1 9),18),21) 17),22) 22) 23),31) 12),18),21),22),35),41) 2 3.2 4 + 21) 22) 41) 12) (7) (9) 3 Kura Kura 4 1 + 2 27) 4),36) 3 21)22) 4 http://cl.aist-nara.ac.jp/lab/kura/doc/
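Fig. 2 describes a model in which syntactic transfer is followed by revision, and rule (10) has the shape N1 N2 V => N2 N1 V. The two-stage flow might be sketched as below. This is a minimal illustration under toy assumptions, not the Kura implementation: the token representation, the function names, and the single revision pattern are all hypothetical, and only the rule shape comes from the text.

```python
from typing import Dict, List, Tuple

# (surface, coarse tag); this toy tag set is an assumption for illustration.
Token = Tuple[str, str]
# Hypothetical revision-pattern table: erroneous surface pair -> revised pair.
Patterns = Dict[Tuple[str, str], Tuple[str, str]]


def transfer_n1_n2_v(tokens: List[Token]) -> List[Token]:
    """Apply the rule shape N1 N2 V => N2 N1 V to the first matching window."""
    out = list(tokens)
    for i in range(len(out) - 2):
        tags = (out[i][1], out[i + 1][1], out[i + 2][1])
        if tags == ("N", "N", "V"):
            out[i], out[i + 1] = out[i + 1], out[i]
            break
    return out


def revise(tokens: List[Token], patterns: Patterns) -> List[Token]:
    """Rewrite adjacent surface pairs that appear in the revision-pattern table."""
    out = list(tokens)
    for i in range(len(out) - 1):
        pair = (out[i][0], out[i + 1][0])
        if pair in patterns:
            left, right = patterns[pair]
            out[i] = (left, out[i][1])
            out[i + 1] = (right, out[i + 1][1])
    return out


def paraphrase(tokens: List[Token], patterns: Patterns) -> List[Token]:
    """Transfer first, then repair errors the transfer may have introduced."""
    return revise(transfer_n1_n2_v(tokens), patterns)


# Placeholder surfaces; "v*" stands for a revised verb form (hypothetical).
source = [("n1", "N"), ("n2", "N"), ("v", "V")]
patterns = {("n1", "v"): ("n1", "v*")}
result = paraphrase(source, patterns)
print([surface for surface, _ in result])
```

Keeping the revision step separate from the transfer rule means that one pattern table can be shared across different types of transfer.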
Table 1 Knowledge resources for error exploration.
5 4 22 21) 40 IPA 25 22) 291 30 500 41) 261 42 3 185 12)

Fig. 3 Knowledge decomposition scheme for exploring revision patterns.

21) (8) + + 1 22) (10) (12) s. t. 41) (12) Kura (13) s. t. 12) (13) Kura 3.3 1 3 Step 1 1 Step 2 2 3 Kura Step 3 Kura
Fig. 4 Abstraction of revision patterns.

Step 4 Step 1 Step 5 Step 34 Step 2 (10) ad Step 4 (10) c d (10) a b a 2 1 3.4 2 1 2 1 Kura 4 26) IPADIC 3) 14 5 4. 2 3 4.1 28,000 Kura 1 9 1,220 680
Table 2 Numbers of rules and paraphrase candidates for each type of transfer.

Example     (9)   (12)  (13)  (7)    (14)  (15)   (16)    (17)   Total
Rules       33    291   248   6,642  18    3,630  13,348  3,942  28,152
Candidates  148   77    19    46     20    60     252     58     680

- 2 4 3 33),34) + 4 (14) s. t. 38) 3 38) (15) s. t. 13) 9) (16) s. t. 9) 2 EDR 8) 21) + 34) 33) (17) s. t. 34) 4.2 630 3 3 4 3 (A) (B) (C) (D) 114 222 294
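The totals printed in Table 2 (28,152 rules and 680 paraphrase candidates) can be cross-checked against the eight per-type figures. A minimal sketch, with variable names chosen here for illustration only:

```python
# Per-type figures as printed in Table 2, in the order they appear.
rules = [33, 291, 248, 6642, 18, 3630, 13348, 3942]
candidates = [148, 77, 19, 46, 20, 60, 252, 58]

total_rules = sum(rules)
total_candidates = sum(candidates)

# Compare with the totals printed in the table.
print(total_rules == 28152, total_candidates == 680)
```

Both sums agree with the printed totals, so the eight columns are complete.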
Table 3 Error distribution for each type of transfer.

138 75 19 39 20 60 221 58 630
137 57 9 35 17 53 172 36 516
(a) 125 41 3 31 7 43 47 6 303
(b) 42 14 2 3 5 8 4 78
(c) 6 2 8
(d) 7 4 11
(e) 66 8 28 57 3 162
(f) 0
(g) (e) 3 28 5 36
(h) 30 1 31
(i) 1 5 3 13 22
(j) 2 1 3 6
(k) 1 1
(l) 23 2 7 2 34
(m) 10 1 10 1 22
(n) 2 4 2 8
38 16 2 7 8 3 19 22 115
(A) 9 1 26 4 40
(B) 18 20 38
(C) 7 5 5 1 22 1 41
(D) 8 1 1 1 1 1 2 15

33.9% (114/336) 18.1% (114/630)

5. 3 5.1 (4.t)(11.t) (a) 1 1 1 (b) (18)(19) (18) s. t. t. (19) s. t. t. (a) (i) 5.2 (c)(e) (f)(g) (c)(e) (e) (20.t)(21)t (20) s. t. (21) s. t. t. (20) exceed (15) (20) exceed (21) marry
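The rows of Table 3 that give a value for all eight transfer types can be checked against their printed row totals, and the two ratios quoted with the table can be recomputed. A minimal sketch follows. The row names "first", "second" and "before (A)" are labels made up here, because the original row headings are not available.

```python
# Rows of Table 3 with eight values, paired with the printed row total.
full_rows = {
    "first": ([138, 75, 19, 39, 20, 60, 221, 58], 630),
    "second": ([137, 57, 9, 35, 17, 53, 172, 36], 516),
    "(a)": ([125, 41, 3, 31, 7, 43, 47, 6], 303),
    "before (A)": ([38, 16, 2, 7, 8, 3, 19, 22], 115),
}

# Names of rows whose values do not add up to the printed total.
mismatches = [name for name, (values, total) in full_rows.items()
              if sum(values) != total]


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as quoted in the text."""
    return round(100 * numerator / denominator, 1)


ratios = {"114/336": pct(114, 336), "114/630": pct(114, 630)}
print(mismatches, ratios)
```

The rows with blank cells are left out because the position of each blank is not recoverable from the flattened table.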
162 22 2 19) 43) 29) / (22) (23) (24) (g) (22) s. t. (23) s. t. (24) s. t. 5.3 (h) ambiguity vagueness (c)(g) (25) EDR 8) 3cf6fc (25) 3be172 1 (25) s. t. 31 12 (25) 19 (26)t (26) s. t. 2 Edmonds 7) near-synonym Inkpen 14) 32) 5.4 (j)(n) (j) (k) (27) (27) s. t. t.
(l) (28) 42) (28) s. t.? t. (m) (n) 31) / 5.5 3 3 (a)(g) 132 109 73.2% (246/336) 47.2% (246/521) (a)(g) 6. Allen 2) tri-text 8 Knight 20) 1,600 2 a, an, the Knight 3 (i) (ii) (iii)
7. (e) 10) 5 Kura

1) ACL: The 2nd International Workshop on Paraphrasing: Paraphrase Acquisition and Applications (IWP) (2003).
2) Allen, J. and Hogan, C.: Toward the development of a postediting module for raw machine translation output: a controlled language perspective, Proc. 3rd International Workshop on Controlled Language Applications (CLAW), pp.62-71 (2000).
3) IPADIC (2002).
4) Barzilay, R. and McKeown, K.R.: Extracting paraphrases from a parallel corpus, Proc. 39th Annual Meeting of the Association for Computational Linguistics and 10th Conference of the European Chapter of the Association for Computational Linguistics (ACL-EACL), pp.50-57 (2001).
5) Carroll, J., Minnen, G., Pearce, D., Canning, Y., Devlin, S. and Tait, J.: Simplifying text for language-impaired readers, Proc. 9th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pp.269-270 (1999).
6) Dorna, M., Frank, A., van Genabith, J. and Emele, M.C.: Syntactic and semantic transfer with F-structures, Proc. 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics (COLING-ACL), pp.341-347 (1998).
7) Edmonds, P.: Semantic representations of near-synonyms for automatic lexical choice, Ph.D. Thesis, CSRI-399, Department of Computer Science, University of Toronto (1999).
8) EDR (1995).
9) 7 pp.331-334 (2001).
10) NL-156-8, pp.53-60 (2003).
11) : 7 (2001).
12) Kura 63 pp.5-6 (2001).
13) CD-ROM (1997).
14) Inkpen, D.Z. and Hirst, G.: Building a lexical knowledge-base of near-synonym differences, Proc. 2nd Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL) Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations (2001).
15) TL2001-8, pp.51-58 (2001).
16) 8 pp.1-21 (2002).
17) (2001).
18) NL-144-23, pp.167-174 (2001).
19) Vol.9, No.1, pp.3-19 (2002).
20) Knight, K. and Chander, I.: Automated postediting of documents, Proc. 12th National Conference on Artificial Intelligence (AAAI), pp.779-784 (1994).
21) + Vol.40, No.11, pp.4064-4074 (1999).
22) Vol.42, No.3, pp.465-477 (2001).
23) 15 1A1-06 (2001).
24) 6 pp.21-28 (2000).
25) Lin, D. and Pantel, P.: Discovery of inference rules for question answering, Natural Language Engineering, Vol.7, No.4, pp.343-360 (2001).
26) version 2.2.9 (2002).
27) Melamed, I.D.: Empirical methods for exploiting parallel texts, MIT Press (2001).
28) Mitamura, T. and Nyberg, E.: Automatic rewriting for controlled language translation, Proc. 6th Natural Language Processing Pacific Rim Symposium (NLPRS) Workshop on Automatic Paraphrasing: Theories and Applications, pp.1-12 (2001).
29) Bayesian Network NL-119-12, pp.77-84 (1997).
30) NLPRS: Workshop on Automatic Paraphrasing: Theories and Applications (2001).
31) 8 pp.335-338 (2002).
32) 9 pp.97-100 (2003).
33) (1981).
34) RWCRWC 2 / 5 (1998).
35) Vol.40, No.7, pp.2937-2945 (1999).
36) Shinyama, Y., Sekine, S., Sudo, K. and Grishman, R.: Automatic paraphrase acquisition from news articles, Proc. Human Language Technology Conference (HLT) (2002).
37) Vol.36, No.1, pp.12-21 (1995).
38) pp.353-388 (1995).
39) Takahashi, T., Iwakura, T., Iida, R., Fujita, A. and Inui, K.: Kura: a transfer-based lexicostructural paraphrasing engine, Proc. 6th Natural Language Processing Pacific Rim Symposium (NLPRS) Workshop on Automatic Paraphrasing: Theories and Applications, pp.37-46 (2001).
40) Vol.39, No.3, pp.542-550 (1998).
41) (2002).
42) NL-135-8, pp.55-62 (1999).
43) NL-119-11, pp.69-76 (1997).

( 14 9 26 ) ( 15 9 5 )

1977 2000 2002 1967 1995 1998 1998 2001 21 2001