1,a) 2,b) 3,c) 1,d) CYK 552 1. 2 ( 1 ) ( 2 ) 1 [1] 2 [2] [3] 1 Graduate School of Information Science, Nagoya University, Japan 2 Information Technology Center, Nagoya University, Japan 3 Information & Communications, Nagoya University, Japan a) yoshida@db.ss.is.nagoya-u.ac.jp b) ohno@nagoya-u.jp c) yoshihide@icts.nagoya-u.ac.jp d) matubara@nagoya-u.jp [4], [5], [6], [7], [8] [9], [10], [11], [12], [13] 552 2 c 2015 Information Processing Society of Japan 1
2 3 4 5 2. [1] 1 1 2 1 2 3. 3.1 B =b 1 b n P (S B) S S O = {o 1,2, o 1,3,, o 1,n, o 2,3,, o i,j,, o n 1,n } D = {d 1, d 2,, d n 1 } S = O, D o i,j 1 i < j n 2 b i b j b i o i,j = 1 o i,j = 0 d i b i S = O, D P (S B) (1) 式 (2) 低 入力 語順整序 係り受け解析 出力 1 入力文の語順の適切さ 入力 式 (1) 出力 (1) (3) 入力 係り受け解析 語順整序 出力 Fig. 1 Relationships among formulas (1) to (3). P (S B) = P (O, D B) 高 式 (3) = {P (O B) P (D O, B)} α (1) {P (D B) P (O D, B)} 1 α (0 α 1) (1) (2) (3) *1 α (0 α 1) P (O, D B) = P (O B) P (D O, B) (2) P (O, D B) = P (D B) P (O D, B) (3) (2) (3) P (O, D B) 1 (1) (3) S = O, D (2) P (O, D B) (3) (2) (3) (1) P (O, D B) α (1) (3) y = a x 0 a 1 0 x 1 (1) α (2) (1) 2 *1 c 2015 Information Processing Society of Japan 2
o i,j 2 d i *2 P (O B) = n 1 n i=1 j=i+1 P (o i,j B) (4) n 1 P (D O, B) = P (d i O, B) (5) i=1 n 1 P (D B) = P (d i B) (6) i=1 n 1 P (O D, B) = n i=1 j=i+1 P (o i,j D, B) (7) P (o i,j B) B b i b j o i,j P (d i O, B) B O b i d i P (d i B) B b i d i P (o i,j D, B) D B b i b j o i,j P (o i,j D, B) [2] P (o i,j B) P (o i,j D, B) P (d i O, B) [14] P (d i B) P (d i O, B) 3.2 B O D S O D CYK P (O, D B) O D ( 1 ) *2 o i,j d i O D [14] ( 2 ) ( 3 ) *3 (1) (3) D (2) (3) O CYK CYK 3.2.1 2 n n n M i,j (1 i j n) 4 i j M i,j B i,j = b i b j, O i,j D i,j argmax Si,j P (S i,j B i,j ) S i,j d x (i x j) S 1,3 = d 3 2 d3 1 d0 3 b 2 1 b 1 2 b 3 {d 3 1, d3 2, d0 3 } d y x b x b y d 0 j b j 4 6 M i,i d 0 i 7 15 M i,i M i,j M i,j 10 12 concatreorder M i,k M k+1,j C i,j 13 b i b j argmax Si,j C i,j P (S i,j B i,j ) M i,j *3 2 2 c 2015 Information Processing Society of Japan 3
1: input B 1,n := b 1 b n // 2: set M i,j := null (1 i j n) // 3: set C i,j := null (1 i j n) // 4: for i := 1 to n do 5: M i,i := d 0 i 6: end for 7: for d := 1 to n 1 do 8: for i := 1 to n d do 9: j := i + d 10: for k := i to j 1 do 11: C i,j := C i,j concatreorder(m i,k, M k+1,j ) 12: end for 13: M i,j := argmax Si,j C i,j P (S i,j B i,j ) 14: end for 15: end for 16: return M 1,n Fig. 2 2 Word reordering algorithm. 1: function concatreorder(s 1, S 2 ) 2: begin 3: set C := null // 4: set z := last(s 2 ) // last(s 2 ) S 2 5: 6: // 7: // 8: // S 1 b z S 1 9: S 1 := changelastdep(z, S 1) 10: // 11: C := { S 1 S 2 } 12: 13: // 14: for each d {d y x y = z, d y x S 2 } do 15: // S 2 d S2 l Sr 2 16: (S2 l, Sr 2 ) := divide(d, S 2) 17: C := C { S2 l S 1 Sr 2 } 18: end for 19: 20: return C 21: end 3 Fig. 3 concatreorder Function: concatreorder. M 1,n 16 M 1,n 3 concatreorder concatreorder 2 S 1 (= M i,k ) S 2 (= M k+1,j ) M i,j C 2 9 S 1 S 2 changelastdep S 1 S 2 b z S 1 S 1 = d i d i+1 d k 1 d 0 k changelastdep(s 1, z) d i d i+1 d k 1 d z k 11 2 i j : 1 2 3 2 1 3 2 2 3 : 文節 i から j までの部分文節列に対する語順と係り受けの最尤構造 文節 i が j より前に位置し,i が j に係ることを表す. 例えば, 2 1 3 は, 意味する. Fig. 4 4 3 3 4 4 を 下記の構造候補の中でが最大となる構造をに書き込む とから生成される候補 連結プロセスにおいて生成 候補 1: 2 3 4 語順変更プロセスにおいて生成 候補 2: 3 2 4 とから生成される候補 連結プロセスにおいて生成 候補 3: 2 3 4 語順変更プロセスにおいて生成 候補は生成されない Execution example of our search algorithm. S 1 S 2 S 1 S 2 S 1 S 2 M i,j C 0 ( 1 ) ( 2 ) S 1 S 2 14 18 14 S 2 S 2 16 17 S 1 17 S2 l S 1 Sr 2 C S 2 S 2 2 concatreorder concat CYK concat concatreorder 3 13 18 3.2.2 4 n = 4 4 4 4 M 1,1, M 2,2, M 3,3, M 4,4, M 1,2, M 2,3, M 3,4, M 1,3 M 2,4 4 M 2,4 2 10 12 concatreorder 3 concatreorder(m 2,2, M 3,4 ) c 2015 Information Processing Society of Japan 4
M 2,2 d 0 2 d4 2 M 2,2 M 3,4 1 京大テキストコーパス concatreorder(m 2,2, M 3,4 ) M 2,2 b4 3 2 concatreorder(m 2,3, M 4,4 ) (1) 1 文の取り出し 新聞記事文 3 concatreorder(m 2,3, M 4,4 ) 3 3 P (S 2,4 B) = P (O 2,4, D 2,4 B 2,4 ) M 2,4 No Yes (4) 他に異なる語順があるか? No (2) ランダムな語順変更 (3) 母語話者が書きそうか? Yes 語順変更後の読みにくい文 作業者が判断 4. 採用 Fig. 5 新聞記事文 : 評価用データ 5 Construction procedure of evaluation data. 4.1 [15] Version 2 5 ( 1 ) 1 * 4 3.2 (1) (3) *4 消費税増税が国会を通った後でも 世論調査では 増税反対が まず, 大勢だ に係る 3 つの部分構造 ( 青太線枠 ) の順序を変更. 次に, 通った に係る 2 つの部分構造 ( 赤点線枠 ) の順序を変更. 語順変更後の読みにくい文 : 大勢だ 増税反対が世論調査では国会を消費税増税が通った後でも大勢だ Fig. 6 6 Example of random reordering. ( 2 ) 3.2 (1) (3) 6 ( 3 ) (2) 1 Yes (1) No (4) [1] ( 4 ) (2) Yes (2) c 2015 Information Processing Society of Japan 5
1 Table 1 Size of evaluation data. 2 Table 2 Experimental results for each set. 552 4,906 No (1) 1995 1 9 1 4.2 (1) α 552 5 552 5 4 α α 1 5 α 2 α 0.1 α α 0.1 0.01 α α (4) (7) 7 1 1 3 8 7,976 [16] [2] 2 2 2 1 [14] [2] 2 CaboCha[17] [2] 552 1 P (o i,j D, B) α 2 1 0.66 83.02% (3,790/4,565) 32.73% (36/110) 2 0.75 83.83% (4,308/5,139) 29.09% (32/110) 3 0.66 83.30% (3,862/4,636) 32.73% (36/110) 4 0.66 81.89% (3,496/4,269) 28.83% (32/111) 5 0.75 86.91% (4,018/4,623) 31.53% (35/111) 3 Table 3 Experimental results (word reordering). 2 83.82% (19,474/23,232) 30.98% (171/552) 1 82.39% (19,140/23,232) 26.99% (149/552) 2 83.35% (19,365/23,232) 26.63% (147/552) 76.78% (17,838/23,232) 0% (0/552) 1 P (d i O, B) 2 CaboCha 4.3 α 2 1 α 2 4 4 2 α α 0.66 0.75 3 5 2 *5 (p < 0.05) 2 2 (p > 0.05) 1 (p < 0.05) 1 7 *5 c 2015 Information Processing Society of Japan 6
例 1 入力文 麻薬売買にも手を越境前の生活費を得るため出す 語順整序結果 越境前の生活費を得るため麻薬売買にも手を出す 例 2 入力文 生産量は少なくなるが高品質のブドウをブドウの樹液を濃縮し収穫するためだ 語順整序結果 生産量は少なくなるがブドウの樹液を濃縮し高品質のブドウを収穫するためだ 例 3 入力文 コペンハーゲンからの報道によるとチェチェン紛争への軍事介入を理由に七日デンマーク政府はロシアとの昨年九月に結んだ軍事協力協定を一時的に凍結した 語順整序結果 コペンハーゲンからの報道によるとデンマーク政府は七日チェチェン紛争への軍事介入を理由に昨年九月に結んだロシアとの軍事協力協定を一時的に凍結した 7 Fig. 7 Examples of sentences correctly reordered by our 4 Table 4 method. 2 Agreement rates calculated separately for the case that the word order between two bunsetsus in the input sentence is appropriate and the case that the order is inappropriate. R F T R T T 66.30% (3,576/5,394) 88.77% (15,834/17,838) 1 52.65% (2,840/5,394) 91.38% (16,300/17,838) 2 51.67% (2,787/5,394) 92.94% (16,578/17,838) 4.4 2 2 4 552 2 2 R F T 2 R T T 2 R F T 2 7 2 入力文 亡くなる前に演出プランが出来ていたため演出が飯沢プラン通り行われた 本手法の出力結果 演出プランが出来ていたため飯沢プラン通り亡くなる前に演出が行われた 正解文 亡くなる前に演出プランが出来ていたため飯沢プラン通り演出が行われた 8 Fig. 8 Example of sentences incorrectly reordered by our method. 2 2 2 8 2 8 2 8 2 2 CaboCha 8 α 2 (3) α α 0.66 0.83 *6 *6 α 0.01 α c 2015 Information Processing Society of Japan 7
Table 5 5 Experimental results (dependency parsing). 83.39% (3,631/4,354) 40.04% (221/552) 1 84.75% (3,690/4,354) 36.78% (203/552) 2 86.08% (3,748/4,354) 37.50% (207/552) α 5. 5.1 4 [14] 5 5 p > 0.05 p < 0.05 5.2 α 4 (1) α α α 552 2 α 4 552 α 4.2 6 α = 0.66 2 2 Table 6 6 α Word reordering results by our method using the optimal α. 2 (α = 0.66) 84.12% (19,542/23,232) 31.16% (172/552) 3 p < 0.05 α 2 2 α 1 2 α α 5.3 α (1) α 3.1 (2) (3) (2) (1) α 4 α 4 ( 1 ) 4.1 552 ( 2 ) 4.1 552 ( 3 ) 4.1 (2) (1) 552 ( 4 ) (3) 5.2 2 α c 2015 Information Processing Society of Japan 8
7 4 Table 7 Experimental results on four kinds of data which have different word order readability. 2 α 2 100% (23,232/23,232) 100% (552/552) 0.92 86.51% (20,098/23,232) 40.40% (223/552) 76.78% (17,838/23,232) 0% (0/552) 0.66 84.12% (19,542/23,232) 31.16% (172/552) 74.46% (17,298/23,232) 18.12% (100/552) 0.52 82.58% (19,184/23,232) 32.07% (177/552) 60.40% (14,032/23,232) 0% (0/552) 0.41 80.66% (18,671/23,232) 24.46% (135/552) 7 2 3 4 2 α 5 6 7 α 3 6. CYK No.24650066 (B) No.25730134 [1] 7 (2009). [2] Vol. 7, No. 4, pp. 163 180 (2000). [3] Vol. 45, No. 5, pp. 1451 1459 (2004). [4] Filippova, K. and Strube., M.: Generating constituent order in German clauses, In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics (ACL2007), pp. 320 327 (2007). [5] Harbusch, K., Kempen, G., van Breugel, C. and Koch, U.: A generation-oriented workbench for performance grammar: Capturing linear order variability in German and Dutch, In Proceedings of the 4th International Natural Language Generation Conference (INLG2006), pp. 9 11 (2006). [6] Kruijff, G. M., Kruijff-Korbayová, I., Bateman, J. and Teich, E.: Linear order as higher-level decision: Information structure in strategic and tactical generation, In Proceedings of the 8th European Workshop on Natural Language Generation (ENLG2001), pp. 74 83 (2001). [7] Ringger, E., Gamon, M., Moore, R. C., Rojas, D., Smets, M. and Corston-Oliver, S.: Linguistically informed statistical models of constituent structure for ordering in sentence realization, In Proceedings of the 20th International Conference on Computational Linguistics (COL- ING2004), pp. 673 679 (2004). [8] Shaw, J. and Hatzivassiloglou, V.: Ordering among premodifiers, In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL 99), pp. 135 143 (1999). [9] Goto, I., Utiyama, M. and Sumita, E.: Post-ordering by parsing for Japanese-English statistical machine translation, In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (ACL2012), pp. 311 316 (2012). [10] Elming, J.: Syntactic reordering integrated with phrasebased SMT, In Proceedings of the 22nd International Conference on Computational Linguistics (COL- ING2008), pp. 209 216 (2008). [11] Ge, N.: A direct syntax-driven reordering model for phrase-based machine translation, In Proceedings of Human Language Technologies: The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT2010), pp. 849 857 (2010). [12] Christoph, T. and Hermann, N.: Word reordering and a dynamic programming beam search algorithm for statistical machine translation, Computational Linguistics, Vol. 29, No. 1, pp. 97 133 (2003). [13] Nizar, H.: Syntactic preprocessing for statistical machine translation, In Proceedings of the 11th Machine Translation Summit (MT SUMMIT XI), pp. 215 222 (2007). [14] Vol. 40, No. 9, pp. 3397 3407 (1999). [15] 3 pp. 115 118 (1997). [16] Zhang, L.: Maximum entropy modeling toolkit for c 2015 Information Processing Society of Japan 9
Python and C++, http://homepages.inf.ed.ac.uk/ s0450736/maxent_toolkit.html (2008). [Online; accessed 1-March-2008]. [17] Vol. 43, No. 6, pp. 1834 1842 (2002). c 2015 Information Processing Society of Japan 10