NAIST-IS-MT1251045

Post-ordering with Factored Translation Models
for Japanese to English Translation

Kazuya Kobayashi
Kevin Duh

February 6, 2014
Post-ordering with Factored Translation Models for Japanese to English Translation

Kazuya Kobayashi

Abstract

Translation quality in statistical machine translation strongly depends on the language pair. When translating between Japanese and English, long-distance reorderings must be handled, and current statistical machine translation systems do not work well because standard reordering models lack the necessary flexibility and the search is limited by computational complexity. In this thesis, we focus on a method called post-ordering to mitigate the reordering problem in Japanese to English translation, and we propose a method that uses additional information beyond word surface forms. We use factored translation models to incorporate such information, namely POS tags and word classes.

Keywords: Statistical Machine Translation, Post-ordering, Factored Translation Models, Japanese to English Translation

Master's Thesis, Department of Information Science, Graduate School of Information Science, Nara Institute of Science and Technology, NAIST-IS-MT1251045, February 6, 2014.
Acknowledgments

Kevin Duh
Graham Neubig
Contents

1 Introduction
  1.1 Statistical Machine Translation
  1.2 The Reordering Problem
  1.3 Contributions
  1.4 Outline
2 Statistical Machine Translation
  2.1 Language Model
  2.2 Translation Model
    2.2.1 IBM Model 1
    2.2.2 IBM Model 2
    2.2.3 IBM Model 3
    2.2.4 IBM Models 4 and 5
  2.3 Phrase-Based Translation
  2.4 Reordering Model
  2.5 The Cost of Reordering
3 Factored Translation Models
4 Related Work
5 Post-ordering
  5.1 Head Finalization
  5.2 Training the Post-ordering Models
6 Post-ordering with Factored Translation Models
  6.1 Factors for Japanese-to-HFE Translation
  6.2 Factors for HFE-to-English Reordering
7 Experiments
  7.1 Data
  7.2 Experimental Setup
  7.3 Evaluation Metrics
  7.4 Results
    7.4.1 Factors for Japanese-to-HFE Translation
    7.4.2 Factors for HFE-to-English Reordering
    7.4.3 Discussion
8 Conclusion
References
List of Figures

2.1 …
3.1 Factored translation models
4.1 …
5.1 …
5.2 XML output of Enju
5.3 Head Finalization
6.1 Factored Translation Models

List of Tables

7.1 Japanese-to-English translation results
7.2 Japanese-to-HFE translation results
7.3 HFE-to-English reordering results
7.4 Results of combining factors
7.5 …
1 Introduction

1.1 Statistical Machine Translation

Statistical machine translation is often formulated as a noisy channel model. Given a source sentence f = (f_1, f_2, ..., f_m) and a target sentence e = (e_1, e_2, ..., e_n), the posterior P(e|f) is rewritten by Bayes' rule as

  P(e|f) = P(e) P(f|e) / P(f)                                    (1.1)

Since the denominator P(f) in (1.1) does not depend on e, the best translation ê under (1.1) is

  ê = argmax_e P(e) P(f|e)                                       (1.2)

where P(e) is the language model and P(f|e) is the translation model. Och and Ney [14] generalized the noisy channel model to a log-linear model over M feature functions h_m(e, f):

  P(e|f) = exp[Σ_{m=1}^{M} λ_m h_m(e, f)] / Σ_{e'} exp[Σ_{m=1}^{M} λ_m h_m(e', f)]   (1.3)
where the λ_m are feature weights. Since the denominator of (1.3) does not depend on e, the best translation under (1.3) is

  ê = argmax_e P(e|f)                                            (1.4)
    = argmax_e Σ_{m=1}^{M} λ_m h_m(e, f)                         (1.5)

1.2 The Reordering Problem

Word order differs greatly between some language pairs. An English sentence such as "John hit a ball." follows subject-verb-object (SVO) order, while its Japanese counterpart follows subject-object-verb (SOV) order. In principle a sentence of n words admits n! orderings, so reordering cannot be searched exhaustively.
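The decision rule in Eq. (1.5) amounts to scoring each candidate by a weighted sum of feature values and taking the argmax. A minimal sketch follows; the feature functions and weights are invented stand-ins for illustration, not the models used in this thesis:

```python
# A minimal sketch of the log-linear decision rule in Eq. (1.5):
# score(e, f) = sum_m lambda_m * h_m(e, f); decoding picks the argmax.
# Both feature functions are invented stand-ins for illustration.

def h_length_ratio(e, f):
    # penalize length mismatch between candidate e and source f
    return -abs(len(e) - len(f))

def h_overlap(e, f):
    # toy "translation model" feature: number of shared tokens
    return len(set(e) & set(f))

FEATURES = [h_length_ratio, h_overlap]
WEIGHTS = [0.5, 1.0]  # the lambda_m; tuned with MERT in practice

def score(e, f):
    return sum(lam * h(e, f) for lam, h in zip(WEIGHTS, FEATURES))

def decode(candidates, f):
    # argmax over an explicit candidate list; a real decoder searches
    # this space incrementally with pruning
    return max(candidates, key=lambda e: score(e, f))

best = decode([["a", "b", "c"], ["x", "y"]], ["a", "b", "c"])
```

A real system enumerates candidates implicitly during search rather than scoring a fixed list.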
1.3 Contributions

This thesis applies factored translation models to post-ordering, using part-of-speech tags and word classes as additional factors for Japanese to English translation.

1.4 Outline

The remainder of this thesis is organized as follows. Chapter 2 reviews statistical machine translation. Chapter 3 introduces factored translation models. Chapter 4 surveys related work. Chapter 5 describes post-ordering. Chapter 6 presents our method, post-ordering with factored translation models. Chapter 7 reports experiments, and Chapter 8 concludes.
2 Statistical Machine Translation

This chapter reviews statistical machine translation, in particular the phrase-based approach of Koehn et al. [11] and the alignment models of Och et al. [16].

2.1 Language Model

The language model estimates P(e) in (1.1) and serves as a feature in (1.3). By the chain rule, the probability of a word sequence w_1, w_2, ..., w_l is

  P(w_1 w_2 ... w_l) = P(w_1) P(w_2|w_1) ... P(w_l | w_1 w_2 ... w_{l-1})   (2.1)

An n-gram model approximates each factor of (2.1) by conditioning w_i only on the preceding n-1 words. For a 3-gram model with maximum-likelihood estimation,

  P(w_i | w_{i-2} w_{i-1}) = C(w_{i-2} w_{i-1} w_i) / C(w_{i-2} w_{i-1})   (2.2)

where C(x) is the number of occurrences of x in the training data. Because (2.2) assigns probability 0 to unseen 3-grams, smoothing methods such as Kneser-Ney [8] or Witten-Bell [19] are used.
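The maximum-likelihood 3-gram estimate of Eq. (2.2) can be sketched directly from counts; this is unsmoothed, so unseen 3-grams get probability 0, which is exactly why Kneser-Ney or Witten-Bell smoothing is used in practice:

```python
from collections import Counter

# Maximum-likelihood 3-gram estimation as in Eq. (2.2):
# P(w_i | w_{i-2} w_{i-1}) = C(w_{i-2} w_{i-1} w_i) / C(w_{i-2} w_{i-1}).

def train_trigram(sentences):
    tri, bi = Counter(), Counter()
    for s in sentences:
        toks = ["<s>", "<s>"] + s + ["</s>"]
        for i in range(2, len(toks)):
            tri[tuple(toks[i - 2:i + 1])] += 1
            bi[tuple(toks[i - 2:i])] += 1
    return tri, bi

def prob(tri, bi, w2, w1, w):
    denom = bi[(w2, w1)]
    return tri[(w2, w1, w)] / denom if denom else 0.0

# usage: counts from a tiny two-sentence corpus
tri, bi = train_trigram([["the", "cat"], ["the", "dog"]])
```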
2.2 Translation Model

The translation model estimates P(f|e) in (1.1) and serves as a feature in (1.3). The IBM models [2] define it through a word alignment a between e and f:

  P(f|e) = Σ_a P(f, a|e)                                         (2.3)

where f has length m and e has length l. The alignment a = a_1 a_2 ... a_m maps each source position to a target position: a_j = i means that f_j is aligned to e_i, and a_j = 0 means that f_j is aligned to the empty word e_0. The joint probability P(f, a|e) decomposes without loss of generality as

  P(f, a|e) = P(m|e) Π_{j=1}^{m} P(a_j | f_1^{j-1}, a_1^{j-1}, m, e) P(f_j | f_1^{j-1}, a_1^{j}, m, e)   (2.4)

where f_i^j abbreviates f_i ... f_j. There are five IBM models, Models 1 through 5; they are trained in sequence, each model initialized from the previous one.

2.2.1 IBM Model 1

IBM Model 1 makes three simplifying assumptions about (2.4): the length probability P(m|e) is a constant ε; the alignment probability P(a_j | f_1^{j-1}, a_1^{j-1}, m, e) is uniform, 1/(l+1); and the word translation probability depends only on f_j and e_{a_j}:

  t(f_j | e_{a_j}) = P(f_j | f_1^{j-1}, a_1^{j}, m, e)           (2.5)

Under these assumptions, (2.4) becomes

  P(f, a|e) = ε / (l+1)^m · Π_{j=1}^{m} t(f_j | e_{a_j})         (2.6)
Summing (2.6) over all alignments, where each a_j ranges from 0 to l, gives P(f|e):

  P(f|e) = ε / (l+1)^m · Σ_{a_1=0}^{l} ... Σ_{a_m=0}^{l} Π_{j=1}^{m} t(f_j | e_{a_j})
         = ε / (l+1)^m · Π_{j=1}^{m} Σ_{i=0}^{l} t(f_j | e_i)    (2.7)

2.2.2 IBM Model 2

Model 2 replaces Model 1's uniform alignment probability with a distribution that depends on the positions j and a_j and the lengths m and l:

  a(a_j | j, m, l) = P(a_j | f_1^{j-1}, a_1^{j-1}, m, l)         (2.8)

Substituting (2.8) into (2.7) gives

  P(f|e) = ε · Σ_{a_1=0}^{l} ... Σ_{a_m=0}^{l} Π_{j=1}^{m} t(f_j | e_{a_j}) a(a_j | j, m, l)
         = ε · Π_{j=1}^{m} Σ_{i=0}^{l} t(f_j | e_i) a(i | j, m, l)   (2.9)

Model 1 is the special case of Model 2 in which a(i | j, m, l) = 1/(l+1).
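The word translation table t(f|e) is estimated with EM; because Model 1's alignment probability is uniform, the expected counts factor over source positions exactly as in Eq. (2.7). A toy sketch on the classic two-sentence corpus:

```python
from collections import defaultdict

# Toy EM trainer for IBM Model 1 word-translation probabilities
# t(f_j | e_i), following Eq. (2.7). "NULL" plays the role of the
# empty word e_0.

def train_ibm1(bitext, iterations=10):
    f_vocab = {f for (fs, es) in bitext for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for fs, es in bitext:
            es = ["NULL"] + es
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normalizer over e_0..e_l
                for e in es:
                    c = t[(f, e)] / z  # expected alignment count
                    count[(f, e)] += c
                    total[e] += c
        for (f, e) in count:  # M-step: renormalize per target word
            t[(f, e)] = count[(f, e)] / total[e]
    return t

# usage: EM sharpens t so that "maison" prefers "house" over "the"
bitext = [(["la", "maison"], ["the", "house"]),
          (["la", "fleur"], ["the", "flower"])]
t = train_ibm1(bitext)
```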
2.2.3 IBM Model 3

Models 1 and 2 generate exactly one source word per alignment link. Model 3 introduces a fertility model n(φ | e_i), the probability that target word e_i generates φ source words (possibly φ = 0), together with a parameter p_1 for insertion from the empty word. Model 3 also replaces Model 2's alignment probability a(a_j | j, m, l) with a distortion probability d(j | i, l, m). Its parameters are thus the fertility model n(φ | e), the insertion probability p_1, the word translation probabilities t(f_j | e_{a_j}), and the distortion probabilities d(j | i, l, m).

2.2.4 IBM Models 4 and 5

Model 4 replaces Model 3's distortion model with one conditioned on the position of the previously translated word. Models 3 and 4 are deficient, in that they assign probability mass to impossible configurations; Model 5 removes this deficiency.

2.3 Phrase-Based Translation

Phrase-based translation [11] translates contiguous sequences of words (phrases) rather than single words, which captures local context and local reorderings. Phrase pairs are extracted from the word alignments produced by the IBM models.
2.4 Reordering Model

The Moses decoder [10] provides a lexicalized reordering model; Galley and Manning [4] proposed a hierarchical variant. Given source phrases f̄ = (f̄_1, f̄_2, ..., f̄_m), target phrases ē = (ē_1, ē_2, ..., ē_n), an alignment a = (a_1, a_2, ..., a_n), and orientations o = (o_1, o_2, ..., o_n),

  P(o | ē, f̄) = Π_{i=1}^{n} P(o_i | ē_i, f̄_{a_i}, a_{i-1}, a_i)   (2.10)

Each orientation o_i takes one of three values, as illustrated in Figure 2.1: monotone (M) when two adjacent phrases keep their order, swap (S) when they are exchanged, and discontinuous (D) otherwise. The model contributes three feature functions:

  f_m = Σ_{i=1}^{n} log p(o_i = M | ...)
  f_s = Σ_{i=1}^{n} log p(o_i = S | ...)
  f_d = Σ_{i=1}^{n} log p(o_i = D | ...)

2.5 The Cost of Reordering

For a sentence of n words or phrases there are n! possible orderings, so in practice the search over reorderings must be restricted, for example by a distortion limit.
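The three orientation classes can be sketched by comparing the source position of each phrase with that of its predecessor (a simplification; the real model conditions on the word alignments within the phrases):

```python
# The three orientation classes of the lexicalized reordering model,
# simplified to phrase indices: for target phrase i aligned to source
# phrase a_i, the orientation is monotone (M) if a_i = a_{i-1} + 1,
# swap (S) if a_i = a_{i-1} - 1, and discontinuous (D) otherwise.

def orientations(a):
    out, prev = [], 0  # treat the sentence start as source position 0
    for cur in a:
        if cur == prev + 1:
            out.append("M")
        elif cur == prev - 1:
            out.append("S")
        else:
            out.append("D")
        prev = cur
    return out
```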
Figure 2.1: …
3 Factored Translation Models

Koehn and Hoang [9] proposed factored translation models, in which each word is annotated with additional factors such as its lemma and part of speech. For example, the surface forms "house" and "houses" are distinct, but they share the lemma "house"; a model that translates at the lemma level and then generates the inflected surface form can generalize across such variants even when only one of them was observed in training. Figure 3.1 illustrates how translation is decomposed into factor-wise mapping steps.

Figure 3.1: Factored translation models
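The generalization that factors buy can be illustrated with a toy lexicon; every entry below is invented for illustration (and uses an English-German pair purely as an example, whereas the thesis itself deals with Japanese-English):

```python
# Toy illustration of factored generalization: a surface phrase table
# that has only seen "house" cannot translate "houses", but a
# lemma-level translation step plus a generation step can.

surface_table = {"house": "Haus"}           # surface pairs seen in training
lemma_table = {"house": "Haus"}             # lemma-level translation step
generate = {("Haus", "singular"): "Haus",   # target-side generation step
            ("Haus", "plural"): "Häuser"}

def translate_factored(word, lemma, number):
    # use the surface table when possible, otherwise back off to
    # lemma translation followed by morphological generation
    if word in surface_table:
        return surface_table[word]
    return generate[(lemma_table[lemma], number)]
```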
4 Related Work

Collins et al. [3] proposed pre-ordering for German-to-English translation: the source sentence is syntactically restructured into target-like word order before translation, so that the translation step itself can be largely monotone. Katz-Brown and Collins [7] applied syntactic reordering in preprocessing to Japanese-to-English patent translation. Isozaki et al. [6] proposed Head Finalization, a single reordering rule for translating into SOV languages. Sudoh et al. [18] proposed post-ordering, which moves the reordering step after translation, and used the Head Finalization rule of Isozaki et al. [6] to define the intermediate language. Figure 4.1 illustrates these approaches.
Figure 4.1: …
5 Post-ordering

Japanese is a head-final language, while English is head-initial, so translating between them requires long-distance reordering. Sudoh et al. [18] proposed post-ordering, which splits Japanese-to-English translation into two steps, as shown in Figure 5.1. First, Japanese is translated into Head-Final English (HFE), English rearranged into head-final order by the rules of Isozaki et al. [6]; since HFE has almost the same word order as Japanese, this step is nearly monotone. Second, the HFE output is reordered into ordinary English. Because each step involves little reordering on its own, the overall reordering problem is mitigated.

Figure 5.1: …
5.1 Head Finalization

Head Finalization [6] converts English into Head-Final English using the syntactic analysis produced by the HPSG parser Enju¹ [12]. Figure 5.2 shows an example of Enju's XML output, and Figure 5.3 shows an example of Head Finalization. The transformation rewrites the parse tree: syntactic heads are moved after their dependents, giving head-final order; the articles "a", "an", and "the" are removed; and pseudo-particles (va0, va1, va2), which play the role of Japanese case particles, are inserted.

5.2 Training the Post-ordering Models

As in Figure 5.1, two models are trained. The HFE side of the training data is created by applying Head Finalization to the English side of the parallel corpus. The Japanese-to-HFE translation model is then trained on the Japanese sentences paired with their HFE counterparts, and the HFE-to-English reordering model is trained on HFE paired with the original English.

1 http://www.nactem.ac.uk/tsujii/enju/index.html
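The head-movement and article-removal rules can be illustrated on a toy parse of "John hit a ball"; this is a simplification, since the real rules operate on Enju's HPSG output, and pseudo-particle insertion is omitted here:

```python
# Toy Head Finalization: a node is either a word (str) or a pair
# (children, head_index). Heads are moved after their siblings and
# articles are dropped; the full Enju-based rules are not modeled.

ARTICLES = {"a", "an", "the"}

def head_finalize(node):
    if isinstance(node, str):
        return [] if node.lower() in ARTICLES else [node]
    children, head_idx = node
    words = []
    for i, child in enumerate(children):  # non-head children first
        if i != head_idx:
            words += head_finalize(child)
    words += head_finalize(children[head_idx])  # head child last
    return words

# parse of "John hit a ball": S(NP(John), VP(hit, NP(a, ball)))
tree = (["John", (["hit", (["a", "ball"], 1)], 0)], 1)
hfe = head_finalize(tree)
```

The output places the verb last and drops the article, mirroring Japanese word order.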
Figure 5.2: XML output of Enju
Figure 5.3: Head Finalization
6 Post-ordering with Factored Translation Models

In factored translation models [9], the phrase translation probability P(f̄ | ē) is decomposed into translation steps and generation steps: for instance, a surface-level translation step P(f̄_word | ē_word) combined with generation steps over the target factors. With additional factors, the translation step can map onto several output factors at once, for example P(f̄_word, f̄_factor1, f̄_factor2 | ē_word), and language models P(f̄_factor) trained over each factor sequence are combined with P(f̄ | ē).

We use two kinds of factors: part-of-speech tags obtained from Enju, and word classes obtained by Brown clustering [1]. Brown clustering groups words into a fixed number of classes by maximizing the likelihood of a class-based bigram language model; we use two settings, 50 and 1,000 classes. Figure 6.1 shows how these factors are used in Head Finalization-based post-ordering: both the Japanese-to-HFE translation model and the HFE-to-English reordering model annotate the HFE side with the Enju POS tags and the Brown clusters.
Figure 6.1: Factored Translation Models

6.1 Factors for Japanese-to-HFE Translation

In the Japanese-to-HFE model, each HFE word carries its POS tag and word class as factors. In addition to the surface language model P(f̄_word), language models P(f̄_pos) and P(f̄_class) are trained over the factor sequences, and the translation step produces all HFE factors jointly, P(f̄_word, f̄_pos, f̄_class | ē_word).

6.2 Factors for HFE-to-English Reordering

In the HFE-to-English model, the factors annotate the HFE (input) side, so that the reordering step can condition on POS tags and word classes rather than on surface forms alone.
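Moses expects factored corpora in a word|factor1|factor2 format; a sketch of the annotation step, where the POS tags and cluster IDs shown are placeholders for Enju output and Brown-cluster IDs:

```python
# Writing a factored corpus line in the word|factor1|factor2 format
# used by Moses' factored training. Tags and cluster IDs below are
# illustrative placeholders, not real Enju or Brown-clustering output.

def annotate(tokens, pos_tags, clusters):
    return " ".join(
        f"{w}|{p}|{c}" for w, p, c in zip(tokens, pos_tags, clusters)
    )

line = annotate(["John", "ball", "hit"], ["NNP", "NN", "VB"], ["17", "4", "29"])
```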
7 Experiments

7.1 Data

We use a Japanese-English corpus built from Wikipedia: 318,443 sentence pairs for training, 1,166 for development, and 1,160 for testing.

7.2 Experimental Setup

Word alignment is obtained with GIZA++¹ [15], training up to IBM Model 4. Language models are trained with SRILM²: a 5-gram model over surface forms and 7-gram models over the factors. Feature weights are tuned with MERT [13], and decoding is done with Moses³.

1 https://code.google.com/p/giza-pp/
2 http://www.speech.sri.com/projects/srilm/download.html
3 http://www.statmt.org/moses/

7.3 Evaluation Metrics

Translation quality is measured with BLEU [17] and RIBES [5]. BLEU is based on n-gram precision:

  BLEU = BP · exp(Σ_{n=1}^{N} w_n log p_n)                       (7.1)

where BP is the brevity penalty, the w_n are weights on each n-gram order, p_n is the modified n-gram precision, and N = 4. RIBES is based on Kendall's τ computed over unigram correspondences between hypothesis and reference:

  RIBES = (τ + 1)/2 · P^α                                        (7.2)

where τ is Kendall's τ, P is the unigram precision, and α is a hyperparameter.

7.4 Results

Table 7.1 shows BLEU and RIBES for each factor configuration of the end-to-end systems. The best BLEU and the best RIBES are both obtained with the 1,000-class Brown-clustering factors.

7.4.1 Factors for Japanese-to-HFE Translation

We first evaluate the effect of factors on the Japanese-to-HFE translation step, scoring the output against HFE references produced by Head Finalization. Table 7.2 shows the results; a 50-class configuration achieves the highest BLEU.
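For reference, simplified sentence-level versions of the metrics in Eqs. (7.1) and (7.2) can be sketched as follows; real BLEU is corpus-level with a shared brevity penalty, and real RIBES computes Kendall's τ over automatically aligned unigrams rather than the naive position lookup used here:

```python
import math
from collections import Counter

# Simplified sentence-level BLEU (Eq. 7.1) and RIBES (Eq. 7.2).

def bleu(hyp, ref, N=4):
    log_p = 0.0
    for n in range(1, N + 1):
        h = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        r = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        match = sum(min(c, r[g]) for g, c in h.items())  # clipped counts
        if match == 0:
            return 0.0
        log_p += (1.0 / N) * math.log(match / sum(h.values()))
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))  # brevity penalty
    return bp * math.exp(log_p)

def kendall_tau(ranks):
    n = len(ranks)
    pairs = n * (n - 1) // 2
    if pairs == 0:
        return 0.0
    disc = sum(1 for i in range(n) for j in range(i + 1, n)
               if ranks[i] > ranks[j])  # discordant pairs
    return 1.0 - 2.0 * disc / pairs

def ribes(hyp, ref, alpha=0.25):
    # toy unigram "alignment": position of each hypothesis word in the
    # reference, assuming each word occurs at most once
    ranks = [ref.index(w) for w in hyp if w in ref]
    precision = len(ranks) / len(hyp)
    return (kendall_tau(ranks) + 1) / 2 * precision ** alpha
```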
                               BLEU   RIBES
  PBMT                         16.95  65.23
  PBMT + …                     16.68  64.54
  PBMT + … (50)                17.36  65.25
  PBMT + … (1,000)             17.56  65.88
  PBMT + … (50)                17.41  65.23
  PBMT + … (1,000)             17.47  65.50
  Post-ordering                16.22  65.73
  Post-ordering + …            16.22  65.77
  Post-ordering + … (50)       16.69  65.39
  Post-ordering + … (1,000)    16.16  65.89
  Post-ordering + … (50)       16.55  65.45
  Post-ordering + … (1,000)    16.79  65.99

Table 7.1: Japanese-to-English translation results

7.4.2 Factors for HFE-to-English Reordering

We next evaluate the effect of factors on each step of post-ordering in isolation. Table 7.2 shows the Japanese-to-HFE translation step scored against HFE references, and Table 7.3 shows the HFE-to-English reordering step, whose input is HFE obtained by Head Finalization from the reference English. In Table 7.3, the 1,000-class configurations give the highest BLEU and RIBES, and the 50-class configurations also improve BLEU over the baseline. Table 7.4 examines the effect on BLEU of combining factors.

               BLEU   RIBES
  PBMT         15.65  68.35
  + …          16.06  68.62
  + … (50)     16.32  68.36
  + … (1,000)  15.61  68.39
  + … (50)     16.17  68.06
  + … (1,000)  16.09  68.44

Table 7.2: Japanese-to-HFE translation results

               BLEU   RIBES
  PBMT         59.69  82.31
  + …          58.85  81.85
  + … (50)     60.09  82.27
  + … (1,000)  60.74  83.36
  + … (50)     60.08  82.58
  + … (1,000)  60.99  83.16

Table 7.3: HFE-to-English reordering results

                               BLEU   RIBES
  Post-ordering                16.22  65.73
  + …                          16.22  65.77
  + … (50)                     16.69  65.39
  + … (1,000)                  16.16  65.89
  + … (50)                     16.55  65.45
  + … (1,000)                  16.79  65.99
  + … (50) & … (1,000)         17.16  65.69

Table 7.4: Results of combining factors

Combining the 50-class and 1,000-class factors yields the highest BLEU (17.16), improving over either granularity alone, although the 1,000-class factor alone gives the best RIBES (65.99).

7.4.3 Discussion

Table 7.5 compares the post-ordering configurations. With factored translation models, BLEU improves in most configurations, and the 1,000-class configurations give the best RIBES; the gains observed on the individual steps in Tables 7.2 and 7.3 largely carry over to the end-to-end results.

                          BLEU   RIBES
  Post-ordering           16.22  65.73
  + …                     16.22  65.77
  + … (50)                16.69  65.39
  + … (1,000)             16.16  65.89
  + … (50)                16.55  65.45
  + … (1,000)             16.79  65.99
  …                       16.32  64.64
  … + …                   17.16  64.64
  … + … (50)              16.62  63.84
  … + … (1,000)           17.01  65.36
  … + … (50)              16.84  64.39
  … + … (1,000)           17.43  65.25

Table 7.5: …
8 Conclusion

In this thesis we applied factored translation models to post-ordering for Japanese to English translation. We annotated the Head-Final English side with part-of-speech tags and with word classes obtained by Brown clustering (50 and 1,000 classes), and evaluated the systems with BLEU, an n-gram based metric, and RIBES, a rank-correlation based metric. As future work, other sources of word classes, such as representations learned by deep learning, could serve as factors, and further factor combinations for HFE could be explored.
References

[1] Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. Class-based n-gram models of natural language. Computational Linguistics, Vol. 18, No. 4, pp. 467–479, 1992.
[2] Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, Vol. 19, No. 2, pp. 263–311, 1993.
[3] Michael Collins, Philipp Koehn, and Ivona Kučerová. Clause restructuring for statistical machine translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 531–540, 2005.
[4] Michel Galley and Christopher D. Manning. A simple and effective hierarchical phrase reordering model. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 848–856, 2008.
[5] Hideki Isozaki, Tsutomu Hirao, Kevin Duh, Katsuhito Sudoh, and Hajime Tsukada. Automatic evaluation of translation quality for distant language pairs. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pp. 944–952, 2010.
[6] Hideki Isozaki, Katsuhito Sudoh, Hajime Tsukada, and Kevin Duh. Head finalization: A simple reordering rule for SOV languages. In Proceedings of the Joint Fifth Workshop on Statistical Machine Translation and MetricsMATR, pp. 244–251, 2010.
[7] Jason Katz-Brown and Michael Collins. Syntactic reordering in preprocessing for Japanese to English translation: MIT system description for NTCIR-7 patent translation task. In Proceedings of the NTCIR-7 Workshop Meeting, 2008.
[8] Reinhard Kneser and Hermann Ney. Improved backing-off for m-gram language modeling. In Proceedings of the 1995 International Conference on Acoustics, Speech, and Signal Processing, Vol. 1, pp. 181–184. IEEE, 1995.
[9] Philipp Koehn and Hieu Hoang. Factored translation models. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 868–876, 2007.
[10] Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, et al. Moses: Open source toolkit for statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Demonstration Session, pp. 177–180, 2007.
[11] Philipp Koehn, Franz Josef Och, and Daniel Marcu. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, Volume 1, pp. 48–54, 2003.
[12] Yusuke Miyao and Jun'ichi Tsujii. Feature forest models for probabilistic HPSG parsing. Computational Linguistics, Vol. 34, No. 1, pp. 35–80, 2008.
[13] Franz Josef Och. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, Volume 1, pp. 160–167, 2003.
[14] Franz Josef Och and Hermann Ney. Discriminative training and maximum entropy models for statistical machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 295–302, 2002.
[15] Franz Josef Och and Hermann Ney. A systematic comparison of various statistical alignment models. Computational Linguistics, Vol. 29, No. 1, pp. 19–51, 2003.
[16] Franz Josef Och, Christoph Tillmann, Hermann Ney, et al. Improved alignment models for statistical machine translation. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 20–28, 1999.
[17] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 311–318, 2002.
[18] Katsuhito Sudoh, Xianchao Wu, Kevin Duh, Hajime Tsukada, and Masaaki Nagata. Post-ordering in statistical machine translation. In Proceedings of MT Summit XIII, 2011.
[19] Ian H. Witten and Timothy C. Bell. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, Vol. 37, No. 4, pp. 1085–1094, 1991.