© 2012 NTT Corporation. All rights reserved.
Noisy Channel Model
f: input (source), e: output (target)
ê = argmax_e p(e|f) = argmax_e p(f|e) p(e)
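The noisy-channel decision rule above can be sketched in a few lines. The candidate translations and the probabilities p(e) ("lm") and p(f|e) ("tm") below are invented purely for illustration:

```python
# Toy illustration of the noisy-channel decision rule:
# pick the target sentence e maximizing p(f|e) * p(e).
# All candidates and probabilities here are made up.

candidates = {
    "he is a student": {"lm": 0.020, "tm": 0.30},    # p(e), p(f|e)
    "he is student":   {"lm": 0.001, "tm": 0.40},
    "a student he is": {"lm": 0.0005, "tm": 0.35},
}

def decode(cands):
    # argmax_e p(f|e) * p(e)
    return max(cands, key=lambda e: cands[e]["tm"] * cands[e]["lm"])

best = decode(candidates)
# the fluent candidate wins: 0.30 * 0.020 = 0.006 beats 0.40 * 0.001
```

Note how the language model p(e) vetoes the higher-p(f|e) but disfluent candidate, which is exactly the division of labor the factorization is after.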
Translation Model p(f|e) (Brown+ 1990)
f = f1 f2 f3 f4 f5 f6 f7, e = "He is a high school student." (e0 e1 e2 e3 e4 e5, with e0 = NULL)
The alignment a_j links each f_j to a target word, e.g. a2 = 4, a3 = 0 (NULL); φ denotes fertility.
Component distributions: lexical translation p(fj|ei) and alignment p(aj|f, e).
Alignment p(aj|f, e): the IBM Models (Brown+ 1990)
Model 1: lexical translation only (alignment uniform)
Model 2: alignment depends on positions j, |f|, |e|
Model 3: Model 2 + fertility
Model 4: + relative distortion
Model 5: Model 4 + deficiency removed
HMM Model (Vogel+ 1996): Model 1 + first-order (HMM) alignment
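The EM training behind these models can be shown concretely for Model 1, whose E-step and M-step have closed forms. A minimal sketch on a made-up three-sentence parallel corpus (real training uses GIZA++ etc.):

```python
# Minimal IBM Model 1 EM training sketch: learns lexical translation
# probabilities t(f|e) from sentence pairs alone. Toy corpus invented.
from collections import defaultdict

corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

# initialize t(f|e) uniformly over the source vocabulary
f_vocab = {f for fs, _ in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(30):                      # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for fs, es in corpus:
        for f in fs:
            # E-step: distribute each f fractionally over the e words
            z = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / z
                count[(f, e)] += c
                total[e] += c
    # M-step: renormalize the expected counts
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

# co-occurrence statistics resolve the ambiguity: t("haus"|"house") -> 1
```

Although every word in the first pair initially looks equally likely, "das" also co-occurs with "the" elsewhere, so EM gradually credits "house" with "haus".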
Word Alignment Tools
Train IBM/HMM models and extract the Viterbi alignment:
GIZA++ / MGIZA++ (multi-threaded GIZA++) / Chaski (Hadoop wrapper) / Berkeley Aligner
Phrase-Based Model (Koehn+ 2003)
"He is a high school student." is segmented into phrases (e.g. "He" / "is a" / "high school student"), and each phrase is translated as a unit with phrase translation probability p(f|e).
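The phrase pairs behind such a model are extracted from word alignments using a consistency check: a source span and a target span form a phrase pair only if no alignment link leaves the box they define. A minimal sketch with an invented Japanese-English toy pair (the full algorithm of Koehn+ 2003 also extends spans over unaligned boundary words, which this sketch skips):

```python
# Sketch of consistent phrase-pair extraction from a word alignment.
# Toy sentence pair and alignment links are invented for illustration.
f = ["kare", "wa", "gakusei", "da"]
e = ["he", "is", "a", "student"]
align = {(0, 0), (1, 1), (3, 1), (2, 3)}   # (f index, e index) links

def extract_phrases(f, e, align, max_len=4):
    phrases = set()
    for fs in range(len(f)):
        for fe in range(fs, min(fs + max_len, len(f))):
            # target positions linked to the source span [fs, fe]
            es_set = {j for (i, j) in align if fs <= i <= fe}
            if not es_set:
                continue
            es, ee = min(es_set), max(es_set)
            # consistency: no link from inside the e span to outside the f span
            if all(fs <= i <= fe for (i, j) in align if es <= j <= ee):
                phrases.add((" ".join(f[fs:fe + 1]), " ".join(e[es:ee + 1])))
    return phrases

pairs = extract_phrases(f, e, align)
# yields ("kare", "he"), ("gakusei", "student"),
# ("wa gakusei da", "is a student"), ("kare wa gakusei da", "he is a student")
```

Note that ("wa", "is") is *not* extracted: "is" is also linked to "da", so the box would cut an alignment link.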
Log-linear Model!
ê = argmax_e p(e|f) = argmax_e exp Σ_k w_k h_k(f, e)
Feature functions h_k include p(f|e), p(e|f), p(e), ...
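Since exp is monotone, maximizing exp Σ_k w_k h_k(f, e) is the same as maximizing the weighted feature sum itself. A sketch with invented feature values (log-probabilities) and weights:

```python
# Log-linear scoring sketch: score(e) = sum_k w_k * h_k(f, e).
# Feature values and weights below are invented for illustration.

weights = {"tm": 1.0, "lm": 0.8, "word_penalty": -0.2}

def score(features, w=weights):
    # argmax over exp(sum) equals argmax over the sum itself
    return sum(w[k] * v for k, v in features.items())

hyps = {
    "he is a student": {"tm": -1.2, "lm": -2.0, "word_penalty": 4},
    "he student":      {"tm": -0.9, "lm": -3.5, "word_penalty": 2},
}

best = max(hyps, key=lambda e: score(hyps[e]))
```

Unlike the noisy-channel factorization, arbitrary features can be mixed in, and the weights w_k are free parameters to be tuned (see the tuning slide below).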
Hierarchical Phrase-Based Model (Chiang 2007)
Synchronous CFG rules with gaps, e.g. S → X1 and X → ⟨ X1 is a X2 . , … ⟩,
applied to "He is a high school student." with X1 = "He", X2 = "high school student".
Syntax-Based Model (Galley+ 2004, GHKM)
Translation rules extracted from a parse of "He is a high school student.", e.g.:
S → NP VP P, VP → is NP, NP → a NP, NP → high school student
Tuning the Weights w_k!
ê = argmax_e p(e|f) = argmax_e exp Σ_k w_k h_k(f, e)
Minimum Error Rate Training [Och 2003]
Margin Infused Relaxed Algorithm (MIRA) [Watanabe+ 2007]
Pairwise Ranking Optimization [Hopkins+ 2011]
Decoding: Finding ê!
ê = argmax_e p(e|f) = argmax_e exp Σ_k w_k h_k(f, e)
Exhaustive search is intractable: for a source sentence of n words with m translation options each, the space grows on the order of m^n translation choices times n! reorderings.
Phrase-Based Decoding: Stack (Beam) Search [Koehn 2003]
Hypotheses are expanded left-to-right on the target side ("He" → "He is" → ...; also "He was", "His", ...) and grouped into stacks by the number of source words covered (=1, =2, =3, =4, =5); each stack is pruned to a beam.
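A heavily simplified sketch of this stack decoding, restricted to monotone order (no reordering) so the stacks index prefixes of the source. The source sentence, phrase table, and scores are all invented; the empty string stands in for untranslated particles:

```python
# Simplified monotone stack decoding sketch for phrase-based SMT.
# stacks[i] holds (score, partial translation) hypotheses covering the
# first i source words; each stack is pruned to a beam. Toy data invented.

source = ["kare", "wa", "gakusei", "da"]       # hypothetical source tokens
phrase_table = {                               # f-phrase -> [(e-phrase, log p)]
    ("kare",): [("he", -0.2)],
    ("wa",): [("is", -0.5), ("", -0.7)],       # "" = drop the particle
    ("kare", "wa"): [("he is", -0.4)],
    ("gakusei",): [("a student", -0.3)],
    ("da",): [("", -0.1)],
}

BEAM = 5
stacks = [[] for _ in range(len(source) + 1)]
stacks[0] = [(0.0, "")]

for i in range(len(source)):
    for score, out in stacks[i]:
        # extend the hypothesis with every phrase starting at position i
        for j in range(i + 1, len(source) + 1):
            f = tuple(source[i:j])
            for e, logp in phrase_table.get(f, []):
                stacks[j].append((score + logp, (out + " " + e).strip()))
    # prune every stack to the beam width
    for k in range(len(stacks)):
        stacks[k] = sorted(stacks[k], reverse=True)[:BEAM]

best_score, best = max(stacks[-1])
# best path uses the phrase pair ("kare wa", "he is"), total log score -0.8
```

The multi-word phrase ("kare wa" → "he is", -0.4) beats translating the two words separately (-0.2 + -0.5 = -0.7), which is precisely why phrase-based models memorize such chunks.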
Hierarchical/Syntax-Based Decoding: Chart Parsing [Chiang 2007, Zollmann 2006]
"He is a high school student." is parsed bottom-up with the synchronous grammar (rules such as S → NP VP P, VP → is NP, NP → a NP, NP → high school student), building target-side translations as the chart is filled.
Reordering alone is O(n!): the 8 words of "He lost his wallet in the airport yesterday." already allow 8! = 40,320 orderings.
Moses
Philipp Koehn (U. Edinburgh)
http://www.statmt.org/moses/
BLEU (Papineni+ 2002)
Proposed at IBM; the de-facto standard automatic metric, based on n-gram precision.
Reference: We are delighted to inform you that your paper has been accepted.
Output: We are sorry to inform you that your paper was not accepted.
1-gram: 10/13, 2-gram: 7/12, 3-gram: 4/11, 4-gram: 3/10
BLEU = (Π_n p_n)^(1/N) × min(1, length(output)/length(reference))
(the second factor is the brevity penalty)
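The formula above can be implemented directly for a single reference. This sketch uses the slide's simplified brevity penalty min(1, |output|/|reference|); standard corpus-level BLEU uses exp(1 - r/c) and sums counts over the whole test set:

```python
# Single-reference BLEU sketch, following the slide's formula:
# BLEU = (prod_n p_n)^(1/N) * min(1, len(output)/len(reference)).
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(output, reference, N=4):
    out, ref = output.split(), reference.split()
    precisions = []
    for n in range(1, N + 1):
        o, r = ngrams(out, n), ngrams(ref, n)
        match = sum(min(c, r[g]) for g, c in o.items())   # clipped counts
        precisions.append(match / max(1, sum(o.values())))
    prod = 1.0
    for p in precisions:
        prod *= p
    bp = min(1.0, len(out) / len(ref))   # simplified brevity penalty
    return (prod ** (1.0 / N)) * bp, precisions

ref = "We are delighted to inform you that your paper has been accepted ."
out = "We are sorry to inform you that your paper was not accepted ."
score, ps = bleu(out, ref)
# ps reproduces the slide: [10/13, 7/12, 4/11, 3/10]
```

On the slide's pair the precisions come out exactly as shown, and the overall score is about 0.47 even though the two sentences say opposite things, which is the standard criticism of pure n-gram matching.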
Other Metrics
WER (Word Error Rate), PER (Position-independent WER), TER (Translation Edit Rate), METEOR, ...
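Of these, WER is the simplest to make concrete: word-level Levenshtein distance divided by the reference length. A short sketch (example sentences invented):

```python
# WER sketch: word-level edit distance / reference length.
def wer(output, reference):
    o, r = output.split(), reference.split()
    # d[i][j] = edit distance between r[:i] and o[:j]
    d = [[0] * (len(o) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(o) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(o) + 1):
            sub = 0 if r[i - 1] == o[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + sub)   # substitution
    return d[-1][-1] / len(r)

print(wer("he is student", "he is a student"))  # 1 deletion / 4 words = 0.25
```

PER drops the ordering constraint (bag-of-words differences only), and TER additionally allows block shifts at unit cost, which suits translation better than strict WER.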
RIBES (Isozaki+ 2010, 2011)
Developed at NTT for distant language pairs.
Reference: My paper was rejected because I drunk so much today.
Output: I drunk so much today because my paper was rejected.
BLEU: 0.74, RIBES: 0.47
Based on Kendall's τ over word order:
RIBES = ((τ + 1)/2) × p^α × BP^β   (p: unigram precision, BP: brevity penalty; α = 0.25, β = 0.10)
GPLv2: http://www.kecl.ntt.co.jp/icl/lirg/ribes/index-j.html (search "RIBES NTT")
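A heavily simplified sketch of the idea: align words that occur exactly once in both sentences, take Kendall's τ of their reference positions, and combine with unigram precision and the brevity penalty. The real RIBES also disambiguates repeated words using surrounding context, which this sketch skips:

```python
# Simplified RIBES sketch: Kendall's tau of aligned word positions,
# combined as ((tau+1)/2) * p1^alpha * BP^beta. Only words occurring
# exactly once in both sentences are aligned (no context disambiguation).
from itertools import combinations

def ribes(output, reference, alpha=0.25, beta=0.10):
    o, r = output.lower().split(), reference.lower().split()
    # reference positions of unambiguously alignable hypothesis words
    pos = [r.index(w) for w in o if o.count(w) == 1 and r.count(w) == 1]
    pairs = list(combinations(pos, 2))
    if not pairs:
        return 0.0
    concordant = sum(1 for a, b in pairs if a < b)
    tau = 2.0 * concordant / len(pairs) - 1.0     # Kendall's tau
    nkt = (tau + 1.0) / 2.0                       # normalized to [0, 1]
    p1 = len(pos) / len(o)                        # unigram precision
    bp = min(1.0, len(o) / len(r))                # brevity penalty
    return nkt * (p1 ** alpha) * (bp ** beta)

s = ribes("I drunk so much today because my paper was rejected .",
          "My paper was rejected because I drunk so much today .")
print(round(s, 2))  # 0.47 -- the clause swap is punished, matching the slide
```

On the slide's example every word aligns (p1 = BP = 1), but the swapped clauses leave only 26 of 55 word pairs in reference order, so the score drops to 26/55 ≈ 0.47 where BLEU still reports 0.74.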
BLEU vs. RIBES
Correlation with human judgment (Spearman's ρ) on the three NTCIR-9 PatentMT subtasks (Goto+ 2011):

         (subtask 1)  (subtask 2)  (subtask 3)
BLEU     0.931        0.511        -0.029
RIBES    0.949        0.929        0.716

RIBES tracks human evaluation better than BLEU, especially for distant language pairs.
Further Reading...
Philipp Koehn, Statistical Machine Translation, Cambridge University Press, 2010 (IBM models: p.100–)
Conferences: ACL, NAACL, EACL, EMNLP, IJCNLP, AMTA, EAMT, MT Summit, ...
Journals: Computational Linguistics, Machine Translation, ACM TALIP, ...
References
P. F. Brown et al., A Statistical Approach to Machine Translation, Computational Linguistics, vol. 16, no. 2 (1990) S. Vogel et al., HMM-Based Word Alignment in Statistical Translation, Proc. COLING (1996) 33
P. Koehn et al., Statistical Phrase-Based Translation, Proc. NAACL (2003) M. Galley et al., What's in a translation rule?, Proc. NAACL (2004) D. Chiang, Hierarchical Phrase-Based Translation, Computational Linguistics, vol. 33, no. 2 (2007)
F. J. Och, Minimum Error Rate Training in Statistical Machine Translation, Proc. ACL (2003) T. Watanabe et al., Online Large Margin Training for Statistical Machine Translation, Proc. EMNLP (2007) M. Hopkins and J. May, Tuning as Ranking, Proc. EMNLP (2011) 35
K. Papineni et al., BLEU: a Method for Automatic Evaluation of Machine Translation, Proc. ACL (2002) H. Isozaki et al., Automatic Evaluation of Translation Quality for Distant Language Pairs, Proc. EMNLP (2010) … et al., RIBES: … (in Japanese, 2011) I. Goto et al., Overview of the Patent Machine Translation Task at the NTCIR-9 Workshop, Proc. NTCIR-9 (2011)