a) miura.akiba.lr9@is.naist.jp
b) Graham Neubig, neubig@is.naist.jp
c) Sakriani Sakti, ssakti@is.naist.jp
d) tomoki@is.naist.jp
e) s-nakamura@is.naist.jp
1 Nara Institute of Science and Technology (affiliation of all five authors)

1.

Statistical Machine Translation (SMT) [1] [2]. Pivot translation [3][4][5][6] has two main methods, Cascade Translation [3] and Triangulation [7], which have been studied with Phrase-Based Machine Translation (PBMT) [8]. This report applies the PBMT pivot methods to Hierarchical Phrase-Based Machine Translation (Hiero) [9] and compares PBMT and Hiero on a multilingual corpus [10].

2.

2.1

PBMT as proposed by Koehn et al. [8].

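Given a source sentence $f$, the translation $\hat{e}$ is

\begin{align}
\hat{e} &= \arg\max_{e} \Pr(e \mid f) \tag{1} \\
        &= \arg\max_{e} \sum_{m=1}^{M} \lambda_m h_m(e, f) \tag{2}
\end{align}

where $h_m(e, f)$ are feature functions and $\lambda_m$ are their weights. The weights $\lambda_m$ are tuned toward BLEU [11] by minimum error rate training [12]. PBMT reordering models: [13][14].

2.2

Hiero [9], proposed by Chiang, generalizes PBMT to rules that contain nonterminals. Example rule sides: [X_1] visit [X_2] and [X_1] [X_2], where X_1 and X_2 are nonterminals shared by both sides.

3. PBMT

3.1

Cascade translation [3] chains two PBMT systems: the first translates the source into the pivot language and the second translates the pivot into the target. The n-best outputs of the first system can be passed to the second [4].

3.2

De Gispert et al. [3]; SMT.

3.3

Triangulation, following Cohn et al. [7], combines the PBMT phrase tables $T_{FE}$ (source-pivot) and $T_{EG}$ (pivot-target).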
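Equation (2) in Section 2.1 scores each candidate translation as a weighted sum of feature functions and returns the highest-scoring candidate. The following is a minimal sketch of that decision rule in Python. The two feature functions (`h_length`, `h_lexicon`), the lexicon, and the weights are hypothetical toy values chosen for illustration; they are not the features or weights used in the paper, and a real decoder searches a far larger candidate space.

```python
from typing import Callable, Sequence

# A feature function h_m(e, f) maps a (candidate, source) pair to a real value.
FeatureFn = Callable[[str, str], float]

# Hypothetical toy lexicon of target words considered "good" translations.
TOY_LEXICON = {"a", "b"}


def h_length(e: str, f: str) -> float:
    """Toy feature: penalize token-length mismatch between candidate and source."""
    return -abs(len(e.split()) - len(f.split()))


def h_lexicon(e: str, f: str) -> float:
    """Toy feature: count candidate tokens found in the hypothetical lexicon."""
    return sum(1 for tok in e.split() if tok in TOY_LEXICON)


def loglinear_score(e: str, f: str,
                    features: Sequence[FeatureFn],
                    weights: Sequence[float]) -> float:
    """Compute sum_m lambda_m * h_m(e, f), the quantity maximized in Eq. (2)."""
    return sum(w * h(e, f) for w, h in zip(weights, features))


def best_translation(f: str, candidates: Sequence[str],
                     features: Sequence[FeatureFn],
                     weights: Sequence[float]) -> str:
    """Return the candidate e with the highest log-linear score (arg max over e)."""
    return max(candidates, key=lambda e: loglinear_score(e, f, features, weights))


# Illustrative, hand-set values (assumptions, not tuned weights).
FEATURES = [h_length, h_lexicon]
WEIGHTS = [1.0, 0.5]
```

For the source `"x y z"` and the candidates `["a b c", "a b", "a"]`, `best_translation` selects `"a b c"`, because it matches the source length and contains two lexicon words. In practice the weights would be tuned on development data rather than set by hand.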

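From $T_{FE}$ and $T_{EG}$, the source-target phrase table $T_{FG}$ is built. The phrase translation probabilities $\phi(\cdot)$ and lexical weights $p_\omega(\cdot)$ of $T_{FG}$ are estimated by marginalizing over the pivot phrase:

\begin{align}
\phi(f \mid g) &= \sum_{e \in T_{FE} \cap T_{EG}} \phi(f \mid e)\, \phi(e \mid g) \tag{3} \\
\phi(g \mid f) &= \sum_{e \in T_{FE} \cap T_{EG}} \phi(g \mid e)\, \phi(e \mid f) \tag{4} \\
p_\omega(f \mid g) &= \sum_{e \in T_{FE} \cap T_{EG}} p_\omega(f \mid e)\, p_\omega(e \mid g) \tag{5} \\
p_\omega(g \mid f) &= \sum_{e \in T_{FE} \cap T_{EG}} p_\omega(g \mid e)\, p_\omega(e \mid f) \tag{6}
\end{align}

Here $f$, $e$, and $g$ are the source, pivot, and target phrases, and each sum runs over the pivot phrases $e$ contained in both $T_{FE}$ and $T_{EG}$. Utiyama et al. [4] compare these pivot methods in BLEU, using n = 1 and n = 15 for the n-best cascade.

4. Hiero

4.1

The PBMT pivot methods of Sections 3.1 and 3.3 are applied to Hiero. PBMT uses Moses [15] and Hiero uses Travatar [16]. Three methods are compared:

Direct
Cascade
Triangulation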
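The marginalization over the pivot phrase in Equation (4) can be sketched as follows. This is a minimal Python illustration that assumes each phrase table is a nested dictionary of conditional probabilities; the function name `triangulate`, the table layout, and the toy phrases and probabilities are hypothetical and not taken from Moses or Travatar. The same loop applies to Equations (3), (5), and (6) with the corresponding tables swapped in.

```python
from collections import defaultdict
from typing import Dict

# table[x][y] holds a conditional probability of y given x.
PhraseTable = Dict[str, Dict[str, float]]


def triangulate(src_to_pivot: PhraseTable, pivot_to_tgt: PhraseTable) -> PhraseTable:
    """Build phi(g | f) = sum_e phi(g | e) * phi(e | f), as in Eq. (4).

    src_to_pivot[f][e] is phi(e | f); pivot_to_tgt[e][g] is phi(g | e).
    Pivot phrases missing from either table contribute nothing, which
    corresponds to summing only over e present in both tables.
    """
    table = defaultdict(lambda: defaultdict(float))
    for f, pivots in src_to_pivot.items():
        for e, p_e_given_f in pivots.items():
            # Skip pivot phrases that have no entry in the pivot-target table.
            for g, p_g_given_e in pivot_to_tgt.get(e, {}).items():
                table[f][g] += p_g_given_e * p_e_given_f
    return {f: dict(targets) for f, targets in table.items()}


# Toy tables (illustrative values only).
SRC_TO_PIVOT: PhraseTable = {
    "f1": {"e1": 0.6, "e2": 0.4},
    "f2": {"e3": 1.0},  # e3 is absent from the pivot-target table
}
PIVOT_TO_TGT: PhraseTable = {
    "e1": {"g1": 0.5, "g2": 0.5},
    "e2": {"g1": 1.0},
}

TRIANGULATED = triangulate(SRC_TO_PIVOT, PIVOT_TO_TGT)
```

With these toy tables, `f1` receives probability 0.6 × 0.5 + 0.4 × 1.0 = 0.7 for `g1` and 0.6 × 0.5 = 0.3 for `g2`, while `f2` has no entry because its only pivot phrase never appears in the pivot-target table. This shows why triangulated tables can lose coverage when the two source tables share few pivot phrases.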

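Direct: an SMT system trained directly on source-target data.
Cascade: the method of Section 3.1, applied with PBMT and with Hiero, using two translation systems.
Triangulation: the method of Section 3.3. For Moses, the PBMT phrase table is built with Equations (3)-(6) (Moses [17]). For Travatar, the Hiero rule table is built in the same way as for PBMT, applying Equations (3)-(6) to f, e, g.

4.2

MultiUN [10]; 5 languages; 0.5; 300; Hiero: 50; 1500 (Table 1).

Table 1

Dataset  Lang  Words   Sentences  Average Sentence Length
Train    En    13.2M   500k       26.3
         Fr    15.7M   500k       31.3
         Zh    12.4M   500k       24.8
         Ar    11.6M   500k       23.2
         Ru    11.9M   500k       23.9
Dev      En    37.9k   1.5k       25.3
         Fr    44.9k   1.5k       29.9
         Zh    35.0k   1.5k       23.4
         Ar    33.2k   1.5k       22.2
         Ru    34.5k   1.5k       23.0
Test     En    38.5k   1.5k       25.7
         Fr    45.2k   1.5k       30.2
         Zh    36.0k   1.5k       24.0
         Ar    33.6k   1.5k       22.2
         Ru    34.7k   1.5k       23.2

Footnote 1: KyTea [18]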

Moses is used for PBMT and Travatar for Hiero. Language models are 5-gram models built with KenLM [19], and word alignment uses GIZA++ [20]. The outputs of Moses and Travatar are evaluated with BLEU [11].

4.3

Table 2 reports Direct translation for the methods of Section 4.1. Table 3 compares Direct with the pivot methods Triangulation and Cascade. All scores are BLEU.

Table 2: BLEU Score [%] (Direct)

                  Moses                        Hiero
Lang 1  Lang 2    Lang 1 to 2   Lang 2 to 1    Lang 1 to 2   Lang 2 to 1
En      Ar        43.03         52.47          37.22         47.82
En      Fr        53.58         54.68          50.33         49.56
En      Ru        46.21         53.59          41.03         49.66
En      Zh        33.87         40.20          34.91         40.80
Ar      Zh        31.54         30.29          29.84         28.93
Fr      Ru        41.65         47.43          34.70         43.38
Fr      Zh        29.77         35.38          28.05         34.36
Ru      Zh        32.46         30.64          30.78         30.50

In Table 3, PBMT (Moses) scores higher with Triangulation than with Cascade for every language triple, whereas Hiero scores higher with Cascade than with Triangulation whenever English is the pivot. Section 4.1 builds the Hiero Triangulation table with the same Equations (3)-(6) as PBMT. Hiero rule patterns discussed: a X b, X c, d X e; a X b, c X d [21]. Hiero and PBMT; Moses PBMT.

5.

PBMT and Hiero.

Table 3: BLEU Score [%]

Source  Pivot  Target  MT Method  Direct  Triangulation  Cascade
Ar      En     Zh      Moses      31.54   29.40          28.78
                       Hiero      29.84   28.41          29.11
Fr      En     Zh      Moses      29.77   29.31          29.16
                       Hiero      28.05   27.57          29.64
Ru      En     Zh      Moses      32.46   30.67          30.25
                       Hiero      30.78   29.32          30.10
Zh      En     Ar      Moses      30.29   28.82          28.27
                       Hiero      28.93   26.22          27.62
Zh      En     Fr      Moses      35.38   35.21          35.16
                       Hiero      34.36   32.26          35.23
Zh      En     Ru      Moses      30.64   30.12          29.55
                       Hiero      30.50   27.82          29.88
En      Fr     Zh      Moses      33.87   32.13          31.09
                       Hiero      34.91   32.79          30.57
Zh      Fr     En      Moses      40.20   36.52          35.37
                       Hiero      40.80   34.94          34.28
En      Zh     Fr      Moses      53.58   45.29          41.21
                       Hiero      50.33   43.79          35.78
Fr      Zh     En      Moses      54.68   45.22          41.12
                       Hiero      49.56   43.51          35.16

Hiero; PBMT [22]; Hiero.

References

[1] Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, Vol. 19, pp. 263-312, 1993.
[2] Christopher Dyer, Aaron Cordova, Alex Mont, and Jimmy Lin. Fast, easy, and cheap: construction of statistical machine translation models with MapReduce. In Proc. WMT, pp. 199-207, 2008.
[3] Adrià de Gispert and José B. Mariño. Catalan-English statistical machine translation without parallel corpus: Bridging through Spanish. In Proc. of LREC 5th Workshop on Strategies for developing machine translation for minority languages, 2006.
[4] Masao Utiyama and Hitoshi Isahara. A comparison of pivot methods for phrase-based statistical machine translation. In Proc. NAACL, pp. 484-491, 2007.
[5] Jörg Tiedemann. Character-based pivot translation for under-resourced languages and domains. In Proc. EACL, pp. 141-151, 2012.
[6] Xiaoning Zhu, Zhongjun He, Hua Wu, Conghui Zhu, Haifeng Wang, and Tiejun Zhao. Improving pivot-based statistical machine translation by pivoting the co-occurrence count of phrase pairs. In Proc. EMNLP, 2014.
[7] Trevor Cohn and Mirella Lapata. Machine translation by triangulation: Making effective use of multi-parallel corpora. In Proc. ACL, pp. 728-735, June 2007.
[8] Philipp Koehn, Franz Josef Och, and Daniel Marcu. Statistical phrase-based translation. In Proc. HLT, pp. 48-54, 2003.
[9] David Chiang. Hierarchical phrase-based translation. Computational Linguistics, Vol. 33, No. 2, pp. 201-228, 2007.
[10] Andreas Eisele and Yu Chen. MultiUN: A Multilingual Corpus from United Nation Documents. In Proc. of the Seventh conference on International Language Resources and Evaluation, pp. 2868-2872, 2010.
[11] Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. BLEU: a method for automatic evaluation of machine translation. In Proc. ACL, pp. 311-318, 2002.
[12] Franz Josef Och. Minimum error rate training in statistical machine translation. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, pp. 160-167, 2003.
[13] Michel Galley and Christopher D. Manning. A simple and effective hierarchical phrase reordering model. In Proc. EMNLP, pp. 848-856, 2008.
[14] Isao Goto, Masao Utiyama, Eiichiro Sumita, Akihiro Tamura, and Sadao Kurohashi. Distortion model considering rich context for statistical machine translation. In Proc. ACL, pp. 155-165, August 2013.
[15] Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. Moses: Open source toolkit for statistical machine translation. In Proc. ACL, pp. 177-180, 2007.
[16] Graham Neubig. Travatar: A forest-to-string machine translation engine based on tree transducers. In Proc. ACL Demo Track, pp. 91-96, 2013.
[17] Philipp Koehn, Amittai Axelrod, Alexandra Birch Mayne, Chris Callison-Burch, Miles Osborne, and David Talbot. Edinburgh system description for the 2005 IWSLT speech translation evaluation. In Proc. IWSLT, 2005.
[18] Graham Neubig, Yosuke Nakata, and Shinsuke Mori. Pointwise prediction for robust, adaptable Japanese morphological analysis. In Proc. ACL, pp. 529-533, 2011.
[19] Kenneth Heafield. KenLM: faster and smaller language model queries. In Proc. WMT, July 2011.
[20] Franz Josef Och and Hermann Ney. A systematic comparison of various statistical alignment models. Computational Linguistics, Vol. 29, No. 1, pp. 19-51, 2003.
[21] Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. What's in a translation rule? In Proc. HLT, pp. 273-280, 2004.
[22] Michael Paul, Hirofumi Yamamoto, Eiichiro Sumita, and Satoshi Nakamura. On the importance of pivot language selection for statistical machine translation. In Proc. NAACL, pp. 221-224, June 2009.