46 583/4 2012




4-3 A Transliteration System Based on Bayesian Alignment and its Human Evaluation within a Machine Translation System

Andrew Finch and YASUDA Keiji

This paper reports on contributions in two areas. Firstly, we present a novel Bayesian model for unsupervised bilingual character sequence alignment of corpora for transliteration. The system is based on a Dirichlet process model trained using Bayesian inference through blocked Gibbs sampling, implemented using an efficient forward filtering/backward sampling dynamic programming algorithm. The Bayesian approach is able to overcome the overfitting problem inherent in maximum likelihood training. We demonstrate the effectiveness of our Bayesian alignment by using it to build models for phrase-based statistical machine transliteration (SMT) systems. We compare our alignment technique to the commonly used GIZA++ word alignment process, and also to the state-of-the-art m2m bilingual aligner, by using their alignments to train transliteration generation systems. In both cases the model resulting from our Bayesian alignment was considerably smaller than the model produced by the competing technique, and in addition gave an increase in transliteration generation quality.

Our second contribution is a large-scale real-world evaluation of the effectiveness of integrating an automatic transliteration system with a machine translation system. A human evaluation is usually preferable to an automatic evaluation, and especially so in this case, since the common machine translation evaluation methods are often biased towards translations in terms of their length rather than the information they convey. We evaluate our transliteration system on data collected in field experiments conducted all over Japan. Our results conclusively show that using a transliteration system can improve machine translation quality when translating unknown words.

Keywords: Transliteration, Human evaluation, Machine translation, Dirichlet process model, Bayesian alignment
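The training procedure named in the abstract (a Dirichlet process model, blocked Gibbs sampling, forward filtering/backward sampling) can be illustrated with a minimal sketch. This is not the authors' implementation. The class name `DPAligner`, the base measure in `base`, the concentration value, the maximum segment length, and the restriction to non-empty segments on both sides are all assumptions made for illustration. The sketch also holds the counts fixed while one sequence pair is being resampled, which is a simplification.

```python
import random
from collections import Counter


class DPAligner:
    """Toy bilingual co-segmentation sampler (illustrative assumptions throughout)."""

    def __init__(self, alpha=1.0, max_len=3, src_vocab=26, tgt_vocab=26, seed=0):
        self.alpha = alpha          # DP concentration parameter (assumed value)
        self.max_len = max_len      # longest segment allowed on either side (assumed)
        self.src_vocab = src_vocab  # assumed source alphabet size
        self.tgt_vocab = tgt_vocab  # assumed target alphabet size
        self.counts = Counter()     # how often each (source, target) segment pair is in use
        self.total = 0              # total number of segment pairs in use
        self.rng = random.Random(seed)

    def base(self, s, t):
        # Hypothetical base measure: uniform characters with a per-character penalty.
        return (0.5 / self.src_vocab) ** len(s) * (0.5 / self.tgt_vocab) ** len(t)

    def prob(self, s, t):
        # Dirichlet process (Chinese restaurant process) predictive probability.
        return (self.counts[(s, t)] + self.alpha * self.base(s, t)) / (self.total + self.alpha)

    def _steps(self, src, tgt, i, j):
        # All segment pairs that could end at position (i, j), with their probabilities.
        for a in range(1, min(self.max_len, i) + 1):
            for b in range(1, min(self.max_len, j) + 1):
                yield a, b, self.prob(src[i - a:i], tgt[j - b:j])

    def sample(self, src, tgt):
        m, n = len(src), len(tgt)
        # Forward filtering: fwd[i][j] sums over all co-segmentations of the prefixes.
        fwd = [[0.0] * (n + 1) for _ in range(m + 1)]
        fwd[0][0] = 1.0
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                fwd[i][j] = sum(fwd[i - a][j - b] * p
                                for a, b, p in self._steps(src, tgt, i, j))
        if fwd[m][n] == 0.0:
            raise ValueError("no co-segmentation exists within max_len")
        # Backward sampling: walk from the end, choosing each segment pair in
        # proportion to its share of the forward probability.
        segs, i, j = [], m, n
        while (i, j) != (0, 0):
            options = [(a, b, fwd[i - a][j - b] * p)
                       for a, b, p in self._steps(src, tgt, i, j)]
            a, b, _ = self.rng.choices(options, weights=[w for _, _, w in options])[0]
            segs.append((src[i - a:i], tgt[j - b:j]))
            i, j = i - a, j - b
        return segs[::-1]

    def _update(self, segs, delta):
        for seg in segs:
            self.counts[seg] += delta
        self.total += delta * len(segs)

    def train(self, corpus, iterations=20):
        # Initialise with one sampled co-segmentation per sequence pair.
        state = []
        for src, tgt in corpus:
            segs = self.sample(src, tgt)
            self._update(segs, +1)
            state.append(segs)
        # Blocked Gibbs sampling: remove a whole pair's segments, resample, re-add.
        for _ in range(iterations):
            for k, (src, tgt) in enumerate(corpus):
                self._update(state[k], -1)
                state[k] = self.sample(src, tgt)
                self._update(state[k], +1)
        return state
```

Calling `DPAligner().train([("abc", "xyz"), ("ab", "xy")])` returns one list of (source segment, target segment) pairs per input pair. Because frequently reused segment pairs gain probability mass while unseen ones fall back to the small base measure, the sampler tends to reuse a compact inventory of pairs, which is in keeping with the abstract's observation that the Bayesian alignment yields smaller models.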

