
Vol. 2 No. 1 145-155 (Feb. 2009)

Generating Diverse Katakana Variants via Backward-Forward Transliteration for Information Retrieval

Hiroyuki Hattori, 1 Kazuhiro Seki 2 and Kuniaki Uehara 3

In Japanese, it is quite common for the same word to be written in multiple ways. This is especially true for katakana words, which are typically used for transliterating foreign languages. For example, "Los Angeles" can be written in katakana as ロサンジェルス (rosanjerusu), ロサンゼルス (rosanzerusu), ロスアンジェルス (rosuanjerusu), or ロスアンゼルス (rosuanzerusu), all considered legitimate. This ambiguity becomes a critical problem for automatic processing such as information retrieval. To tackle this problem, we propose a simple but effective approach for generating katakana variants of a given katakana word, based on the phonemic representation of the word in its original language. The proposed approach is first evaluated through a manual assessment of the variants it generates. It is also shown that the approach is beneficial for information retrieval when applied to query replacement, retrieving a large number of potentially relevant documents.

1. 1) Y Los Angeles

1 Google, Inc.
2 Organization of Advanced Science and Technology, Kobe University
3 Graduate School of Engineering, Kobe University

© 2009 Information Processing Society of Japan

2) Yahoo! Japan 1 Alisha Keys Alicia Keys 2 3 4 5

2. 2 3),4) 5),6) OR 1 7) 258 17.6% 100 47 Masuyama 8) Masuyama 1 682 98.6% 86.3% 9)

1 http://search.yahoo.co.jp/

3. 3.1 1 10) /æ/ /a/ /e/ Chandler /tʃændlə/ 4 (1) (2) (3) (4)

3.2 Knight 9) 1 1 2 2 Grefenstette 11) Knight diteeru

Table 1 Katakana characters and their phonetic representations.

a     ta    ma    gi    bi
i     chi   mi    gu    bu
u     tsu   mu    ge    be
e     te    me    go    bo
o     to    mo    za    pa
ka    na    ya    ji    pi
ki    ni    yu    zu    pu
ku    nu    yo    ze    pe
ke    ne    ra    zo    po
ko    no    ri    da    n
sa    ha    ru    ji    v
shi   hi    re    zu
su    hu    ro    de
se    he    wa    do
so    ho    ga    ba

Table 2 Compound katakana characters and their phonetic representations.

di    tsi   chyo  pyu
du    tse   nya   pyo
ti    tso   nyu   gya
tu    she   nyo   gyu
si    je    hya   gyo
wi    che   hyu   jya
we    kya   hyo   jyu
wo    kyu   mya   jyo
va    kyo   myu   dya
vi    shya  myo   dyu
ve    shyu  rya   dyo
vo    shyo  ryu   bya
vyu   chya  ryo   byu
tsa   chyu  pya   byo

Knight 3 diteeru ru 1 L

Table 3 A fragment of English-Japanese phonemic mappings (Knight 9)).

e     j     P(j|e)
D     d     0.535
      do    0.329
ER    aa    0.719
      a     0.081
      ar    0.063
      er    0.042
EY    ee    0.641
      a     0.122
      e     0.114
IH    i     0.908
L     r     0.621
      ru    0.362
T     t     0.463
      to    0.305
      tto   0.103
UH    u     0.794
      uu    0.098

Fig. 1 Possible partitions for diteeru.

r u 1 L UH Knight 3 1 1 1 5 φ φ 1 2

Fig. 2 Possible English phoneme sequences for diteeru.

3.3 noisy channel model 3.2 a ER EY 1 d-i-t-ee-ru 2 1 1 e ee 2 AH EH EY IY

Let J = j_1 ... j_n be the katakana sequence and E = e_1 ... e_n an English phoneme sequence. The sequence Ê that maximizes P(E | J) is

\[ \hat{E} = \arg\max_{E} P(E \mid J) = \arg\max_{E} P(J \mid E)\,P(E) \tag{1} \]

\[ P(J \mid E)\,P(E) = \prod_{i} P(j_i \mid e_i)\,P(e_i \mid e_{i-1}) \tag{2} \]

where P(e_1 | e_0) = P(e_1).

Knight 9) (2) e_i 12) 1 2 P(j_i | e_i) 8,000 Knight 9) P(e_i | e_{i-1}) CMU 1 127,000 1,571 = CMU 39 2 J (2) Ê D-IH-T-EY-L

3.4 Ê J J K J K 1 2

3.5 K K 2 1 Ê J K P(K) \prod_i P(j_i | ê_i) 1 P(K) n 2 (2) 1 Knight n 3 EDICT 2 13,124 K 4

Table 4 Examples of katakana variants generated for

10 6 P(K) \prod_i P(j_i | ê_i)

329   0.000002
195   0.000017
86    0.000003
36    0.000003

1 K K Yahoo! API K K K 1 2 1 K 4 46 12

4. 4.1 4.1.1 Infoseek 3 Yahoo!

1 http://www.speech.cs.cmu.edu/cgi-bin/cmudict
2 http://www.csse.monash.edu.au/~jwb/jedict.html
3 http://dictionary.www.infoseek.co.jp
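The backward and forward steps built on Eqs. (1) and (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The channel probabilities P(j | e) are the fragment given in Table 3. The bigram model P(e_i | e_{i-1}) is replaced by a uniform placeholder, because the values the paper appears to estimate from the CMU dictionary are not reproduced here. The names `CHANNEL`, `bigram`, `backward`, and `forward` are hypothetical.

```python
import math
from functools import lru_cache
from itertools import product

# P(j | e): fragment of the channel model, copied from Table 3
# (English phoneme -> romanized katakana unit -> probability).
CHANNEL = {
    "D": {"d": 0.535, "do": 0.329},
    "ER": {"aa": 0.719, "a": 0.081, "ar": 0.063, "er": 0.042},
    "EY": {"ee": 0.641, "a": 0.122, "e": 0.114},
    "IH": {"i": 0.908},
    "L": {"r": 0.621, "ru": 0.362},
    "T": {"t": 0.463, "to": 0.305, "tto": 0.103},
    "UH": {"u": 0.794, "uu": 0.098},
}


def bigram(prev, cur):
    """P(e_i | e_{i-1}); prev is None for the first phoneme.

    ASSUMPTION: uniform placeholder, not the distribution used in the paper.
    """
    return 1.0 / len(CHANNEL)


def backward(romaji):
    """Return (log score, phonemes) maximizing Eq. (2) over all partitions."""

    @lru_cache(maxsize=None)
    def best(pos, prev):
        if pos == len(romaji):
            return 0.0, ()
        top = (-math.inf, ())
        for phoneme, units in CHANNEL.items():
            for unit, p_channel in units.items():
                if not romaji.startswith(unit, pos):
                    continue
                tail_score, tail = best(pos + len(unit), phoneme)
                score = (math.log(p_channel)
                         + math.log(bigram(prev, phoneme))
                         + tail_score)
                if score > top[0]:
                    top = (score, (phoneme,) + tail)
        return top

    score, phonemes = best(0, None)
    return score, list(phonemes)


def forward(phonemes):
    """Expand a phoneme sequence into (variant, probability) pairs, best first."""
    choices = [CHANNEL[e].items() for e in phonemes]
    variants = []
    for combo in product(*choices):
        text = "".join(unit for unit, _ in combo)
        prob = math.prod(p for _, p in combo)
        variants.append((text, prob))
    return sorted(variants, key=lambda pair: pair[1], reverse=True)


score, phonemes = backward("diteeru")
print(phonemes)               # ['D', 'IH', 'T', 'EY', 'L']
print(forward(phonemes)[:3])  # three most probable romanized variants
```

With the uniform placeholder, every extra phoneme costs a constant factor, so the decoder prefers the shortest well-supported partition, d-i-t-ee-ru. A real bigram model would change the ranking among candidate phoneme sequences, and the forward step would additionally need the filtering described in Section 3.5.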

1 25 7) 17

4.1.2 5 100 100 25 18.56% 13.98% 2 32.54% 7) 17.6% 4.2

4.1.3 5

Table 5 Individual results for quality judgment of generated katakana variants.

Variants   (%)     (%)     Sum (%)
12         17.13   12.50   29.63
11         14.14   16.67   30.81
1          11.11   33.33   44.44
13         14.96   9.83    24.79
8          18.06   12.50   30.56
23         8.70    8.45    17.15
6          29.63   12.04   41.67
21         5.03    5.03    10.05
13         6.84    6.41    13.25
12         16.67   11.11   27.78
2          41.67   8.33    50.00
3          27.78   9.26    37.04
9          16.67   20.37   37.04
32         8.51    6.25    14.76
22         5.81    11.36   17.17
7          23.81   11.90   35.71
20         12.78   14.72   27.50
0          -       -       -
9          19.14   16.05   35.19
13         26.92   16.24   43.16
1          72.22   27.78   100.00
29         1.34    5.17    6.51
15         26.30   25.19   51.48
7          11.90   14.29   26.19
4          8.33    20.83   29.17
Average    18.56   13.98   32.54

2 25 293 1 195 66.6% 174 89.2%

1 http://dic.yahoo.co.jp/newword/
2 Google (http://google.com), April 14, 2008

4.2.1 174

4.2 4.2.1 NTCIR-3 Web 13) NTCIR-3 DM2&RL1 DM2&RL1 H A 47 26 3 1 14) tfidf 15) Base Base Phone EDICT 750 5 118 Rule Rule Yahoo! API 3.5

4.2.2 3 3 Base Phone Rule 1,000 4 R-P R-P Precision Recall

\[ \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN} \]

where TP is the number of true positives, FP the number of false positives, and FN the number of false negatives.

4 Base Phone Rule 0 0.1 Rule (3) N HO

Fig. 3 Twenty-six katakana queries from NTCIR-3.

Fig. 4 R-P curves for NTCIR dataset.
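As a small worked illustration of the precision and recall definitions in Section 4.2.2, the following Python sketch computes both values for one query. The function name `precision_recall` and the document identifiers are hypothetical and are not taken from the NTCIR-3 data.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) from retrieved and relevant document IDs."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    tp = len(retrieved_set & relevant_set)   # true positives
    fp = len(retrieved_set - relevant_set)   # false positives
    fn = len(relevant_set - retrieved_set)   # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 relevant, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(p, r)  # precision 0.5, recall 2/3
```

An R-P curve such as the one in Fig. 4 is obtained by repeating this computation at successive cut-offs of the ranked result list.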

Rule Base Phone

4.2.3 Rule 6 Rule Phone 1,000 0 1

Table 6 Precision at top 1,000 retrieved documents by query replacement.

Rule Phone 0.0000 0.0050 0.0000 0.0000 0.0000 0.0020 0.0000 0.0000 0.0000 0.0010 0.0080 0.0080 0.0010 0.0010 0.0010 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0270 0.0630 0.0010 0.0000 0.0000 0.0180 0.0020 N HO 0.0010 0.0063 0.0150

6 Phone Rule Rule Phone 10 NTCIR 1 NTCIR 2 NTCIR 6 NTCIR 8 NTCIR capsaicin 1 10 NTCIR 7 C 7 6

Table 7 Top 10 documents retrieved by query replacement for polyphenol.

Rank         Y/N
1      A     Y
2      C     N
3            N
4            N
5            N
6      C     Y
7      C     Y
8            N
9            N
10           N

NTCIR 16) n n Rule Phone Phone 3.5

4.2.4 NTCIR-3 Web 4.1 25 NTCIR 4.1 1 Yahoo! API 60.1 1,420 2,040,000 1,437 512,000 12,400,000 238 Phone Rule Phone Rule 25 Rule Phone 20 4.1 17 Phone Rule 100 R-P 5 Phone Rule 0.2

Fig. 5 Comparison of R-P curves for our proposed and existing approaches.

0.03 0.08 Rule Phone

4.3 4.1 4.2 5

5. 25 32.5% 66.6% 1 89.2% 60.1 R-P

References

1) Vol.2, pp.43-49 (1983).
2) Brill, E. and Moore, R.C.: An improved error model for noisy channel spelling correction, Proc. 38th Annual Meeting of the Association for Computational Linguistics, pp.286-293 (2000).
3) 44 pp.3 249 250 (1992).
4) FleCS Vol.87, No.11, pp.83-90 (1992).
5) Vol.J77-D-II, No.2, pp.380-387 (1994).
6) Vol.35, No.12, pp.2745-2750 (1994).
7) Vol.J86-D-II, No.3, pp.418-428 (2003).
8) Masuyama, T. and Nakagawa, H.: Web-based acquisition of Japanese katakana variants, Proc. 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.338-344 (2005).
9) Knight, K. and Graehl, J.: Machine Transliteration, Computational Linguistics, Vol.24, No.4, pp.599-612 (1998).
10) (1978).
11) Grefenstette, G., Qu, Y. and Evans, D.A.: Mining the Web to create a language model for mapping between English names and phrases and Japanese, Proc. IEEE/WIC/ACM International Conference on Web Intelligence, pp.110-116 (2004).
12) Jelinek, F.: Statistical Methods for Speech Recognition, MIT Press (1998).
13) Eguchi, K., Oyama, K., Ishida, E., Kando, N. and Kuriyama, K.: Overview of the Web retrieval task at the third NTCIR workshop, Technical Report NII-2003-002E, National Institute of Informatics (2003).
14) Salton, G. and McGill, M.J.: Introduction to Modern Information Retrieval, McGraw-Hill, Inc. (1983).
15) Jones, K.S.: Statistical interpretation of term specificity and its application in retrieval, Journal of Documentation, Vol.28, No.1, pp.11-20 (1972).
16) Voorhees, E.M. and Harman, D.K. (Eds.): TREC: Experiment and Evaluation in Information Retrieval, The MIT Press (2005).

20 14 18 Ph.D. ACM SIGIR 53 58 AAAI ( 20 4 17 ) ( 20 6 6 ) ( 20 6 27 )