jnlp98f.dvi


December 9, 1998
RT0288
Human-Computer Interaction
19 pages

Research Report

A word-based Japanese language model

N. Itoh, M. Nishimura, S. Ogino, and K. Yamasaki
IBM Research, Tokyo Research Laboratory
IBM Japan, Ltd.
1623-14 Shimotsuruma, Yamato
Kanagawa 242-8502, Japan

Research Division
Almaden - Austin - Beijing - Haifa - India - T. J. Watson - Tokyo - Zurich

Limited Distribution Notice
This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted. It has been issued as a Research Report for early dissemination of its contents. In view of the expected transfer of copyright to an outside publisher, its distribution outside IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or copies of the article legally obtained (for example, by payment of royalties).

Vol. 6 No. 1 Jan. 1999

A Word-based Japanese Language Model

Nobuyasu Itoh†, Masafumi Nishimura†, Shiho Ogino† and Kazutaka Yamasaki†

This paper deals with a word-based language model for Japanese. In Japanese, word boundaries are not stable, and grammatical units do not necessarily coincide with human intuition. For accurate segmentation it is therefore necessary to create a vocabulary set that covers human utterance units. In our word-segmentation method, a model of word boundaries is described by morphological parameters (i.e., parts of speech), which are learned by comparing the results of human segmentation with those of a Japanese morphological analyzer. Then, using pseudo-random numbers and the model, it is determined whether each morpheme transition is a word boundary. As a result, we automatically obtain a vocabulary set and training data for a Japanese language model. According to our experiments using articles from three newspapers and texts posted in network-based forums, about 44,000 words cover 94-98% of all words in the test data, and the average number of words per sentence is 12-19% smaller than the average number of morphemes. The parameters of the word-segmentation model and of the language model differ considerably between newspaper articles and forum texts. However, the difference lies not in the probabilities of common events but in the kinds of events. Therefore the language model created from both newspaper articles and forum texts gave satisfactory results for both test sets.
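The boundary decision described in the abstract can be illustrated with a minimal sketch. It assumes that every transition between adjacent morphemes has a learned boundary probability, and it draws a pseudo-random number to decide whether a word boundary is placed there. The table `BOUNDARY_PROB`, its part-of-speech labels and values, the fallback `DEFAULT_PROB`, and the romanized example morphemes are all hypothetical and are not taken from the paper.

```python
import random

# Hypothetical boundary probabilities P(# | left PoS -> right PoS).
# Labels and values are illustrative only, not taken from the paper.
BOUNDARY_PROB = {
    ("Verb", "V.infl"): 0.0,
    ("V.infl", "Conj.p.p."): 0.03,
    ("Noun", "Particle"): 1.0,
}
DEFAULT_PROB = 1.0  # assumption: unseen transitions are treated as boundaries


def segment(morphemes, rng):
    """Join morphemes into words by sampling each transition.

    morphemes: list of (surface, pos) pairs from a morphological analyzer.
    rng: a random.Random instance supplying the pseudo-random numbers.
    """
    if not morphemes:
        return []
    words = [morphemes[0][0]]
    for (_, left_pos), (surface, right_pos) in zip(morphemes, morphemes[1:]):
        p = BOUNDARY_PROB.get((left_pos, right_pos), DEFAULT_PROB)
        if rng.random() < p:
            words.append(surface)   # boundary: start a new word
        else:
            words[-1] += surface    # no boundary: extend the current word
    return words


example = [("tabe", "Verb"), ("ru", "V.infl"), ("hito", "Noun"), ("ga", "Particle")]
result = segment(example, random.Random(0))
```

Because the decision is sampled, the same morpheme sequence can be segmented differently on different occurrences whenever a probability lies strictly between 0 and 1, so both variants can enter the vocabulary and the training data.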

KeyWords: Speech recognition, Dictation, N-gram model, Morphological analysis

1 (,,,, 1996;,,, 1998b) HMM N N-gram 1 N-gram ( 1996;,,,, 1996) ( ) Minimum Cover Set ( 1989) 76% ( 1996)

† Tokyo Research Laboratory, IBM Japan, Ltd.

N-gram ( 1998a) N-gram 2 + + + + + + + + + + + + ( ) ( ) ( ) # # NULL # $P(\#_i \mid \mathrm{Morpheme}_i \rightarrow \mathrm{Morpheme}_{i+1})$ Morpheme $C_1, C_2, \ldots, C_n$ $j$ # $P(\#_j \mid \mathrm{Morpheme}, C_j \rightarrow C_{j+1})$ ( ) (KoW)

(Part of Speech: PoS) (String) (KoW[PoS], String) ( 1994) 81 119 1 6 4 ( ) 1 1 $P(\# \mid \text{V. infl.}[29] \rightarrow \text{Conj. p.p.}[69],\ )$ [29] [69] 2 1 3 V. infl. Conj. p.p. (Verb[8]) + (V. infl.[30]) + (Conj. p.p.[69])

1 ... 1 A 17 PoS (Part of Speech) ( 1994)
2 V. infl.: Verb inflection. Conj. p.p.: Conjunctive post-positional particle.
3

[Figure 1. Panel titles: "Morpheme level segmentation" and "Character level segmentation". Model levels, marked "Detailed" and "Applied order": Kind of Word; KoW transition; KoW transition with Part-of-Speech; KoW transition with Part-of-Speech and the string of the following word. Example probabilities shown: P(# | * → Conj. p.p.), P(# | V. infl. → Conj. p.p.), P(# | V. infl.[29] → Conj. p.p.[69]), P(# | V. infl.[29] → Conj. p.p.[69], ), P(# | P.noun → Conj. p.p.), P(# | Interrogative → P.p.[17], , ).]

1 + ( ) 2
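The levels named in Figure 1 suggest a back-off lookup, sketched below under the assumption that the most detailed level is consulted first and that a more general level is used only when no entry exists. The function name, the table layout, the level names, the fallback value, and every probability are hypothetical; only the four levels of detail come from the figure.

```python
def lookup_boundary_prob(left, right, tables, default=1.0):
    """Return (probability, level) from the most detailed table with an entry.

    left:    (kow, pos) of the preceding morpheme
    right:   (kow, pos, string) of the following morpheme
    tables:  dict mapping a level name to {key: probability}
    default: assumed fallback when no level matches (not from the paper)
    """
    l_kow, l_pos = left
    r_kow, r_pos, r_str = right
    # Ordered from most detailed to most general.
    candidates = [
        ("kow_pos_string", (l_kow, l_pos, r_kow, r_pos, r_str)),
        ("kow_pos", (l_kow, l_pos, r_kow, r_pos)),
        ("kow", (l_kow, r_kow)),
        ("following_kow", (r_kow,)),
    ]
    for level, key in candidates:
        table = tables.get(level, {})
        if key in table:
            return table[key], level
    return default, "default"


# Hypothetical parameter tables for illustration.
tables = {
    "kow_pos": {("V.infl", 29, "Conj.p.p.", 69): 0.03},
    "following_kow": {("Conj.p.p.",): 0.2},
}
detailed = lookup_boundary_prob(("V.infl", 29), ("Conj.p.p.", 69, "te"), tables)
general = lookup_boundary_prob(("Noun", 19), ("Conj.p.p.", 69, "te"), tables)
unseen = lookup_boundary_prob(("Noun", 19), ("Noun", 19, "x"), tables)
```

Returning the matched level alongside the probability makes it easy to tally how often each level is actually used.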

3 3.1 (,, 1996) ( 1994) : : ( 1996) ( 1998)

3.2 2 3 25,000 1 2 2 1 3 1 1 3 4 4.1 17 5 3 2 ( 26,000 ) ( 9,500 4 ( ) 2,829 4 1. 2.

Table 1 ( )

Transition (PoS IDs, following string)    Probability
[19] → [19],                              0.33
[19] → [19],                              0.71
[19] → [18],                              0.36
[29] → [69],                              0.03
[19] → [77],                              1.0

[Figure 2. Labels: Newspaper article, Forum.]

2 2,269 1 ( ) 5 2 2 3 6 1. [69] → [31] 2. [62] → [73] 3. [48] → [73] 1. + +, 2.... +, 3.... + 5 50 6 String
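Probabilities such as those in Table 1 are described in the abstract as being learned by comparing human segmentation with the output of a morphological analyzer. The following is a minimal sketch of one way to do that with relative frequencies, assuming that both segmentations concatenate to the same character string. The function names, the transition keys, and the romanized example are hypothetical, and the paper's actual estimation procedure may differ.

```python
from collections import Counter


def boundary_flags(morphemes, human_words):
    """For each morpheme transition, report whether humans cut there.

    morphemes:   list of surface strings from the analyzer
    human_words: list of surface strings from human segmentation
    Assumes both lists concatenate to the same string.
    """
    cuts, pos = set(), 0
    for word in human_words:
        pos += len(word)
        cuts.add(pos)          # character offset where a human word ends
    flags, pos = [], 0
    for surface in morphemes[:-1]:
        pos += len(surface)
        flags.append(pos in cuts)
    return flags


def estimate(observations):
    """Relative-frequency estimate of P(# | transition key)."""
    total, boundary = Counter(), Counter()
    for key, is_boundary in observations:
        total[key] += 1
        if is_boundary:
            boundary[key] += 1
    return {key: boundary[key] / total[key] for key in total}


flags = boundary_flags(["tabe", "ru", "hito"], ["taberu", "hito"])
probs = estimate([
    (("Verb", "V.infl"), False),
    (("Verb", "V.infl"), False),
    (("Verb", "V.infl"), True),
    (("V.infl", "Noun"), True),
])
```

The keys passed to `estimate` can be made as detailed as required, for example by including part-of-speech IDs or the following string.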

( [13] → [100] i.e.... + ) ( [19] → [19], ) 1,607 0.980 ( ) (1) (2) 0 1 (3) N-gram ( ) N-gram 4.2 3 ( 446,079 ) (,, 1995) 97% 3 3 10 7 216,904 132,164 25,000 ( ) 95% 2

[Figure 3. Axis labels: Coverage (%), 70 to 100; Number of tokens, 1,000 to 216,904.]

3 60% 2 ( )

Table 2 ( )

Condition                                      (%)
P(# | KoW_1[PoS_1] → KoW_2[PoS_2], String)     59.6
P(# | KoW_1[PoS_1] → KoW_2[PoS_2])             29.2
P(# | KoW_1 → KoW_2)                            3.9
P(# | KoW_2)                                    6.6
( )                                             0.7

5 5.1 93 96 92 10 97 91 92 EDR (EDR 1995) ( 1998a) 7 90 ( :-) 5.2 EDR 95% 44,000 (44K 7 ID

) 11 5.3 44K ( 1996) ( 1996) 3 6 N-gram
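The surviving figures here (95%, 44,000, 44K) and the abstract's statement that about 44,000 words cover 94-98% of the test data both concern vocabulary coverage. The sketch below shows one common way to relate vocabulary size to coverage: add words in descending frequency until a target share of the training tokens is covered, then measure the share of test tokens found in that vocabulary. This is an assumption about the general technique, not a description of the authors' procedure, and the function names and toy data are hypothetical.

```python
from collections import Counter


def build_vocab(training_tokens, target_coverage):
    """Pick the most frequent words until the target coverage is reached."""
    counts = Counter(training_tokens)
    total = sum(counts.values())
    vocab, covered = set(), 0
    for word, count in counts.most_common():
        if covered / total >= target_coverage:
            break
        vocab.add(word)
        covered += count
    return vocab


def coverage(vocab, tokens):
    """Fraction of tokens that are in the vocabulary."""
    if not tokens:
        return 0.0
    return sum(token in vocab for token in tokens) / len(tokens)


# Toy data: "a" occurs 6 times, "b" 3 times, "c" once.
train = ["a"] * 6 + ["b"] * 3 + ["c"]
vocab = build_vocab(train, 0.9)
test_cov = coverage(vocab, ["a", "b", "c", "a"])
```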

Table 3. K 1,000 M 100

          (K)      (M)
          715     20.9
        1,837     49.4
        1,401     41.4
EDR       169      4.4
        1,565     33.6

Table 4 ( )

Sentences   Morphemes   Words    Coverage (%)   Morphemes per sentence   Words per sentence
  600        21,378     18,725      98.3              35.6                    31.2
  725        22,051     18,608      96.1              30.4                    25.7
  775        21,702     17,751      96.0              28.0                    22.9
1,381        29,979     24,204      94.4              21.7                    17.5

36 44K 1,800 N-gram 3 ( 44K 4 12-19% N-gram N-gram ( : N-1,..,8 F-1,...,8 95% 5% N-gram Held-out (N-1,..,8)
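The four rows of figures in Table 4 can be checked against the 12-19% reduction quoted in the abstract. The sketch below assumes that the first three figures in each row are the numbers of sentences, morphemes, and words; this reading is an inference from the arithmetic, since the original column headings were lost. The names `TABLE4` and `per_sentence_stats` are illustrative.

```python
# Rows of Table 4 as (sentences, morphemes, words).
# The meaning of the columns is inferred, not stated in the surviving text.
TABLE4 = [
    (600, 21378, 18725),
    (725, 22051, 18608),
    (775, 21702, 17751),
    (1381, 29979, 24204),
]


def per_sentence_stats(rows):
    """Return (morphemes/sentence, words/sentence, reduction %) per row."""
    stats = []
    for sentences, morphemes, words in rows:
        morphemes_per_sentence = morphemes / sentences
        words_per_sentence = words / sentences
        reduction = 100.0 * (1.0 - words / morphemes)
        stats.append((
            round(morphemes_per_sentence, 1),
            round(words_per_sentence, 1),
            round(reduction, 1),
        ))
    return stats


stats = per_sentence_stats(TABLE4)
```

Under this reading, the computed per-sentence averages reproduce the last two figures of every row, and the reductions fall between roughly 12% and 19%.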

[Figure 4. Legend: Nikkei, Mainichi, Sankei, Forum. Axis label: Training data size (million words).]

4 N-gram (trigram 4 Forum 7 8 1-2% 8 (100-170) ( 1998b) 400 8

[Figure 5. Labels: Nikkei, Mainichi, Sankei, Forum, "created from forum corpus", "Forum text data size (million words) mixed with news corpus".]

5 (F-1,..,8) 5 ( 25M ) 9 152.1 N-gram N-gram N-gram 6 (N-1,..,8) bigram trigram N-1,...,8 F-1,...,8 N-gram trigram 31M bigram 5.6M trigram N-gram N-gram 9

[Figure 6. Legend: Trigram, Bigram. Axis label: Training data size (million words).]

6 N-gram ( ) 7 N-gram trigram N-1,...,8 F-1,...,8 trigram trigram 5M 1/3 1/5 10 7 44K 94-98% 10 1,...,8 1 trigram 31M 9M

[Figure 7. Legend: Nikkei, Mainichi, Sankei, Forum. Axis label: No. of trigrams (million).]

7 trigram 12-19% N-gram N-gram ( ( 1998b) 400 ( 1989)

2 ( 1996) ( 1996) CD- 91-95 ( )

EDR (1995).
(1989). DPHI22-3.
(1996). 3-3-10, pp. 105-106.
(1996). pp. 19-26.
(1994). 35 (7), 1293-1299.
(1996). J79-D-II (12), 2125-2131.
(1995). 51, 3R-7, pp. 117-118.
(1998a). J81-D-II (1), 10-17.
(1998b). SLP20-3, pp. 17-24.
(1998). pp. 122-135.

(1996). J79-D-II (12), 2078-2085.
(1996). bigram, 3 (4), 129-139.

: 1982 1984 : 1981 3 1983 3 10 : 1986 1988 : 1988 1990 1993

(1998 4 1 ) (1998 7 ) (1998 8 )