IPSJ SIG Technical Report Vol.2009-SLP-77 No /7/ A Broadcast News Transcription System for Content Application


A Broadcast News Transcription System for Content Application

Akio Kobayashi,†1 Takahiro Oku,†1 Shinich Homma,†1 Shoei Sato,†1 Toru Imai†1 and Tohru Takagi†1

This paper describes a new transcription system for content application. The system archives broadcast news programs together with their transcriptions and speaker tags, with the aim of collecting training and evaluation data for acoustic and language models. It is also used to extract and describe metadata for TV programs. The system provides music and speech detection during dual-gender decoding, speaker diarization, and automatic language model updating for upcoming news shows. Trigram lattices are compressed into confusion networks, which are indexed for known-item retrieval. On 53 Japanese broadcast news shows, the system achieved a word error rate of 9.2% and an F-measure of 95 in a known-item retrieval evaluation.

1. NHK 1) NHK 2) 3) 5) 6) 7) (Known Item Retrieval)

2. 2.1 (Fig. 1) NHK / 1 (

†1 NHK Science and Technology Research Laboratories

© 2009 Information Processing Society of Japan

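... programs (..., overseas networks, etc.) are the targets of data collection.

Speech recognition (Fig. 1-(1)) takes acoustic features extracted from the broadcast audio as input, detects speech segments, and outputs recognition results. Speaker identification (Fig. 1-(2)) takes the same acoustic features as speech recognition and identifies speakers in parallel with it. Automatic language model updating (Fig. 1-(3)) retrieves the latest news texts from news on the Web and updates the language model incrementally. The speech segments, utterance contents and speaker names output by these blocks are integrated as speech information and stored in a database. In addition, the lattices obtained by speech recognition are compressed into confusion networks, indexed together with program information and utterance times, and stored in the database (Fig. 1-(4)). The client shown in Fig. 2 lets users view utterance contents synchronized with the video and search them by entering keywords.

Fig. 1 Broadcast News Transcription System (overview)
Fig. 2 Client Application (client screen)

2.2 Speech Segment and Music Detection

For automatic transcription of broadcast audio in which background sounds and male and female speakers are mixed, fine frame-level speech/non-speech decisions matter less than two other goals: suppressing the loss of speech segments as far as possible, even at the cost of occasionally labeling some non-speech as speech, and cutting the audio into segments of moderate length so as to improve recognition accuracy. The delay until the start and end of speech are detected should also be as small as possible, and music that speech recognition does not need, such as theme tunes and jingles, must be detected as well.

Our speech segment detection takes into account not only signal power but also frequency characteristics: it runs phoneme recognition continuously with gender-dependent male and female acoustic models in parallel, and detects speech segments and music from the resulting likelihoods (Fig. 3). The phoneme recognizer always runs gender-parallel phoneme recognition that allows transitions between genders and shares a common pruning beam, and it uses the ratio of accumulated phoneme likelihoods to detect the start and end of each utterance early 8),9). For music detection, a dedicated music HMM (6 states, 4 of them emitting, with backward transitions, 32 mixtures) was first trained by maximum likelihood estimation on 46 pieces of music, such as theme tunes and jingles broadcast in various news programs (augmented to 16 variants each by shifting the cut position). This music HMM (music) is added to the gender-parallel phoneme network in parallel with the silence/non-speech model (Fig. 3), and music segments are detected together with speech segments on the basis of the accumulated likelihood ratio. Audio in segments judged to be music is not passed to the subsequent dual-gender continuous speech recognizer; instead, a mark denoting music is output as the recognition result.

Fig. 3 Speech and Music Detection (male and female monophone acoustic models with a phoneme bigram, a music model, a penalty between genders, and a reset after long non-speech)
Fig. 4 Dual-Gender Speech Decoder (male and female triphone acoustic models, phonetic trees with a word bigram, early decision with output, and trigram rescoring of lattices)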
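Section 2.2 detects speech and music from accumulated likelihood ratios rather than from frame-by-frame decisions. The Python sketch below illustrates that idea under stated assumptions: per-frame log-likelihoods for speech, music and non-speech are given as plain lists (in the described system they would come from the gender-parallel phoneme network and the music HMM), each target class accumulates its margin over the best competing class, and the running sum is clipped to a fixed range so the detector reacts quickly after a long stretch of another class. The function name `detect_segments`, the thresholds `on`/`off`, and the clipping and back-dating rules are hypothetical choices for illustration, not parameters from the paper.

```python
from typing import Dict, List, Optional, Tuple


def detect_segments(
    ll_speech: List[float],
    ll_music: List[float],
    ll_nonspeech: List[float],
    on: float = 5.0,
    off: float = -5.0,
) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) segments, end exclusive, label 'speech' or 'music'.

    Hypothetical sketch: each target class accumulates its per-frame
    log-likelihood margin over the best competing class; the running sum is
    clipped to [off, on]. A segment opens when the sum reaches `on` and
    closes when it falls to `off`.
    """
    labels = ("speech", "music")
    acc: Dict[str, float] = {lab: 0.0 for lab in labels}
    onset = {lab: 0 for lab in labels}   # frame after the sum last sat at the floor
    offset = {lab: 0 for lab in labels}  # frame after the sum last sat at the ceiling
    state: Optional[str] = None
    start = 0
    segments: List[Tuple[str, int, int]] = []
    for t in range(len(ll_speech)):
        scores = {"speech": ll_speech[t], "music": ll_music[t], "nonspeech": ll_nonspeech[t]}
        for lab in labels:
            best_other = max(v for k, v in scores.items() if k != lab)
            acc[lab] = min(on, max(off, acc[lab] + scores[lab] - best_other))
            if acc[lab] <= off:
                onset[lab] = t + 1
            if acc[lab] >= on:
                offset[lab] = t + 1
        if state is None:
            for lab in labels:
                if acc[lab] >= on:
                    state, start = lab, onset[lab]  # back-date the start
                    break
        elif acc[state] <= off:
            segments.append((state, start, offset[state]))  # back-date the end
            state = None
    if state is not None:
        segments.append((state, start, len(ll_speech)))
    return segments


# Toy input: 'n' = non-speech, 's' = speech, 'm' = music frames.
frames = ["n"] * 10 + ["s"] * 20 + ["n"] * 10 + ["m"] * 20 + ["n"] * 10
ll = {c: [0.0 if f == c else -1.0 for f in frames] for c in "snm"}
print(detect_segments(ll["s"], ll["m"], ll["n"]))
# [('speech', 10, 30), ('music', 40, 60)]
```

In this sketch a segment is confirmed only once the sum climbs from its floor to `on`, but the reported boundaries are back-dated to where the sum left its floor or ceiling; this is one simple way to trade detection delay against dropped speech, the trade-off discussed in 2.2.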

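2.3 8),9) (Fig. 4)

2.4 10) BIC (Bayesian Information Criterion) 11) For two segments x and y, let Σ denote a covariance matrix and N a number of feature frames:

$$\mathrm{BIC}(x, y) = \frac{1}{2}\left[ N_{xy}\log|\Sigma_{xy}| - N_x\log|\Sigma_x| - N_y\log|\Sigma_y| \right] - \alpha P \qquad (1)$$

where Σ_xy is the covariance matrix of x and y pooled together, P is the penalty term, and α is its weight. A positive BIC(x, y) favors modeling x and y separately, that is, a speaker change between them.

2.5 (NHK ) ( ) trigram trigram 12) (12.5M)

2.6 13) (MPE; Minimum Phone Error) 18) 14) 15) 16),17) bigram trigram trigram 17) pivot

3.

3.1 May 20–23, 2009 NHK / 1 53 /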
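Eq. (1) can be evaluated directly from sample covariances. The pure-Python sketch below assumes full-covariance Gaussian models with maximum-likelihood (divide-by-N) covariance estimates and the penalty P = ½(d + d(d+1)/2) log N_xy used by Chen and Gopalakrishnan 11), where d is the feature dimension. The paper's own definition of P is not recoverable from this copy, so that form, like the helper names `covariance`, `log_det` and `delta_bic`, is an assumption.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def covariance(frames: List[Vector]) -> List[List[float]]:
    """Maximum-likelihood (divide-by-N) covariance matrix of feature vectors."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    return [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / n
             for j in range(d)] for i in range(d)]


def log_det(m: List[List[float]]) -> float:
    """log|M| for a symmetric positive-definite M, via Cholesky factorization."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(d))


def delta_bic(x: List[Vector], y: List[Vector], alpha: float = 1.0) -> float:
    """BIC(x, y) as in Eq. (1); a positive value favors separate models."""
    d = len(x[0])
    n_x, n_y = len(x), len(y)
    n_xy = n_x + n_y
    # Assumed penalty: parameter count of one full-covariance Gaussian times log N.
    p = 0.5 * (d + 0.5 * d * (d + 1)) * math.log(n_xy)
    return 0.5 * (n_xy * log_det(covariance(list(x) + list(y)))
                  - n_x * log_det(covariance(x))
                  - n_y * log_det(covariance(y))) - alpha * p


# Toy 2-D data: y_same follows the same distribution as x; y_diff is shifted and scaled.
x = [(math.cos(k), math.sin(k)) for k in range(200)]
y_same = [(math.cos(k), math.sin(k)) for k in range(200, 400)]
y_diff = [(5 * math.cos(k) + 10, 5 * math.sin(k) + 10) for k in range(200)]
print(delta_bic(x, y_same) < 0, delta_bic(x, y_diff) > 0)  # True True
```

The weight α scales the penalty: a larger α makes the criterion more reluctant to declare a change, so it acts as a tuning knob between missed and spurious speaker changes.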

Table 1 Evaluation Data
(53 programs) 532.2 8.4k 105.7k 28.8 0.51
465.1 6.8k 92.9k 22.6 0.35
5.8 139 1.2k 88.6 0.33
0.7 46 182 932.6 2.74
56.6 1.4k 11.4k 172.2 1.88

( ) 4.1 ( 0.8%)

3.2

3.2.1 1 tree lexicon bigram 2 trigram MPE 18) 8) ( 340 250 ) 0.5 MPE 10 12 MFCC+ 1 2 39 ( ) 660 (202.3M ). 60k (20 ) Table 2 9.2% / 5.3% (Table 3)

Table 2 Overall Recognition Results
WER (%)
8.4k 95.8k 9.2
/ 7.0k 85.9k 5.3
1.4k 9.9k 43.6

Table 3 Recognition Results (WER, %)
/ 4.2 6.5 5.1 6.1 91.1 - 40.2 74.3 - - - 43.6

Table 4 Speaker Diarization Results (%)
DER MS FS SE
13.4 0.1 0.5 12.8
NHK 5.2 0.1 0.5 4.7
NHK 1 15.3 0.1 0.5 14.7

/ 5.1% 5% 19) /

3.2.2 (FRR; False Rejection Rate) 21.3% (115/540) (FAR; False Acceptance Rate) 26.0% (149/573)
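The two rates in 3.2.2 are simple count ratios, and the DER column of Table 4 should equal the sum of its MS, FS and SE columns. The short Python check below recomputes both from the numbers reported above; `rate` is a hypothetical helper name, and the check only verifies arithmetic.

```python
def rate(errors: int, total: int) -> float:
    """Error rate in percent, rounded to one decimal place as in the text."""
    return round(100.0 * errors / total, 1)


# Counts reported in 3.2.2.
frr = rate(115, 540)  # false rejection rate
far = rate(149, 573)  # false acceptance rate
print(frr, far)  # 21.3 26.0

# Table 4 rows as (DER, MS, FS, SE); DER should equal MS + FS + SE to within rounding.
table4 = [(13.4, 0.1, 0.5, 12.8), (5.2, 0.1, 0.5, 4.7), (15.3, 0.1, 0.5, 14.7)]
for der, ms, fs, se in table4:
    print(der, round(ms + fs + se, 1))
```

The 0.1-point gap in the second row (5.2 versus 5.3) is consistent with each component having been rounded separately.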

Table 5 Speaker Diarization Results (Known Speakers, %)
FRR FAR
32.2 7.7
NHK 19.0 12.3
NHK 1 35.1 7.7

Table 6 Known Item Retrieval Results (unigram, %)
unigram ( ): Precision Recall F | unigram ( ): Precision Recall F
0.0 89.2 94.3 91.7 | 83.4 97.2 89.8
0.5 95.6 92.8 94.2 | 96.2 97.2 96.7
0.9 96.3 91.7 93.9 | 96.5 90.1 93.2

Table 7 Known Item Retrieval Results (bigram, %)
bigram ( ): Precision Recall F | bigram ( ): Precision Recall F
0.0 94.3 90.6 92.4 | 87.2 79.0 82.9
0.5 94.6 90.5 92.5 | 92.0 78.4 84.7
0.9 94.6 89.2 91.8 | 94.3 70.0 80.4

= 98.7%

3.2.3 ( ) April 2009 NHK (1) α 0.75 1.0 April 2009 NHK NHK 24 NHK 1 11 4 4 NIST (2) DER (Diarization Error Rate) 20) DER

$$\mathrm{DER} = \frac{\mathrm{FS} + \mathrm{MS} + \mathrm{SE}}{\text{total speech duration}} \times 100 \qquad (3)$$

FS (False Speech), MS (Missed Speech), SE (Speech Error) ( 5) 4 5 NHK NHK 1 DER, FRR NHK 1 NHK NHK FAR 1 NHK 1 NHK

3.2.4 (Known Item Retrieval) (precision) (recall) F (F-measure) unigram, bigram 20 unigram, bigram / / (Tables 6 and 7) unigram = 0.5 F 95 bigram unigram F bigram = 0.5 F 84.7 / 5.3% unigram unigram F
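Section 3.2.4 evaluates known-item retrieval over the confusion-network index with unigram and bigram queries at three settings (0.0, 0.5, 0.9) and reports precision, recall and F-measure. The Python sketch below is one hypothetical way to build and query such an index. It assumes that each utterance's confusion network is a list of bins mapping words to posterior probabilities, that the 0.0/0.5/0.9 column of Tables 6 and 7 is a posterior threshold, and that a bigram matches only when both words occur in adjacent bins above that threshold. These matching rules and all names are illustrative assumptions, not the authors' design. The `f_measure` helper is the standard harmonic mean, which reproduces the F columns of Tables 6 and 7 from their precision and recall values.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# One confusion network per utterance: a list of bins, each mapping word -> posterior.
ConfusionNetwork = List[Dict[str, float]]
Index = Dict[str, List[Tuple[str, int, float]]]


def build_index(networks: Dict[str, ConfusionNetwork]) -> Index:
    """Inverted index: word -> list of (utterance id, bin position, posterior)."""
    index: Index = defaultdict(list)
    for utt_id, network in networks.items():
        for pos, bin_ in enumerate(network):
            for word, post in bin_.items():
                index[word].append((utt_id, pos, post))
    return index


def search(index: Index, query: List[str], threshold: float) -> Set[str]:
    """Utterances containing the query words in consecutive bins.

    Hypothetical rule: each word must have posterior >= threshold, so a
    one-word query is a unigram search and a two-word query a bigram search.
    """
    hits = {(u, p) for u, p, s in index.get(query[0], []) if s >= threshold}
    for offset, word in enumerate(query[1:], start=1):
        found = {(u, p) for u, p, s in index.get(word, []) if s >= threshold}
        hits = {(u, p) for u, p in hits if (u, p + offset) in found}
    return {u for u, _ in hits}


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F-measure)."""
    return 2 * precision * recall / (precision + recall)


networks = {
    "utt1": [{"weather": 0.9, "whether": 0.1}, {"in": 1.0}, {"kyoto": 0.6, "tokyo": 0.4}],
    "utt2": [{"whether": 0.7, "weather": 0.3}, {"in": 1.0}, {"tokyo": 0.8}],
}
index = build_index(networks)
print(sorted(search(index, ["weather"], 0.0)))      # ['utt1', 'utt2']
print(sorted(search(index, ["weather"], 0.5)))      # ['utt1']
print(sorted(search(index, ["in", "kyoto"], 0.9)))  # []
print(round(f_measure(95.6, 92.8), 1))              # 94.2, as in Table 6 (0.5 row)
```

In this sketch, raising the threshold drops low-posterior alternatives, trading recall for precision; that is the general pattern visible in Tables 6 and 7.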

4. 9.2% unigram F 95 /

1) Vol.63, No.3, pp.331–338 (2008).
2) CurioView, 7-5 (2008).
3) Renals, S., Abberley, D., Kirby, D. and Robinson, T.: Indexing and Retrieval of Broadcast News, Speech Communication, Vol.32, pp.5–20 (2000).
4) Federico, M.: A System for the Retrieval of Italian Broadcast News, Speech Communication, Vol.32, pp.37–47 (2000).
5) Dowman, M., Tablan, V., Cunningham, H. and Popov, B.: Web-Assisted Annotation, Semantic Indexing and Search of Television and Radio News, Proc. the 14th International World Wide Web Conference, pp.225–234 (2005).
6) PodCastle: 2.0 (2007-SLP-65), Vol.2007, No.11, pp.35–40 (2007).
7) PodCastle: Web 2.0 (2007-SLP-65), Vol.2007, No.11, pp.41–46 (2007).
8) 2 (2008).
9) Imai, T., Sato, S., Homma, S., Onoe, K. and Kobayashi, A.: Online Speech Detection and Dual-Gender Speech Recognition for Captioning Broadcast News, IEICE Trans. Information and Systems, Vol.E90-D, No.8, pp.1286–1291 (2007).
10) Liu, D. and Kubala, F.: Fast Speaker Change Detection for Broadcast News Transcription and Indexing, Proc. EUROSPEECH 99, Vol.3, pp.1031–1034 (1999).
11) Chen, S. and Gopalakrishnan, P.: Speaker, environment and channel change detection and clustering via the Bayesian information criterion, Proc. DARPA Speech Recognition Workshop, pp.127–132 (1998).
12) Vol.40, No.4, pp.1421–1429 (1999).
13) Vol.108-338, pp.225–260 (2008).
14) Chelba, C. and Acero, A.: Position specific posterior lattices for indexing speech, Proc. the 43rd Annual Meeting on ACL, pp.443–450 (2005).
15) Meng, S., Peng, Y., Seide, F. and Liu, J.: A Study of Lattice-Based Spoken Term Detection for Chinese Spontaneous Speech, ASRU IEEE Workshop, pp.635–640 (2007).
16) Mangu, L., Brill, E. and Stolcke, A.: Finding Consensus in Speech Recognition: Word Error Minimization and Other Applications of Confusion Networks, Computer Speech and Language, Vol.14, No.4, pp.373–400 (2000).
17) Hakkani-Tür, D., Bechet, F., Riccardi, G. and Tur, G.: Beyond ASR 1-best: Using Word Confusion Networks in Spoken Language Understanding, Computer Speech and Language, Vol.20, No.4, pp.495–514 (2006).
18) Povey, D. and Woodland, P.: Minimum phone error and I-smoothing for improved discriminative training, Proc. ICASSP, pp.I-105–108 (2002).
19) 10 2 (2006).
20) http://www.nist.gov/speech/test/rt