
[Figure: a 3-state left-to-right HMM with self-transition probabilities a_11, a_22, a_33, forward transitions a_12, a_23, and state output distributions b_1(o_t), b_2(o_t), b_3(o_t).]

[Figure: system overview.
Training: multi-speaker speech database → speech analysis (mel-cepstrum, log F0) → context-dependent HMMs labeled /context1/, /context2/, ... (average voice model).
Adaptation: average voice model + adaptation data → speaker adaptation → adapted model.
Synthesis: TEXT → sentence HMM built from the adapted model → parameter generation (mel-cepstrum c_1, c_2, ...; F0 p_1, p_2, ...) → excitation + MLSA filter → synthesized speech.]
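The excitation block in the pipeline above is conventionally a pulse train for voiced frames and white noise for unvoiced ones, which the MLSA filter then shapes with the mel-cepstral envelope. A minimal sketch of that excitation step (the function name, frame period, and sampling rate are illustrative assumptions, not the system's actual code):

```python
import numpy as np

def excitation(f0_contour, frame_period=0.005, sr=16000):
    """Pulse-train excitation for voiced frames (F0 > 0), Gaussian noise
    for unvoiced frames (F0 == 0). Illustrative values, not the real system."""
    spf = int(sr * frame_period)              # samples per frame
    rng = np.random.default_rng(0)
    out, phase = [], 0.0
    for f0 in f0_contour:
        frame = np.zeros(spf)
        if f0 > 0:
            for n in range(spf):              # one impulse per pitch period
                phase += f0 / sr
                if phase >= 1.0:
                    phase -= 1.0
                    frame[n] = 1.0
        else:
            frame = rng.standard_normal(spf)  # unvoiced: white noise
        out.append(frame)
    return np.concatenate(out)
```

The resulting signal would then be passed through the MLSA filter driven by the generated mel-cepstral coefficients to produce the synthesized waveform.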

[Figures: decision-tree context clustering. Contexts such as a-b-a, a-a-b, b-b-a, b-a-a, b-a-b are split by yes/no questions (the F0 tree is shown); the tree size is selected with the MDL criterion, and the resulting clustered context-dependent HMMs form the average voice model.]
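The MDL criterion trades the log-likelihood gain of a split against a description-length penalty for the extra parameters it introduces. A toy 1-D Gaussian version of that trade-off (the penalty weighting and single-Gaussian assumption are simplifications for illustration, not the exact criterion used in the experiments):

```python
import numpy as np

def gaussian_loglik(x):
    """Log-likelihood of 1-D samples under their own ML Gaussian fit."""
    n = len(x)
    var = np.var(x) + 1e-8
    return -0.5 * n * (np.log(2 * np.pi * var) + 1.0)

def mdl_split_gain(x_yes, x_no):
    """Description-length change if a node is split by a yes/no question.
    Negative means the split shortens the description and is accepted."""
    x_all = np.concatenate([x_yes, x_no])
    gain = gaussian_loglik(x_yes) + gaussian_loglik(x_no) - gaussian_loglik(x_all)
    penalty = 0.5 * 2 * np.log(len(x_all))   # 2 extra params (mean, variance)
    return penalty - gain
```

A question that separates genuinely different contexts yields a negative value and the split is kept; tree growth stops once every remaining candidate question gives a positive value.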

[Experimental conditions: ATR Japanese speech database (B set), 16 kHz sampling, 5 ms frame shift, 25 ms frame length, left-to-right HMM topology.]

[Figure: adaptation data sets of 50-300 sentences per target speaker (FKN, FKS, FYM, MHO, MHT, MYI), drawn from overlapping sentence subsets A-I.]

[Table: F0 extraction error counts and rates under conditions (A) and (B), including 37/419 (8%), 505/1011 (50%), 14 (3%), 197 (19%), and cases with 0 (0%).]

[Figure: F0 contour; frequency (Hz, 100-300) vs. time (s, 0-4).]


[Figure: preference scores (%) vs. number of sentences per speaker (50-300); e.g. 15.9 vs. 84.1 at 50 sentences.]

Speaker Adaptive Training (the SAT algorithm): a speaker-normalization training algorithm for building an average voice model that is well suited to speaker adaptation.

[Figure: log F0 distributions of the vowel /a/ for the average voice and for Speakers 1 and 2; Speaker Adaptive Training [T. Anastasakos et al., 96] normalizes these inter-speaker differences during average-voice training.]
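The idea behind SAT can be caricatured in one dimension: alternately estimate a per-speaker normalization and re-estimate the canonical (average voice) model on the normalized data. This toy sketch uses a simple additive bias per speaker rather than the full affine transforms of the actual algorithm:

```python
import numpy as np

def sat_train(speaker_data, n_iter=5):
    """Toy speaker-adaptive training: model x ~= mu + b_i for speaker i,
    alternating between speaker-offset and canonical-mean estimation."""
    mu = np.mean(np.concatenate(speaker_data))            # init on pooled data
    for _ in range(n_iter):
        biases = [np.mean(x) - mu for x in speaker_data]  # speaker step
        normalized = [x - b for x, b in zip(speaker_data, biases)]
        mu = np.mean(np.concatenate(normalized))          # canonical step
    return mu, biases
```

The canonical mean ends up at the speaker-normalized center, so the variance it must explain no longer includes inter-speaker offsets, which is exactly what makes the resulting average voice model a better starting point for adaptation.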

[Figure: MLLR mean transformation [C.J. Leggetter et al., 96]. Average-voice mean vectors μ_1, μ_2 are mapped by a shared affine transform W to adapted means μ̂_1, μ̂_2 in Speaker A's acoustic space.]
[Figure: each training speaker i (Speaker 1, 2, 3) is related to the average voice model by its own transform W_i.]
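MLLR estimates one affine transform shared by many Gaussian means. When all covariances are identity, the ML estimate reduces to ordinary least squares over extended mean vectors ξ = [1, μᵀ]ᵀ, which this sketch uses (the real estimator weights each mean by state occupancy and covariance; function names are illustrative):

```python
import numpy as np

def estimate_mllr(mus, targets):
    """Least-squares estimate of a d x (d+1) transform W such that
    W @ [1, mu] ~= target for every (mu, target) pair."""
    Xi = np.hstack([np.ones((len(mus), 1)), np.asarray(mus)])  # N x (d+1)
    W, *_ = np.linalg.lstsq(Xi, np.asarray(targets), rcond=None)
    return W.T                                                 # d x (d+1)

def adapt_mean(W, mu):
    """Apply the shared transform to one average-voice mean vector."""
    return W @ np.concatenate([[1.0], np.asarray(mu)])
```

Because W is shared across many distributions, even a small amount of adaptation data moves every mean toward the target speaker, including means of contexts unseen in the adaptation data.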

[Figure: shared-decision-tree context clustering (STC). A speaker-independent (SI) tied context-dependent model is built with a decision tree shared across the speaker-dependent (SD) context-dependent models, yielding the average voice model.]

[Figure: experimental comparison. Average voice models trained with NONE, SAT, STC, or STC+SAT are adapted to target speakers MMY and FTK and compared with speaker-dependent (SD) models for MMY and FTK.]

[Figure: 5-point MOS for the adapted models, panels MMY and FTK (speaker assignment follows the panel order).
MMY: NONE 2.65, SAT 2.79, STC 3.01, STC+SAT 3.52, SD 3.84.
FTK: NONE 2.33, SAT 2.66, STC 2.95, STC+SAT 3.43, SD 4.02.]

HSMM-based speaker adaptation algorithm: simultaneous adaptation of spectrum, F0, and phone duration based on hidden semi-Markov models (HSMMs).

[Figure (recap): the standard 3-state left-to-right HMM with transition probabilities and output distributions b_1(o_t), b_2(o_t), b_3(o_t); state durations are modeled only implicitly through self-transitions.]

[Figure: hidden semi-Markov model [J.D. Ferguson 80, S.E. Levinson 86]: each state i has an explicit duration distribution p(d_i) in addition to its output distribution b_i(o_t).]
[Figure: an HSMM state sequence over time; each state occupies d consecutive frames before moving to the next state.]
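The difference from a plain HMM shows up when generating from the model: instead of geometric durations arising from self-transitions, each state's duration is drawn from an explicit distribution. A toy left-to-right HSMM generator (a shifted Poisson duration and Gaussian outputs are assumptions purely for illustration):

```python
import numpy as np

def sample_hsmm(means, dur_means, rng):
    """Generate one observation sequence from a toy left-to-right HSMM:
    state i emits from N(means[i], 0.1**2) for d ~ Poisson(dur_means[i]) + 1
    frames, then moves to the next state (no self-transitions)."""
    obs, states = [], []
    for i, (mu, dm) in enumerate(zip(means, dur_means)):
        d = rng.poisson(dm) + 1                  # explicit state duration
        obs.extend(rng.normal(mu, 0.1, d))
        states.extend([i] * d)
    return np.array(obs), states
```

Making the duration distribution an explicit model parameter is what allows it to be adapted to a target speaker jointly with the spectrum and F0 streams.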

[Figure: HSMM-based MLLR adaptation [J. Yamagishi et al. 04]: a transform W adapts the output distributions and a transform X adapts the duration distributions, moving the average voice model toward Speaker A in the acoustic space.]
[Figure: a threshold on the clustering tree determines which distributions of the target speaker's model share a transform from the average voice model.]

[Figure: adaptive training of HSMMs [J. Yamagishi et al. 05], applied to the explicit-duration model with p(d_i) and b_i(o_t).]
[Figure: in HSMM-based SAT, each training speaker i (Speaker 1, 2, 3) has an output transform W_i and a duration transform X_i relating the average voice model to that speaker.]


[Figure: average speaking rate (mora/sec, 7.0-9.0) vs. average log F0 (4.0-6.0) for speakers MHO, MYI, MHT, MSH, MMY, FKS, FYM, FKN, FTY, MTK, FTK.]

[Figure: average log-likelihood per frame (69-73) vs. number of adaptation sentences (50-450), adapting Both, Output only, Duration only, or None.]

[Figure: average mora/sec vs. average log F0; the MLLR-adapted models MTK(MLLR) and FTK(MLLR) move from the male and female average voices toward the target speakers MTK and FTK.]

RMSE of logf0 [cent] 400 350 300 250 Average Voice SD MLLR 200 0 50 100 150 200 250 300 350 400 450 Number of Sentences

8 SD Mel-cepstrum Distance [db] 7 6 5 Average Voice MLLR 4 0 50 100 150 200 250 300 350 400 450 Number of Sentences

11 RMSE of Vowel Duration [frame] 10 9 8 7 6 5 Average Voice SD MLLR 4 0 50 100 150 200 250 300 350 400 450 Number of Sentences

[Figure: 5-point MOS for Spectrum, F0, and Duration: adaptation scores 2.5/3.3, 2.6/3.6, 2.9 vs. average-voice scores around 1.5-1.6, with SD models as reference.]
[Figure: preference scores (%) for adapting Spectrum only, Spectrum+F0, and Spectrum+F0+Duration.]

Journal papers:
1. J. Yamagishi and T. Kobayashi, "Simultaneous Speaker Adaptation Algorithm of Spectrum, Fundamental Frequency and Duration for HMM-based Speech Synthesis," IEICE Trans. Information and Systems (in preparation).
2. J. Yamagishi, Y. Nakano, K. Ogata, J. Isogai, and T. Kobayashi, "A Unified Speech Synthesis Method Using HSMM-Based Speaker Adaptation and MAP Modification," IEICE Trans. Information and Systems (in preparation).
3. J. Yamagishi, K. Onishi, T. Masuko, and T. Kobayashi, "Acoustic Modeling of Speaking Styles and Emotional Expressions in HMM-based Speech Synthesis," IEICE Trans. Information and Systems, vol.E88-D, no.3, pp.503-509, March 2005.
4. J. Yamagishi, M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, "A Training Method of Average Voice Model for HMM-based Speech Synthesis," IEICE Trans. Fundamentals, vol.E86-A, no.8, pp.1956-1963, Aug. 2003.
5. J. Yamagishi, M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, "A Context Clustering Technique for Average Voice Models," IEICE Trans. Information and Systems, vol.E86-D, no.3, pp.534-542, March 2003.

International conference papers:
1. J. Yamagishi, K. Ogata, Y. Nakano, J. Isogai, and T. Kobayashi, "HSMM-based Model Adaptation Algorithms for Average-Voice-based Speech Synthesis," Proc. ICASSP 2006, May 2006 (submitted).
2. J. Yamagishi and T. Kobayashi, "Adaptive Training for Hidden Semi-Markov Model," Proc. ICASSP 2005, vol.I, pp.365-368, March 2005.
3. J. Yamagishi, T. Masuko, and T. Kobayashi, "MLLR Adaptation for Hidden Semi-Markov Model Based Speech Synthesis," Proc. ICSLP 2004, vol.II, pp.1213-1216, October 2004.
4. J. Yamagishi, M. Tachibana, T. Masuko, and T. Kobayashi, "Speaking Style Adaptation Using Context Clustering Decision Tree for HMM-based Speech Synthesis," Proc. ICASSP 2004, vol.I, pp.5-8, May 2004.
5. J. Yamagishi, T. Masuko, and T. Kobayashi, "HMM-based Expressive Speech Synthesis: Towards TTS with Arbitrary Speaking Styles and Emotions," Special Workshop in Maui (SWIM), January 2004.
6. J. Yamagishi, K. Onishi, T. Masuko, and T. Kobayashi, "Modeling of Various Speaking Styles and Emotions for HMM-based Speech Synthesis," Proc. EUROSPEECH 2003, vol.III, pp.2461-2464, September 2003.
7. J. Yamagishi, M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, "A Training Method for Average Voice Model Based on Shared Decision Tree Context Clustering and Speaker Adaptive Training," Proc. ICASSP 2003, vol.I, pp.716-719, April 2003.
8. J. Yamagishi, M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, "A Context Clustering Technique for Average Voice Model in HMM-based Speech Synthesis," Proc. ICSLP 2002, vol.1, pp.133-136, September 2002.