Rap-style Singing Voice Synthesis

Keijiro Saino,1 Keiichiro Oura,2 Makoto Tachibana,1 Hideki Kenmochi3 and Keiichi Tokuda2

This paper addresses rap-style singing voice synthesis. Because it has not been clear how to write a musical score for rap-style songs, existing singing voice synthesis systems based on musical scores are not suited to synthesizing them from an intuitive input. Here, a new type of musical score specialized for rap style is defined. An HMM-based singing voice synthesis system is used to synthesize realistic rap-style singing automatically. The glissando phenomenon, which is characteristic of the style, was found in the synthesis results. Applying pitch parameters generated from the HMMs to a sample-concatenation-based singing voice synthesis system was also tried.

1. [...] 1) [...] HMM [...] 2) [...] VOCALOID 3) [...]

1 Corporate Research and Development Center, Yamaha Corporation
2 Department of Computer Science and Engineering, Nagoya Institute of Technology
3 yamaha+ Division, Yamaha Corporation

© 2012 Information Processing Society of Japan
Fig. 1 An example of classification of substyles of rap.

2. [...] 4) [...] m.o.v.e 5) [...] motsu [...]

2.1 [...] (1) (2) (3) (4) [...] motsu [...]

[Figure labels: log F0; (1), (2); speech-like intonation with no awareness of melodic structure]
Fig. 2 Examples of log F0 series of rap-style singing voice by motsu.

2.2 [...]
[Fig. 3 annotations: pitch levels C#(+5), B(+3), G#(root), F#(-2), D#(-5); descending direction; second and subsequent notes; a glissando note is written in the same units as an ordinary note and may have any length]
Fig. 3 The defined musical notation rules.

[...] motsu [...] 2.1 [...] 16 [...] 8 [...] 5 [...]: -5, -2, 0, +3, +5 [...] VOCALOID 3) [...] HMM 2) [...]

3. [...] HMM [...] 2.2 [...] 6) [...]
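The pitch labels in Fig. 3 are consistent with reading the five levels -5, -2, 0, +3 and +5 as semitone offsets from the root: with G# as the root they give D#, F#, G#, B and C#. The following is a minimal sketch of that mapping, not the authors' implementation. The use of MIDI note numbers, the C4 = 60 convention, equal temperament with A4 = 440 Hz, and all function names are assumptions made for illustration.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Relative pitch levels as listed in the text and in Fig. 3.
RAP_OFFSETS = (5, 3, 0, -2, -5)


def note_name(midi_note):
    """Pitch-class name of a MIDI note number (octave omitted)."""
    return NOTE_NAMES[midi_note % 12]


def rap_pitches(root_midi):
    """Map each relative level to an absolute MIDI note number."""
    return {offset: root_midi + offset for offset in RAP_OFFSETS}


def midi_to_log_f0(midi_note, a4_hz=440.0):
    """Natural-log F0 of a MIDI note under equal temperament (assumed)."""
    return math.log(a4_hz * 2.0 ** ((midi_note - 69) / 12.0))


if __name__ == "__main__":
    root = 56  # G#3 if C4 = 60 (the numbering convention is an assumption)
    for offset, midi in rap_pitches(root).items():
        print(f"{offset:+d}: {note_name(midi)}  log F0 = {midi_to_log_f0(midi):.3f}")
```

Because only the root carries an absolute pitch, transposing a whole rap score amounts to changing `root_midi`; the relative levels stay untouched.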
Table 1 Singing voice data used for model training.
[...] motsu; [...] 11 [...] 21 [...] 6; BPM 92-130

[...] motsu [...] 48 kHz/16 bit [...] 49 [...] STRAIGHT [...] 5 ms [...] SWIPE 7) [...] 5 ms [...] HMM [...] MLSA [...]

[Fig. 4 annotation: Root = C#3]
Fig. 4 A part of input rap score and contour of generated log F0 (BPM 128).

4. [...] 2.2 [...]

4.1 [...] HMM [...] motsu [...] 2.2 [...] 13 [...] (1) (4) [...] (1), (2) [...] 11 [...]

4.2 [...] HMM [...] hidden semi-Markov models (HSMM) 8) [...] left-to-right [...] 5 [...]

4.3 [...]
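The subjective evaluation in this section is reported as a degradation mean opinion score (DMOS) with confidence intervals. As a rough sketch of how such a figure is usually computed, the code below averages the ratings for each condition and attaches a normal-approximation interval. The 95% level (z = 1.96), the function name, and the example ratings are assumptions for illustration only; the ratings are made up and are not data from the paper.

```python
import math
import statistics


def dmos_with_ci(scores, z=1.96):
    """Return (mean, half_width) for one condition.

    half_width is z * s / sqrt(n), a normal-approximation confidence
    interval; z = 1.96 corresponds to a 95% level (assumed here).
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    if n < 2:
        return mean, 0.0  # no spread can be estimated from one rating
    half_width = z * statistics.stdev(scores) / math.sqrt(n)
    return mean, half_width


if __name__ == "__main__":
    # Hypothetical five-point ratings, NOT results from the paper.
    ratings = {
        "A": [5, 4, 4, 5, 3, 4],
        "D": [2, 3, 2, 1, 2, 3],
    }
    for condition, scores in ratings.items():
        mean, half = dmos_with_ci(scores)
        print(f"{condition}: DMOS = {mean:.2f} +/- {half:.2f}")
```

With few listeners, a Student t quantile in place of the fixed z gives a wider and more honest interval.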
Table 2 Subjective evaluation methods.

[Figure labels: Root = C#3; A, B, C, D; confidence interval; DMOS]
Fig. 5 Subjective evaluation results.

Fig. 6 Generated log F0 contour on each experimental condition (BPM 100).

[...] degradation mean opinion score (DMOS) [...] 10 [...] 5 [...] A [...] D [...]

4.4 [...] /a/ [...] /o/ [...] /a/ [...]

5. [...] HMM [...] VOCALOID 3) [...]
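This section applies pitch parameters generated from the HMMs to VOCALOID. The conversion itself cannot be recovered from the text, so the following is only a generic sketch of one way a generated log F0 contour could be turned into note-relative pitch control values: express each frame as a deviation in semitones from the note's nominal pitch, clip it to a bend range, and quantize it to an integer. The bend range of 2 semitones, the full-scale value of 8191, the A4 = 440 Hz reference, and all function names are assumptions; this is not VOCALOID's documented parameter format and not the authors' method.

```python
import math


def semitone_deviation(log_f0, note_midi, a4_hz=440.0):
    """Deviation of a natural-log F0 value from a note's nominal pitch."""
    note_hz = a4_hz * 2.0 ** ((note_midi - 69) / 12.0)
    return 12.0 * (log_f0 - math.log(note_hz)) / math.log(2.0)


def to_bend_value(dev_semitones, bend_range=2.0, full_scale=8191):
    """Clip to +/- bend_range semitones and scale to an integer."""
    clipped = max(-bend_range, min(bend_range, dev_semitones))
    return round(clipped / bend_range * full_scale)


def convert_contour(log_f0_frames, note_midi, bend_range=2.0, full_scale=8191):
    """Convert one note's frames; None marks an unvoiced frame."""
    return [
        None if lf is None
        else to_bend_value(semitone_deviation(lf, note_midi), bend_range, full_scale)
        for lf in log_f0_frames
    ]


if __name__ == "__main__":
    # Hypothetical frames around A4 (MIDI 69), NOT data from the paper.
    frames = [math.log(440.0), math.log(452.0), None, math.log(430.0)]
    print(convert_contour(frames, 69))
```

A glissando that spans more than the chosen bend range would be clipped by this sketch, so a real conversion would need a wider range or a change of note.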
Fig. 7 An example of VOCALOID pitch parameters converted from the parameters generated from the HMMs.

[...] HMM [...] VOCALOID [...] VocaListener 9) [...]

6. [...] HMM [...] VOCALOID [...]

7. [...] motsu [...]

References
1) H. Kenmochi, VOCALOID and Hatsune Miku phenomenon in Japan, Proc. InterSinging 2010, pp. 1-4, 2010.
2) K. Oura, A. Mase, T. Yamada, S. Muto, Y. Nankaku, and K. Tokuda, Recent Development of the HMM-based Singing Voice Synthesis System - Sinsy, Proc. SSW7, pp. 211-216, 2010.
3) H. Kenmochi and H. Ohshita, VOCALOID - Commercial Singing Synthesizer Based on Sample Concatenation, Proc. INTERSPEECH 2007, pp. 4011-4010, 2007.
4) [...] [DVD BOOK], (2005).
5) M.O.V.E Official Website, http://electropica.com/index.html.
6) [...], [...] HMM [...], vol. I, 1-8-20, pp. 283-284, 2010.
7) A. Camacho, SWIPE: A Sawtooth Waveform Inspired Pitch Estimator for Speech and Music, Ph.D. Thesis, University of Florida, 2007.
8) H. Zen, T. Masuko, K. Tokuda, T. Kobayashi, and T. Kitamura, A Hidden Semi-Markov Model-Based Speech Synthesis System, IEICE Trans., vol. 90-D, no. 5, pp. 825-834, 2007.
9) T. Nakano and M. Goto, VocaListener: A Singing-to-Singing Synthesis System Based on Iterative Parameter Estimation, Proc. SMC 2009, pp. 343-348, 2009.