
VocaListener2: A Singing Synthesis System Mimicking Voice Timbre Changes in Addition to Pitch and Dynamics of User's Singing

Tomoyasu Nakano(1) and Masataka Goto(1)
(1) National Institute of Advanced Industrial Science and Technology (AIST)

In this paper, we propose a singing synthesis system, VocaListener2, that automatically synthesizes a singing voice by mimicking the timbre changes of a user's singing voice. The system extends our previous system, VocaListener, which can estimate singing synthesis parameters only for pitch (F0) and dynamics (power) from the user's singing voice. Most previous techniques for manipulating voice timbre have focused on voice conversion and voice morphing, but they cannot deal with timbre changes during singing. To develop VocaListener2, we first use VocaListener to produce various singing voices that mimic the pitch and dynamics of the user's singing voice. We then construct a voice timbre space from these voices. In this space, the user's timbre changes can be reflected in the synthesized singing voice. In our experiments with commercially available singing synthesis systems, we found that timbre changes, as well as pitch and dynamics, can be mimicked.

1. Introduction

Singing synthesis has become widely popular since 2007 [1], and many works are shared on the Web [2],[3]. We previously developed VocaListener [4],[5], which estimates singing synthesis parameters for pitch and dynamics from a user's singing. This paper proposes VocaListener2, which extends VocaListener so that it also mimics timbre changes. Hereafter, the previous system is called VocaListener1. Both systems control the Vocaloid singing synthesis software [6].

© 2010 Information Processing Society of Japan

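Most previous techniques for manipulating voice timbre have focused on voice conversion and voice morphing [7]–[10]. Related studies address emotional and expressive speech [11]–[18]; see also [19],[20]. In this paper, Vocaloid [6] is used as the singing synthesis software.

The rest of this paper is organized as follows. Section 2 describes VocaListener1, and Section 3 describes VocaListener2. Section 4 reports experiments, Section 5 discusses the results, and Section 6 concludes the paper.

2. VocaListener1

2.1 VocaListener1 [4],[5]
VocaListener1 estimates singing synthesis parameters for pitch and dynamics from a user's singing. Related work on performance-driven control of singing synthesis includes [21]. Details of VocaListener1 are given in [5], and demonstrations are available online (http://staff.aist.go.jp/t.nakano/vocalistener/index-j.html, http://www.nicovideo.jp/mylist/7012071/).

2.2 Synthesis parameters
Vocaloid1 provides the parameters Note Velocity, Resonance, Harmonics, Noise, Brightness, Clearness, and Gender Factor. Vocaloid2 provides VEL, BRE, BRI, CLE, OPE, and GEN.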
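The title of [5] describes VocaListener1 as being based on iterative parameter estimation. The toy sketch below illustrates the general synthesize-measure-update idea with one parameter per frame. It is not VocaListener1's algorithm. `toy_synth` is a hypothetical stand-in for "synthesize, then analyze" (not Vocaloid). The proportional update `p += step * (target - output)` is an assumed rule. It converges here only because 0 < step × slope < 2 for the toy mapping.

```python
def iterative_estimate(target, synth, step=1.0, max_iters=100, tol=1e-6):
    """Toy synthesize-measure-update loop, one parameter per frame.

    target: per-frame values to mimic (e.g., a user's F0 contour)
    synth:  hypothetical function mapping a parameter value to the value
            measured from the synthesized frame
    """
    params = [0.0] * len(target)
    for _ in range(max_iters):
        # Synthesize with the current parameters and measure the result
        output = [synth(p) for p in params]
        errors = [t - o for t, o in zip(target, output)]
        # Stop once every frame is within tolerance of its target
        if max(abs(e) for e in errors) < tol:
            break
        # Nudge each parameter in the direction that reduces its error
        params = [p + step * e for p, e in zip(params, errors)]
    return params


def toy_synth(p):
    # Hypothetical increasing parameter-to-output relation (slope 0.5)
    return 0.5 * p + 10.0


if __name__ == "__main__":
    print(iterative_estimate([40.0, 20.0, 35.0], toy_synth))
```

With the toy mapping, each iteration halves the remaining error. The loop therefore settles near the parameters 60, 20, and 50, which reproduce the targets 40, 20, and 35.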

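Voice libraries with different timbres are also available. For example, for Vocaloid2 [6], the Hatsune Miku Append (MIKU Append) library provides six voices with different timbres: DARK, LIGHT, SOFT, SOLID, SWEET, and VIVID (http://www.crypton.co.jp/cv01a/, http://www.crypton.co.jp/mp/pages/prod/vocaloid/cv01.jsp).

2.3 Problems
VocaListener1 estimates only pitch and dynamics parameters. As a result, timbre changes in the user's singing are not reflected in the synthesized voice. We address two problems, (1) and (2), in this respect.

3. VocaListener2

3.1 Approach
To address problems (1) and (2) of Section 2.3, VocaListener2 first uses VocaListener1 to synthesize singing voices with different timbres that mimic the pitch and dynamics of the user's singing. It then constructs an M-dimensional timbre space from these voices. At each time t, the J voices are located at z_j(t) (j = 1, 2, ..., J), and the user's singing voice is located at u(t). In this paper, M = 3 and J = 7.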

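VocaListener2 reflects the user's timbre changes, represented by the trajectory u(t), in the synthesized singing voice.

3.2 Overview
Figure 3 shows an overview of VocaListener2. The system consists of steps A to E, which are applied to the singing voices Z (Z1 to Z4 in the figure). In step A, VocaListener1 makes each voice mimic the F0 and dynamics of the user's singing. Step B concerns F0 (cf. Section 2.3). Step C constructs the M-dimensional timbre space from Z1 to Z4. Steps D and E are described in Sections 3.5 and 3.6.

Fig. 3 Overview of VocaListener2 (blocks A to E, VocaListener1, X, Y, Z1 to Z4, STRAIGHT).

3.3 Step B
Step B analyzes the voices with STRAIGHT [22], which extracts F0 and spectral envelopes; see also [7].

3.4 Step C
Step C takes the voices analyzed with STRAIGHT and maps them into the M-dimensional timbre space through an intermediate N-dimensional representation [23],[24] (Fig. 4).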

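3.5 Step D

3.6 Step E
Step E uses the six voices DARK, LIGHT, SOFT, SOLID, SWEET, and VIVID (Fig. 5). Timbres between these voices are interpolated (cf. [7],[19]) by Radial Basis Function (RBF) Variational Interpolation [25]. For each time t and frequency f, the spectral envelopes Z_j(f,t) (j = 1, 2, ..., J) are expressed relative to Z_1(f,t) as log ratios Zr_j(f,t). These log ratios are given at the voice positions z_j(t) and interpolated at the user's position u(t).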

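$$Zr_j(f,t) = \log\left(\frac{Z_j(f,t)}{Z_1(f,t)}\right) \tag{1}$$

The log ratio at the user's position u(t) is obtained from the interpolating function

$$g(u(t); f,t) = \sum_{k=1}^{J} w_k(f,t)\,\phi\bigl(\lVert u(t) - z_k(t)\rVert\bigr) + P(u(t); f,t) \tag{2}$$

Its weights and polynomial coefficients are chosen so that g reproduces the known values at the voice positions:

$$Zr_j(f,t) = \sum_{k=1}^{J} w_k(f,t)\,\phi\bigl(\lVert z_j(t) - z_k(t)\rVert\bigr) + P(z_j(t); f,t) \tag{3}$$

$$g(z_j(t); f,t) = Zr_j(f,t) \tag{4}$$

$$P(x; f,t) = p_0 + \sum_{m=1}^{M} p_m x^{(m)} \tag{5}$$

Here Zr_j(f,t) is given by Eq. (1), and w_j are the RBF weights. P(·) is the first-order polynomial of Eq. (5). Its argument x is z_j(t) or u(t), x^(m) denotes the m-th of its M coordinates, and p_m (m = 0, ..., M) are its coefficients. ϕ(·) is a radial basis function of the distance between two points in the timbre space; forms such as ϕ(r) = r^2 log r and ϕ(r) = r^3 are used for this purpose. Eq. (4) is combined with the side conditions Σ_k w_k = 0 and Σ_k w_k z_k^(m) = 0 (m = 1, ..., M). For M = 3, this gives the linear system

$$
\begin{pmatrix}
\phi_{11} & \phi_{12} & \cdots & \phi_{1J} & 1 & z_1^{(1)} & z_1^{(2)} & z_1^{(3)} \\
\phi_{21} & \phi_{22} & \cdots & \phi_{2J} & 1 & z_2^{(1)} & z_2^{(2)} & z_2^{(3)} \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots & \vdots \\
\phi_{J1} & \phi_{J2} & \cdots & \phi_{JJ} & 1 & z_J^{(1)} & z_J^{(2)} & z_J^{(3)} \\
1 & 1 & \cdots & 1 & 0 & 0 & 0 & 0 \\
z_1^{(1)} & z_2^{(1)} & \cdots & z_J^{(1)} & 0 & 0 & 0 & 0 \\
z_1^{(2)} & z_2^{(2)} & \cdots & z_J^{(2)} & 0 & 0 & 0 & 0 \\
z_1^{(3)} & z_2^{(3)} & \cdots & z_J^{(3)} & 0 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix} w_1 \\ w_2 \\ \vdots \\ w_J \\ p_0 \\ p_1 \\ p_2 \\ p_3 \end{pmatrix}
=
\begin{pmatrix} Zr_1 \\ Zr_2 \\ \vdots \\ Zr_J \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\tag{6}
$$

where ϕ_ij = ϕ(‖z_i(t) − z_j(t)‖), and the arguments (f,t) and (t) are omitted. The resulting spectral envelope is synthesized with STRAIGHT.

4. Experiments

4.1 Conditions
The input singing was taken from the RWC Music Database (RWC-MDB-G-2001 No. 91) [26]. It was sampled at 44.1 kHz and analyzed with a time resolution of 1 ms. Seventeen voices for Vocaloid and Vocaloid2 [6] were used, including KAITO and MEIKO (Vocaloid1) and SF-A2 miki (Vocaloid2). All voices were analyzed with STRAIGHT.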
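For reference, the interpolation of Section 3.6 (Eqs. (1) to (6)) can be sketched for a single (f, t) bin as follows. This minimal Python sketch rests on several assumptions. The kernel is ϕ(r) = r^3, one of the forms mentioned with Eq. (2). The system is solved with plain Gaussian elimination rather than any particular library. `envelope_at` assumes that the interpolated log ratio is turned back into an envelope by inverting Eq. (1). All function names are hypothetical.

```python
import math


def phi(r):
    # Assumed radial basis function phi(r) = r^3
    return r ** 3


def log_ratio(z_j, z_1):
    # Eq. (1) for one (f, t) bin: Zr_j = log(Z_j / Z_1)
    return math.log(z_j / z_1)


def solve_linear(a, b):
    # Gaussian elimination with partial pivoting on an augmented copy
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_rbf(centers, values):
    # Build and solve the (J + M + 1)-square system of Eq. (6)
    # centers: J positions z_j (M coordinates each); values: Zr_j
    J, M = len(centers), len(centers[0])
    n = J + M + 1
    a = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for i in range(J):
        for k in range(J):
            a[i][k] = phi(math.dist(centers[i], centers[k]))  # phi_ik block
        a[i][J] = a[J][i] = 1.0                                 # p_0 column / sum(w) = 0 row
        for d in range(M):
            a[i][J + 1 + d] = a[J + 1 + d][i] = centers[i][d]   # p_1..p_M / side conditions
        b[i] = values[i]
    sol = solve_linear(a, b)
    return sol[:J], sol[J:]  # weights w_1..w_J, coefficients p_0..p_M


def evaluate(u, centers, w, p):
    # Eq. (2): g(u) = sum_k w_k phi(|u - z_k|) + p_0 + sum_d p_d u^(d)
    rbf = sum(wk * phi(math.dist(u, zk)) for wk, zk in zip(w, centers))
    return rbf + p[0] + sum(pd * ud for pd, ud in zip(p[1:], u))


def envelope_at(u, centers, w, p, z_1):
    # Assumed inverse of Eq. (1): Z = Z_1 * exp(g(u))
    return z_1 * math.exp(evaluate(u, centers, w, p))
```

Because the polynomial term P is included, values that vary linearly over the timbre space (e.g., 1 + 2x − y + 0.5z) are reproduced exactly, with all weights w_k equal to zero. At the voice positions themselves, `evaluate` returns Zr_j, as Eq. (4) requires.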

Table 1  N for each R [%] (17 voices)
R [%]    50   55   60   65   70   75   80   85   90   95
N (1)   129  162  197  240  289  348  418  504  616  770
N (2)   101  127  151  184  220  264  314  377  459  573

The FFT length was 4096, R was set to 80%, and the timbre space had M = 3 dimensions.

4.2 Experiment A
Experiment A used a 55-sec excerpt of RWC-MDB-G-2001 No. 91. Table 1 lists N for each R% for the 17 voices. Figures 6 and 7 show the results.
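The analysis settings listed above (44.1 kHz sampling and an FFT length of 4096) determine how many frequency bins f enter Eqs. (1) to (6). The following is a small arithmetic sketch. `frame_shift_ms` is a hypothetical parameter, and reading the "1 ms" setting as the frame shift is an assumption.

```python
def fft_bin_info(sample_rate_hz, fft_length):
    # A real-valued frame of fft_length samples has fft_length // 2 + 1
    # non-negative frequency bins, spaced sample_rate / fft_length apart
    n_bins = fft_length // 2 + 1
    spacing_hz = sample_rate_hz / fft_length
    return n_bins, spacing_hz


def rbf_systems_per_second(n_bins, frame_shift_ms):
    # One (f, t) pair per bin per frame; frame_shift_ms is an assumed
    # analysis interval (hypothetical parameter)
    frames_per_second = 1000.0 / frame_shift_ms
    return n_bins * frames_per_second


if __name__ == "__main__":
    bins, spacing = fft_bin_info(44100, 4096)
    print(bins, spacing)                        # 2049 10.7666015625
    print(rbf_systems_per_second(bins, 1.0))    # 2049000.0
```

Note that the matrix in Eq. (6) depends only on the positions z_j(t), not on f. A practical implementation could therefore factor it once per frame and reuse it as about 2049 right-hand sides, instead of solving millions of independent systems.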

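Figure 7 shows results for the voices VIVID, SOLID, LIGHT, DARK, SOFT, and SWEET, together with the case R = 95% (cf. Section 4.1).

4.3 Experiment B
Experiment B also used a 55-sec excerpt. It compared singing voices synthesized by VocaListener1 under Closed and Open conditions, with GEN set to 90. Figures 8 and 9 show the results for LIGHT, DARK, SOLID, SOFT, VIVID, and SWEET.

5. Discussion

5.1 Regarding (1)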

The discussion of (1) refers to Fig. 6 and the VIVID voice.

5.2 Regarding (2)
The discussion of (2) concerns the GEN parameter.

6. Conclusion
We proposed VocaListener2, which extends VocaListener [4] to mimic the timbre changes of a user's singing in addition to pitch and dynamics.

Acknowledgments
This work was supported in part by CrestMuse. We used the RWC Music Database (RWC-MDB-G-2001).

References
[1] Cabinet Office, Government of Japan: Virtual Idol, Highlighting JAPAN through images, Vol. 2, No. 11, pp. 24–25 (2009). http://www.gov-online.go.jp/pdf/hlj_img/vol_0020et/24-25.pdf
[2] Vol. 25, No. 1, pp. 157–167 (2010).
[3] 2009, pp. 118–124 (2009).
[4] VocaListener, 2008-MUS-75-9, Vol. 2008, No. 12, pp. 51–58 (2008).
[5] Nakano, T. and Goto, M.: VocaListener: A singing-to-singing synthesis system based on iterative parameter estimation, Proc. SMC 2009, pp. 343–348 (2009).
[6] VOCALOID, 2008-MUS-74-9, Vol. 2008, No. 12, pp. 51–58 (2008).
[7] Vol. 48, No. 12, pp. 3637–3648 (2007).
[8] emorish, http://www.crestmuse.jp/cmstraight/personal/emorish/
[9] Toda, T., Black, A. and Tokuda, K.: Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory, IEEE Trans. on Audio, Speech and Language Processing, Vol. 15, No. 8, pp. 2222–2235 (2007).
[10] STRAIGHT, Vol. J91-D, No. 4, pp. 1082–1091 (2008).
[11] Schröder, M.: Emotional speech synthesis: A review, Proc. Eurospeech 2001, pp. 561–564 (2001).
[12] Iida, A., Campbell, N., Higuchi, F. and Yasumura, M.: A corpus-based speech synthesis system with emotion, Speech Communication, Vol. 40, Iss. 1–2, pp. 161–187 (2003).
[13] Tsuzuki, R., Zen, H., Tokuda, K., Kitamura, T., Bulut, M. and Narayanan, S. S.: Constructing emotional speech synthesizers with limited speech database, Proc. ICSLP 2004, pp. 1185–1188 (2004).
[14] F0, Vol. J89-D, No. 8, pp. 1811–1819 (2006).
[15] Vol. 50, No. 3, pp. 1181–1191 (2009).
[16] Türk, O. and Schröder, M.: A comparison of voice conversion methods for transforming voice quality in emotional speech synthesis, Proc. Interspeech 2008, pp. 2282–2285 (2008).
[17] Nose, T., Tachibana, M. and Kobayashi, T.: HMM-based style control for expressive speech synthesis with arbitrary speaker's voice using model adaptation, IEICE Trans. on Information and Systems, Vol. E92-D, No. 3, pp. 489–497 (2009).
[18] Inanoglu, Z. and Young, S.: Data-driven emotion conversion in spoken English, Speech Communication, Vol. 51, Iss. 3, pp. 268–283 (2009).
[19] 1-4-9, pp. 229–230 (2006).
[20] Vol. 51, No. 2, pp. 250–264 (2010).
[21] Janer, J., Bonada, J. and Blaauw, M.: Performance-driven control for sample-based singing voice synthesis, Proc. of the 9th Int. Conference on Digital Audio Effects (DAFx-06), pp. 41–44 (2006).
[22] Kawahara, H., Masuda-Katsuse, I. and de Cheveigné, A.: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, Speech Communication, Vol. 27, pp. 187–207 (1999).
[23] Vol. J85-D2, No. 4, pp. 554–562 (2002).
[24] SP, Vol. 101, No. 86, pp. 1–6 (2001).
[25] Turk, G. and O'Brien, J. F.: Modelling with implicit surfaces that interpolate, ACM Transactions on Graphics, Vol. 21, No. 4, pp. 855–873 (2002).
[26] RWC, Vol. 45, No. 3, pp. 728–738 (2004).