IPSJ Journal, Vol. 52, No. 12, pp. 3853–3867 (Dec. 2011)   (c) 2011 Information Processing Society of Japan

VocaListener: A Singing Synthesis System by Mimicking Pitch and Dynamics of User's Singing

Tomoyasu Nakano *1 and Masataka Goto *1

This paper presents a singing synthesis system, VocaListener, that interactively synthesizes a singing voice by mimicking pitch and dynamics of a user's singing voice. Although there is a method to estimate singing synthesis parameters of pitch (F0) and dynamics (power) from a singing voice, it does not adapt to different singing synthesis conditions (e.g., different singing synthesis systems and their singer databases) or singing skill/style modifications. To deal with different conditions, VocaListener repeatedly updates singing synthesis parameters so that the synthesized singing can mimic the user's singing more closely. Moreover, VocaListener has functions to help modify the user's singing by correcting off-pitch phrases or changing vibrato. In an experimental evaluation under two different singing synthesis conditions, mean error values after the iteration were much smaller than the previous approach.

1. Introduction

2007 1) Web 2),3) 4)–7) 8)–10) HMM 11) text-to-speech (TTS) text-to-singing (lyrics-to-singing) 12),13) 12),14) speech-to-singing

*1 National Institute of Advanced Industrial Science and Technology (AIST)

12) 15) VocaListener singing-to-singing Janer 16) VocaListener YAMAHA Vocaloid 10) 2 3 VocaListener 4 5 6

2.

YAMAHA Vocaloid 10) lyrics-to-singing 1

Fig. 1  Even if the same parameters are specified, the synthesized results always differ when we change the synthesis conditions.

VOCALOID 10) 1 17) 2 1 16) 2 2 VocaListener

3. VocaListener

VocaListener-core VocaListener-plus VocaListener-front-end 3 VocaListener 2 16)

Fig. 2  Problems of a previous approach 16).

VocaListener 2 VocaListener Janer Viterbi 16) 100% Viterbi 1 Viterbi 1 Vocaloid 10) A B VocaListener-front-end Viterbi C D E VocaListener-plus F VocaListener-core G Viterbi /tachidomaru/ H I J VocaListener-front-end K L M N O P VocaListener-front-end VocaListener-plus VocaListener-core 1

Fig. 3  System architecture of VocaListener.

3.1 VocaListener-front-end

VocaListener-front-end analyzes the input singing (sampled at 44.1 kHz) with a 10-msec frame shift.

3.1.1 F0 and power estimation

F0 [Hz] is estimated with SWIPE 18), which has a low gross error rate, and converted into a MIDI note number f:

    f = 12 \log_2 ( F_0 / 440 ) + 69    (1)

The power p(t) is computed from the waveform x(t) with a window function h(t):

    p(t) = \sqrt{ \sum_{\tau = -N/2}^{N/2 - 1} ( x(t + \tau)\, h(\tau) )^2 }    (2)

where N = 2,048 samples (about 46 ms).

Table 1  List of symbols.
  F0 [Hz]             fundamental frequency of the singing
  f                   MIDI note number converted from F0 (Eq. (1))
  f_d                 deviation from the semitone grid used for off-pitch correction (Eqs. (3), (4))
  f_t                 transposition amount in semitones (Eq. (5))
  f(t), f'(t)         F0 contour of the target singing and its smoothed version
  f_n                 note number of note n (Eq. (8))
  f^(i)(t)            F0 of the singing synthesized at iteration i
  Δf_p^(i)(t)         PIT parameter at iteration i
  Δf_s^(i)(t)         PBS parameter at iteration i
  Δf^(i)(t)           relative pitch parameter at iteration i (Eq. (10))
  p(t), p'(t)         power of the target singing and its smoothed version
  p^(i)(t)            power of the singing synthesized at iteration i
  p_m(t)              power of the singing synthesized with DYN = 64
  \hat{p}^(i)(t)      power target used to determine DYN at iteration i
  ε^2                 squared power error minimized to obtain α (Eq. (11))
  ε_f^(i), ε_p^(i)    mean F0 and power errors at iteration i (Eqs. (14), (15))

For lyrics alignment, the lyrics are converted into a phoneme sequence with MeCab 19) and aligned to the singing by Viterbi alignment, allowing optional short pauses.
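As a worked illustration of Eqs. (1) and (2), the sketch below (Python with NumPy; the original system is implemented in C++, so this is not the authors' code) converts an F0 value in Hz into a real-valued MIDI note number and computes the windowed frame power. The Hanning window is an assumption; the paper only specifies a window function h(t) with N = 2,048 samples (about 46 ms).

```python
import numpy as np

def hz_to_midi(f0_hz):
    """Eq. (1): convert F0 [Hz] into a real-valued MIDI note number."""
    return 12.0 * np.log2(f0_hz / 440.0) + 69.0

def frame_power(x, t, n=2048):
    """Eq. (2): windowed power of waveform x around sample index t
    (assumes n/2 <= t <= len(x) - n/2); the Hanning window is an assumption."""
    h = np.hanning(n)
    seg = x[t - n // 2 : t + n // 2]
    return np.sqrt(np.sum((seg * h) ** 2))

# A4 (440 Hz) maps to MIDI note number 69.
print(hz_to_midi(440.0))  # -> 69.0
```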

The phoneme alignment uses monophone HMMs 20) adapted to the user's singing by MLLR-MAP 21), a combination of MLLR (Maximum Likelihood Linear Regression) and MAP (Maximum A Posteriori) adaptation; the audio is downsampled to 16 kHz, and the Viterbi alignment is carried out with the HTK Speech Recognition Toolkit 22).

3.1.2 Singing synthesis conditions

Vocaloid2 10) is used with the singer databases CV01 and CV02 *1; synthesis is driven through the VSTi (Vocaloid Playback VST Instrument) *2.

3.2 VocaListener-plus

3.2.1 Off-pitch correction and transposition 23)

The F0 contour f(t) is first smoothed with a 5-Hz FIR low-pass filter *3, since singing F0 contains characteristic fluctuations 24),25) such as vibrato at roughly 5–8 Hz 26),27). The deviation f_d (0 ≤ f_d < 1) of the smoothed contour from the equal-tempered semitone grid is estimated as

    f_d = \arg\max_g \sum_t \sum_{i=0}^{127} \exp\left( - \frac{ ( f(t) - g - i )^2 }{ 2 \sigma^2 } \right)    (3)

with σ = 0.17. The contour is then shifted toward the nearest semitone grid,

    f(t) ← f(t) - f_d          (0 ≤ f_d < 0.5)
    f(t) ← f(t) + (1 - f_d)    (0.5 ≤ f_d < 1)    (4)

and can additionally be transposed by f_t semitones:

    f(t) ← f(t) + f_t    (5)

*1 http://www.vocaloid.com/product.ja.html
*2 10 msec / VSTi 1 msec
*3 FIR 1.8
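A minimal sketch of the off-pitch correction in Eqs. (3)–(5) follows (Python/NumPy). The fractional offset f_d is found by a brute-force grid search over g in [0, 1); the 0.01 search resolution is an assumption, while σ = 0.17 is the value given above. The input f is the (smoothed) F0 contour in MIDI note numbers over voiced frames.

```python
import numpy as np

SIGMA = 0.17  # value given in the paper

def estimate_offpitch(f, grid_step=0.01):
    """Eq. (3): fractional offset f_d in [0, 1) of the contour f(t) from the
    equal-tempered semitone grid (note numbers i = 0..127)."""
    candidates = np.arange(0.0, 1.0, grid_step)  # search resolution is an assumption
    notes = np.arange(128)
    scores = []
    for g in candidates:
        d = f[:, None] - g - notes[None, :]      # (T, 128) deviations
        scores.append(np.sum(np.exp(-d ** 2 / (2.0 * SIGMA ** 2))))
    return candidates[int(np.argmax(scores))]

def correct_offpitch(f, f_d, f_t=0.0):
    """Eqs. (4) and (5): shift toward the nearest grid, then transpose by f_t semitones."""
    f = f - f_d if f_d < 0.5 else f + (1.0 - f_d)
    return f + f_t
```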

3.2.2 Vibrato and smoothness adjustment

The smoothed contours f'(t) and p'(t) are obtained by low-pass filtering at 3 Hz *4, which removes vibrato components (about 5–8 Hz 26),27)). The adjusted contours are

    f(t) = r_{v|s} f(t) + (1 - r_{v|s}) f'(t)    (6)
    p(t) = r_{v|s} p(t) + (1 - r_{v|s}) p'(t)    (7)

where r_v is applied in vibrato sections 23) and r_s elsewhere. With r_v = r_s = 1 the original singing is reproduced; r_v > 1 emphasizes the vibrato, and r_s < 1 smooths the F0 fluctuations 28).

3.2.3

Fig. 4  Examples of F0(t) adjusted by VocaListener-plus.

3.3 VocaListener-core

VocaListener-core takes over the contours adjusted by VocaListener-plus (Fig. 3).
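Because Eqs. (6) and (7) have the same form, a single helper covers both contours. In the sketch below (Python/SciPy), the smoothed contour f'(t) or p'(t) is obtained with a zero-phase FIR low-pass filter at the 3-Hz cutoff mentioned above; the tap count and the use of filtfilt are assumptions, since the exact filter design is not preserved here, and contours of at least a few seconds are assumed.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

FRAME_RATE = 100.0  # 10-msec frame shift -> 100 frames per second

def smooth_contour(x, cutoff_hz=3.0, numtaps=101):
    """Low-pass the contour below the vibrato range (about 5-8 Hz) to obtain
    f'(t) or p'(t); FIR filtering follows the paper, the tap count is an assumption."""
    taps = firwin(numtaps, cutoff_hz, fs=FRAME_RATE)
    return filtfilt(taps, [1.0], x)  # zero-phase filtering; needs len(x) > ~300 frames

def adjust_fluctuation(x, r):
    """Eqs. (6)/(7): r = 1 keeps the original contour, r > 1 emphasizes vibrato,
    r < 1 smooths the fluctuation."""
    return r * x + (1.0 - r) * smooth_contour(x)
```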

Table 2  Singing synthesis parameters and their initial values.
  PIT (pitch bend):              -8,192 to 8,191    initial value 0
  PBS (pitch bend sensitivity):  0 to 24            initial value 1
  DYN (dynamics):                0 to 127           initial value 64

3.3.1 Parameters to be estimated

VocaListener-core estimates the Vocaloid2 note-level information (note numbers and note on/off times obtained through the Viterbi alignment) together with the control parameters PIT, PBS, and DYN in Table 2; DYN corresponds to MIDI Expression. Since PIT is interpreted relative to PBS, PBS = 1 gives a resolution of ±1 semitone divided into 16,384 steps.

3.3.2 Lyrics alignment

Fig. 5  The lyrics alignment procedure of VocaListener-core.

The lyrics are aligned in the four steps shown in Fig. 5: Step 1) performs the Viterbi alignment (Section 3.1.1), and Steps 2)–4) then adjust the resulting boundaries using MFCC features.
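Because the pitch bend PIT (-8,192 to 8,191) is scaled by the pitch bend sensitivity PBS (0–24 semitones), a relative pitch of Δf semitones corresponds to PIT = Δf × 8,192 / PBS, the relation that reappears as Eq. (10). The sketch below converts a relative-pitch trajectory into PIT/PBS values; using a single PBS for the whole trajectory (the smallest value that covers the largest bend) and the clipping are simplifying assumptions, not the paper's exact decomposition.

```python
import numpy as np

def semitones_to_pit_pbs(delta_f):
    """Map a relative-pitch trajectory delta_f [semitones] to Vocaloid PIT/PBS
    (ranges from Table 2). A single PBS is chosen here for simplicity."""
    pbs = int(np.clip(np.ceil(np.max(np.abs(delta_f))), 1, 24))
    pit = np.clip(np.rint(delta_f * 8192.0 / pbs).astype(int), -8192, 8191)
    return pit, pbs

# Example: a bend reaching +1.5 semitones needs PBS = 2.
pit, pbs = semitones_to_pit_pbs(np.array([0.0, 0.5, 1.5, -0.25]))
print(pbs, pit)  # -> 2, [0 2048 6144 -1024]
```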

Fig. 6  F0 of the target singing and estimated note numbers.

3.3.3 Note number estimation

For each note obtained by the lyrics alignment, the note number f_n is chosen so that it best explains the target F0 contour f(t) over the note's duration (Fig. 6):

    f_n = \arg\max_n \sum_t \exp\left( - \frac{ ( n - f(t) )^2 }{ 2 \sigma^2 } \right)    (8)

with σ = 0.33. PIT and PBS are initialized so that pitch deviations of about ±2 semitones around each note can be expressed.

3.3.4 Relative pitch parameter estimation

The relative pitch parameter Δf^(i)(t), realized through PIT and PBS, is updated iteratively (Steps 1)–4)) so that the F0 of the synthesized singing f^(i)(t) approaches the target f(t):

    Δf^{(i+1)}(t) = Δf^{(i)}(t) + ( f(t) - f^{(i)}(t) )    (9)

The updated value is then decomposed into the PIT and PBS parameters Δf_p^{(i+1)}(t) and Δf_s^{(i+1)}(t), which satisfy

    Δf^{(i)}(t) = Δf_p^{(i)}(t) \, Δf_s^{(i)}(t) / 8192    (10)

Fig. 7  Power of the target singing and power of the singing synthesized with four different dynamics.

3.3.5 Dynamics parameter estimation (1): gain α

Because the recording level of the user's singing and the output level of the synthesizer differ, a gain α relating the two is estimated. Singing is synthesized with the current pitch parameters Δf^(i)(t) at several DYN values, including DYN = 127 (Fig. 7), and α is obtained as described below (Eqs. (11), (12)).
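The note-number choice of Eq. (8) and the feedback update of Eq. (9) can be sketched as follows (Python/NumPy). Here synthesize_f0 is a hypothetical stand-in for synthesizing with the current parameters through the Vocaloid VSTi and re-estimating F0, which is how the real system closes the loop; the toy example at the end only shows that the iteration cancels a systematic bias of such a black box.

```python
import numpy as np

def note_number(f_segment, sigma=0.33):
    """Eq. (8): note number n (0-127) that best explains the F0 contour of one note."""
    notes = np.arange(128)
    d = notes[None, :] - f_segment[:, None]          # (T, 128)
    return int(np.argmax(np.sum(np.exp(-d ** 2 / (2.0 * sigma ** 2)), axis=0)))

def estimate_relative_pitch(f_target, synthesize_f0, n_iter=4):
    """Eq. (9): repeatedly add the remaining F0 error to the relative-pitch
    parameter; converting it to PIT/PBS via Eq. (10) happens inside synthesize_f0."""
    delta_f = np.zeros_like(f_target)
    for _ in range(n_iter):
        f_syn = synthesize_f0(delta_f)               # black-box synthesis + F0 analysis
        delta_f = delta_f + (f_target - f_syn)       # Eq. (9)
    return delta_f

if __name__ == "__main__":
    target = 60.0 + 0.3 * np.sin(np.linspace(0.0, 20.0, 200))
    fake_synth = lambda d: 60.0 + d + 0.5            # toy synthesizer with a 0.5-semitone bias
    d = estimate_relative_pitch(target, fake_synth)
    print(np.max(np.abs(target - fake_synth(d))))    # -> ~0.0 after the iterations
```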

Given the power p_m(t) of the singing synthesized with DYN = 64 (Fig. 7), the gain α is obtained by minimizing

    ε^2 = \sum_t ( α p(t) - p_m(t) )^2    (11)

which yields

    α = \frac{ \sum_t p(t)\, p_m(t) }{ \sum_t p(t)^2 }    (12)

3.3.6 Dynamics parameter estimation (2): DYN

Singing is synthesized with DYN = (0, 32, 64, 96, 127) to obtain, at each time t, the relation between DYN and the resulting power. At iteration i, the power target \hat{p}^{(i)}(t) used to determine DYN is updated from the power p^{(i)}(t) of the synthesized singing (Steps 1)–4)):

    \hat{p}^{(i+1)}(t) = \hat{p}^{(i)}(t) + ( α p(t) - p^{(i)}(t) )    (13)

and the DYN values are re-determined from \hat{p}^{(i+1)}(t).

Table 3  Dataset for experiments A and B and synthesis conditions. All of the song samples were sung by female singers.
  Experiment   Song (RWC-MDB-P-2001)   Length [sec]   Singer DB
  A            No.07                   103            CV01
  A            No.16                   100            CV02
  B            No.07                   6.0            CV01, CV02
  B            No.16                   7.0            CV01, CV02
  B            No.54                   8.9            CV01, CV02
  B            No.55                   6.5            CV01, CV02

4. Evaluation

4.1 Experimental conditions

VocaListener-core was evaluated in two experiments, A and B, on songs from the RWC Music Database (Popular Music, RWC-MDB-P-2001) 29) under the conditions in Table 3, using Vocaloid2 with the singer databases CV01 and CV02. Experiment A used excerpts of about 100 seconds; experiment B used excerpts of 6.0 to 8.9 seconds. The error at iteration i is evaluated by the mean absolute F0 error ε_f^(i) [semitone] and the mean absolute power error ε_p^(i) [dB]:

    ε_f^{(i)} = \frac{1}{T_f} \sum_t | f(t) - f^{(i)}(t) |    (14)

    ε_p^{(i)} = \frac{1}{T_p} \sum_t | 20 \log_{10}( α p(t) ) - 20 \log_{10}( p^{(i)}(t) ) |    (15)

where T_f and T_p are the numbers of frames in which the target F0 and power are non-zero.
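Eq. (12) is the closed-form least-squares solution of Eq. (11), and Eqs. (14) and (15) are the mean absolute errors reported in Section 4; the DYN update of Eq. (13) has the same additive form as Eq. (9). A small sketch follows (Python/NumPy); restricting the averages to voiced or non-silent frames through boolean masks is an assumption based on the definitions of T_f and T_p.

```python
import numpy as np

def estimate_alpha(p_target, p_mid):
    """Eqs. (11)-(12): least-squares gain alpha aligning the target power p(t)
    with the power p_m(t) of singing synthesized at DYN = 64."""
    return np.sum(p_target * p_mid) / np.sum(p_target ** 2)

def mean_f0_error(f_target, f_syn, voiced):
    """Eq. (14): mean absolute F0 error [semitones] over the T_f voiced frames."""
    return np.mean(np.abs(f_target[voiced] - f_syn[voiced]))

def mean_power_error(p_target, p_syn, alpha, nonzero):
    """Eq. (15): mean absolute power error [dB] over the T_p non-silent frames."""
    diff = 20.0 * np.log10(alpha * p_target[nonzero]) - 20.0 * np.log10(p_syn[nonzero])
    return np.mean(np.abs(diff))
```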

4.2 Results

4.2.1 Experiment A

The lyrics alignment of VocaListener-front-end (Viterbi alignment) was evaluated on songs No.07 and No.16 (Table 4). For No.07, 8 of 166 boundaries were in error; the errors involved the consonants /w/, /r/, /m/, and /n/. For No.16, 3 of 128 boundaries were in error.

Table 4  Number of boundary errors and number of repairs for correcting (pointing out) errors in experiment A.
                  n     n=0   n=1   n=2   n=3
  No.07 / CV01    166   8     5     2     0
  No.16 / CV02    128   3     2     0

4.2.2 Experiment B

Table 5 shows the mean error values for song No.07 after each iteration i, compared with the previous approach of Janer et al. 16); Table 6 shows the minimum and maximum error values over all four songs. After four iterations (i = 4), the errors are much smaller than those of the previous approach.

Table 5  Mean error values after each iteration for song No.07 in experiment B.
                                 Previous 16)   i=0     i=1     i=2     i=3     i=4
  ε_f^(i) [semitone]   CV01      0.217          0.386   0.091   0.058   0.042   0.034
                       CV02      0.198          0.352   0.074   0.041   0.029   0.024
  ε_p^(i) [dB]         CV01      13.65          11.22   4.128   3.617   3.472   3.414
                       CV02      14.17          15.26   6.944   6.382   6.245   6.171

Table 6  Minimum and maximum error values for all four songs in experiment B.
                        Previous 16)    VocaListener i=0   VocaListener i=4
  ε_f^(i) [semitone]    0.168–0.369     0.352–1.029        0.019–0.107
  ε_p^(i) [dB]          9.545–15.45     10.46–19.04        1.676–6.560

4.3

4 No.07 166 8 No.16 128 3 2 3 5 n [%] B No.07 HMM 1 5 6 8 4 2 VocaListener 1

*2 http://staff.aist.go.jp/t.nakano/vocalistener/index-j.html

VocaListener is implemented in C++, and its GUI was built with Visual Studio 2005 (Figs. 8, 9).

Fig. 8  The estimated parameters and synthesized results.

Example synthesized results for CV01 and CV02 are available on the Web *2.

5. VocaListener 3

5.1

5.1.1 1 9 D Vocaloid/Vocaloid2 2010 12

5.1.2 2 F0 A C E F0 1

5.2 Vocaloid2 Score Editor 10) 2 i) F0 ii) 9 VocaListener

Fig. 9  An example VocaListener screen.

A 5.1.3 3 B 5.1.4 4 3

6. Conclusion

VocaListener VocaListener 1 VocaListener

30),31) VocaListener 32) VocaListener-plus VocaListener-plus HMM singing-to-singing 1

Acknowledgments  CrestMuse, CV01, CV02, RWC Music Database (RWC-MDB-P-2001).

References

1) Cabinet Office, Government of Japan: Virtual Idol, Highlighting JAPAN through images, Vol.2, No.11, pp.24–25 (2009), available from http://www.gov-online.go.jp/pdf/hlj img/vol 0020et/24-25.pdf
2) Vol.25, No.1, pp.157–167 (2010).
3) 2009, pp.118–124 (2009).
4) Depalle, P., Garcia, G. and Rodet, X.: A virtual castrato, Proc. International Computer Music Conference (ICMC '94), pp.357–360 (1994).
5) Cook, P.R.: Identification of Control Parameters in An Articulatory Vocal Tract Model, with Applications to the Synthesis of Singing, Ph.D. Thesis, Stanford Univ. (1991).
6) Cook, P.R.: Singing Voice Synthesis: History, Current Work, and Future Directions, Computer Music Journal, Vol.20, No.3, pp.38–46 (1996).
7) Sundberg, J.: The KTH Synthesis of Singing, Advances in Cognitive Psychology, Special Issue on Music Performance, Vol.2, pp.131–143 (2006).
8) CyberSingers, 99-SLP-25-8, Vol.99, No.14, pp.35–40 (1998).
9) Bonada, J. and Serra, X.: Synthesis of the Singing Voice by Performance Sampling and Spectral Models, IEEE Signal Processing Magazine, Vol.24, No.2, pp.67–79 (2007).

10) Kenmochi, H. and Ohshita, H.: VOCALOID: Commercial Singing Synthesizer based on Sample Concatenation, Proc. 8th Annual Conference of the International Speech Communication Association (INTERSPEECH 2007), pp.4011–4010 (2007).
11) Vol.45, No.7, pp.719–727 (2004).
12) Saitou, T., Goto, M., Unoki, M. and Akagi, M.: Speech-To-Singing Synthesis: Converting Speaking Voices to Singing Voices by Controlling Acoustic Features Unique to Singing Voices, Proc. 2007 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2007), pp.215–218 (2007).
13) Fukayama, S., Nakatsuma, K., Sako, S., Nishimoto, T. and Sagayama, S.: Automatic Song Composition from the Lyrics Exploiting Prosody of the Japanese Language, Proc. 7th Sound and Music Computing Conference (SMC 2010), pp.299–302 (2010).
14) 2008-MUS-74-6, Vol.2008, No.12, pp.33–38 (2008).
15) STRAIGHT, Vol.43, No.2, pp.208–219 (2002).
16) Janer, J., Bonada, J. and Blaauw, M.: Performance-driven Control for Sample-Based Singing Voice Synthesis, Proc. 9th Int. Conference on Digital Audio Effects (DAFx-06), pp.41–44 (2006).
17) VOCALOID, 2008-MUS-74-9, Vol.2008, No.12, pp.51–58 (2008).
18) Camacho, A.: SWIPE: A Sawtooth Waveform Inspired Pitch Estimator for Speech And Music, Ph.D. Thesis, University of Florida (2007).
19) MeCab: Yet Another Part-of-Speech and Morphological Analyzer, http://mecab.sourceforge.net/
20) 2002, 2001-SLP-48-1, Vol.2003, No.48, pp.1–6 (2003).
21) Digalakis, V. and Neumeyer, L.: Speaker Adaptation Using Combined Transformation and Bayesian Methods, IEEE Trans. Speech and Audio Processing, Vol.4, No.4, pp.294–300 (1996).
22) Young, S., Evermann, G., Hain, T., Kershaw, D., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V. and Woodland, P.: The HTK Book (2002).
23) Vol.48, No.1, pp.227–236 (2007).
24) Saitou, T., Unoki, M. and Akagi, M.: Development of an F0 Control Model Based on F0 Dynamic Characteristics for Singing-Voice Synthesis, Speech Communication, Vol.46, pp.405–417 (2005).
25) Mori, H., Odagiri, W. and Kasuya, H.: F0 Dynamics in Singing: Evidence from the Data of a Baritone Singer, IEICE Trans. Inf. & Syst., Vol.E87-D, No.5, pp.1068–1092 (2004).
26) Seashore, C.E.: A Musical Ornament, the Vibrato, Psychology of Music, pp.33–52, McGraw-Hill (1938).
27) STRAIGHT, 2005, 3-P-15, pp.269–270 (2005).
28) H, 2006, 109, pp.611–616 (2006).
29) RWC, Vol.45, No.3, pp.728–738 (2004).
30) Toda, T., Black, A. and Tokuda, K.: Voice Conversion Based on Maximum Likelihood Estimation of Spectral Parameter Trajectory, IEEE Trans. Audio, Speech and Language Processing, Vol.15, No.8, pp.2222–2235 (2007).
31) STRAIGHT, Vol.J91-D, No.4, pp.1082–1091 (2008).
32) Nakano, T., Ogata, J., Goto, M. and Hiraga, Y.: Analysis and Automatic Detection of Breath Sounds in Unaccompanied Singing Voice, Proc. 10th International Conference of Music Perception and Cognition (ICMPC 10), pp.387–390 (2008).

(Received January 6, 2011)
(Accepted September 12, 2011)

Tomoyasu Nakano
2003 2008 2006 2007 2007 2009 2010 2010

Masataka Goto
1998 2001 IPA IT 25