IPSJ SIG Technical Report

a) wada@sap.ist.i.kyoto-u.ac.jp  b) yoshiaki@sap.ist.i.kyoto-u.ac.jp  c) enakamura@sap.ist.i.kyoto-u.ac.jp  d) itoyama@sap.ist.i.kyoto-u.ac.jp  e) yoshii@sap.ist.i.kyoto-u.ac.jp

Abstract: This report proposes a system that plays back the accompaniment of an existing music recording in synchrony with a user's singing. Conventional automatic accompaniment assumes that a symbolic (MIDI) score of the target song is available; the proposed system instead works directly on the audio: the singing voice and the accompaniment are separated by robust NMF (RNMF), the user's singing is aligned to the separated singing voice by online audio-to-audio dynamic time warping (DTW), and the accompaniment is played back with a time stretch that follows the estimated alignment.

1. Introduction

For many songs born in the consumer generated music (CGM) culture on the Web, visualized for example by the Songrium service [1], no MIDI score exists and only audio recordings (e.g., CD mixes of singing and accompaniment) are available.

Fig. 1: Overview of the proposed system.

© 2017 Information Processing Society of Japan

Fig. 2: Input and output of the system: source separation of the target song and time-stretched playback of the accompaniment.

The proposed method separates the singing voice from the accompaniment with RNMF [2] and aligns the user's singing to the separated singing voice by audio-to-audio DTW on features that include the fundamental frequency (F0).

2. Related Work

2.1 Karaoke and singing accompaniment systems

Tachibana et al. [3] proposed a real-time audio-to-audio karaoke generation system for monaural recordings based on singing voice suppression and key conversion. Inoue et al. [4] proposed an adaptive karaoke system that accompanies a human singer based on speech recognition; such systems typically assume a MIDI representation of the target song.

2.2 Score following

Score following for automatic accompaniment has a long history [5–10]. Dannenberg [5] and Vercoe [6] pioneered on-line algorithms for real-time accompaniment. Raphael [7] segmented acoustic musical signals with hidden Markov models (HMMs), and Cont [8] proposed a duration-focused HMM architecture for real-time music-to-score alignment; Nakamura et al. [9] used a hidden semi-Markov model (HSMM) to follow performances containing errors and arbitrary repeats and skips. Montecchio et al. [10] unified real-time audio-to-score and audio-to-audio alignment with sequential Monte Carlo inference.

2.3 Lyrics-to-audio alignment

Alignment between lyrics and audio has also been studied [11–15]. Gong et al. [11] aligned singing voice to a score in real time based on melody and lyric information. Fujihara et al. [12] developed LyricSynchronizer, an automatic synchronization system between musical audio signals and lyrics. Iskandar et al. [13] synchronized music signals and text lyrics at the syllabic level, and Wang et al. [14] proposed LyricAlly for automatic synchronization of textual lyrics to acoustic music signals.

Fig. 3: Processing flow of the proposed system (steps (1)–(7)).

Dzhambazov et al. [15] aligned polyphonic audio and lyrics by modeling phoneme durations with an HMM over mel-frequency cepstral coefficient (MFCC) features.

2.4 Singing voice separation

Singing voice separation has been studied extensively [16–19]. Huang et al. [16] separated singing voice from monaural recordings using robust principal component analysis (RPCA). Ikemiya et al. [17] mutually combined RPCA with vocal F0 estimation. Rafii et al. [18] used a similarity matrix, Yang [19] proposed a Bayesian formulation, and deep recurrent neural networks have also been applied [20].

3. Proposed Method

The proposed method aligns the user's singing with the singing voice of an existing recording by audio-to-audio matching, without any symbolic score.

3.1 System overview

As shown in Fig. 3, the system consists of seven steps, which include singing voice separation of the target song by VB-RNMF, extraction of F0 and spectral features from both the separated singing voice and the user's singing, online alignment of the two, and time-stretched playback of the separated accompaniment. The separation, the alignment, and the time stretching are described in Sections 3.3, 3.4, and 3.5, respectively.

3.2 Input and output

The input and output of the system are illustrated in Figs. 2 and 3.

3.3 Singing voice separation

The singing voice and the accompaniment are separated with a variational Bayesian extension of robust NMF (VB-RNMF) [2].
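For intuition, the low-rank-plus-sparse decomposition behind robust NMF can be sketched as a point-estimate version under the KL divergence, with multiplicative updates and an L1 penalty that keeps the vocal residual sparse. This is a simplified stand-in rather than the variational Bayesian algorithm of [2]; the function name and the penalty weight lam are illustrative.

```python
import numpy as np

def robust_nmf_kl(Y, K, n_iter=200, lam=0.1, seed=0):
    """Point-estimate robust NMF: Y ~ WH + S under the KL divergence,
    with an L1 penalty lam that keeps the residual S (vocal part) sparse.
    Multiplicative updates; a rough, non-Bayesian stand-in for VB-RNMF."""
    rng = np.random.default_rng(seed)
    F, T = Y.shape
    W = rng.random((F, K)) + 0.1        # basis spectra (accompaniment)
    H = rng.random((K, T)) + 0.1        # activations
    S = np.full((F, T), Y.mean())       # sparse residual (singing voice)
    eps = 1e-12
    ones = np.ones_like(Y)
    for _ in range(n_iter):
        R = Y / (W @ H + S + eps)       # element-wise ratio Y / (WH + S)
        W *= (R @ H.T) / (ones @ H.T + eps)
        R = Y / (W @ H + S + eps)
        H *= (W.T @ R) / (W.T @ ones + eps)
        R = Y / (W @ H + S + eps)
        S *= R / (1.0 + lam)            # the penalty shrinks S toward 0
    return W, H, S
```

Here W @ H approximates the repetitive accompaniment spectrogram and S the remaining sparse (vocal) energy; masking the mixture spectrogram with these two parts yields the separated signals.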

Unlike the separation methods above [16–19], the proposed system uses VB-RNMF (Fig. 4). Let Y = [y_1, ..., y_T] be the magnitude spectrogram of the target song, decomposed into a low-rank part L = [l_1, ..., l_T] (the accompaniment) and a sparse part S = [s_1, ..., s_T] (the singing voice):

    y_t ≈ l_t + s_t.    (1)

The low-rank part is factorized with K basis spectra W = [w_1, ..., w_K] and activations H = [h_1, ..., h_T]:

    y_t ≈ W h_t + s_t.    (2)

Corresponding to the Kullback–Leibler (KL) divergence, the likelihood is a Poisson distribution P:

    p(Y | W, H, S) = ∏_{f,t} P( y_{ft} | Σ_k w_{fk} h_{kt} + s_{ft} ).    (3)

Gamma priors G are put on W and H:

    p(W | α^{wh}, β^{wh}) = ∏_{f,k} G( w_{fk} | α^{wh}, β^{wh} ),    (4)
    p(H | α^{wh}, β^{wh}) = ∏_{k,t} G( h_{kt} | α^{wh}, β^{wh} ),    (5)

where α^{wh} and β^{wh} are hyperparameters. The sparse part S is given a Gamma prior whose scale follows a Jeffreys hyperprior:

    p(S | α^s, β^s) = ∏_{f,t} G( s_{ft} | α^s, β^s_{ft} ),    (6)
    p(β^s_{ft}) ∝ (β^s_{ft})^{-1},    (7)

where α^s is a hyperparameter. The posteriors of W, H, and S are inferred variationally under the model (3)–(7).

Fig. 5: Spectrograms of the separated singing voice and accompaniment.
Fig. 6: Example of online DTW with T = 8, c = 4, MaxRunCount = 4.

3.4 Audio-to-audio alignment

The user's singing is aligned to the separated singing voice by online DTW [21]. Two features are used: the fundamental frequency (F0) and MFCCs; F0 is estimated by Subharmonic Summation [22]. From the two signals to be aligned, X = {x_1, ..., x_T} and Y = {y_1, ..., y_T}, the F0 sequence f^X = {f^(x)_1, ..., f^(x)_T} and the MFCC sequence m^X = {m^(x)_1, ..., m^(x)_T} are extracted from X.
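Subharmonic Summation scores each F0 candidate by summing compressed spectral magnitude at the candidate's harmonics and picking the maximum. Below is a minimal single-frame sketch; the candidate grid, harmonic count, and compression factor are illustrative choices, not the settings of [22] or of this system.

```python
import numpy as np

def shs_f0(frame, sr, fmin=80.0, fmax=400.0, n_harm=5, decay=0.84):
    """Estimate the F0 of one frame by Subharmonic Summation: each
    candidate F0 collects the spectral magnitudes at its harmonics h*F0,
    weighted by decay**(h-1); the best-scoring candidate wins."""
    n = len(frame)
    spec = np.abs(np.fft.rfft(frame * np.hanning(n)))
    cands = np.arange(fmin, fmax, 1.0)            # 1 Hz candidate grid
    salience = np.zeros_like(cands)
    for i, f0 in enumerate(cands):
        for h in range(1, n_harm + 1):
            k = int(round(h * f0 * n / sr))       # FFT bin of harmonic h
            if k < len(spec):
                salience[i] += decay ** (h - 1) * spec[k]
    return cands[np.argmax(salience)]
```

The harmonic weighting is what suppresses octave errors: a candidate one octave below the true F0 collects only its even harmonics, and with smaller weights.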

The F0 sequence f^Y = {f^(y)_1, ..., f^(y)_T} and the MFCC sequence m^Y = {m^(y)_1, ..., m^(y)_T} are likewise extracted from Y. Twelve MFCCs are used per frame, and F0 and MFCCs are concatenated into 13-dimensional feature vectors X′ = {x′_i}^T_{i=1} = {f^(x)_i, m^(x)_i}^T_{i=1} and Y′ = {y′_i}^T_{i=1} = {f^(y)_i, m^(y)_i}^T_{i=1}, to which DTW is applied.

Online DTW (Fig. 6) incrementally fills a partial cost matrix D = {d_{i,j}} (i = 1, ..., T; j = 1, ..., T): Algorithm 1 shows the overall procedure, and Algorithm 2 the function GetInc, which decides whether to advance along the row, the column, or both. At each step only a band of width c around the current point (t, j) is evaluated, and the counter runcount limits the number of consecutive steps in the same direction to MaxRunCount. In this work T = 300, c = 4, and MaxRunCount = 3. Each cell is computed by

    d_{i,j} = ‖x′_i − y′_j‖ + min(d_{i,j−1}, d_{i−1,j}, d_{i−1,j−1}),    (8)

where ‖x′_i − y′_j‖ = sqrt( Σ_{k=1}^{13} (x′_{ik} − y′_{jk})² ) is the Euclidean distance between the feature vectors. The result is an alignment path L = {(i_1, j_1), ..., (i_l, j_l)} (0 ≤ i_k ≤ i_{k+1} ≤ T, 0 ≤ j_k ≤ j_{k+1} ≤ T), each pair (i_k, j_k) indicating that frame x′_{i_k} of X corresponds to frame y′_{j_k} of Y.

Algorithm 1 Online DTW (after Dixon [21])
  t ← 0, j ← 0; evaluate d at (t, j)
  while t < T and j < T do
    if GetInc(t, j) ≠ Column then
      t ← t + 1
      for k = j − c + 1, ..., j do
        if k > 0 then evaluate d_{t,k} by Eq. (8)
      end for
    end if
    if GetInc(t, j) ≠ Row then
      j ← j + 1
      for k = t − c + 1, ..., t do
        if k > 0 then evaluate d_{k,j} by Eq. (8)
      end for
    end if
    if GetInc(t, j) == previous then
      runcount ← runcount + 1
    else
      runcount ← 1
    end if
    if GetInc(t, j) ≠ Both then
      previous ← GetInc(t, j)
    end if
  end while

Algorithm 2 Function GetInc(t, j)
  if t < c then return Both
  if runcount > MaxRunCount then
    if previous == Row then return Column
    else return Row
  end if
  (x, y) ← arg min_{(k,l)} d_{k,l}, where k == t or l == j
  if x < t then return Row
  else if y < j then return Column
  else return Both

3.5 Time stretching

From the DTW path L, per-frame stretch ratios R = {r_1, ..., r_T} are computed. For frame k,

    r_k = |{i_1, ..., i_l}_k| / |{j_1, ..., j_l}_k|,    (9)

where {i_1, ..., i_l}_k and {j_1, ..., j_l}_k are the sets of path indices on the two sides that belong to frame k, so that r_k expresses how many frames of one signal correspond to one frame of the other.
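As a concrete reference, the recursion of Eq. (8) and a reading of Eq. (9) can be sketched offline, computing the full cost matrix and backtracking instead of the incremental band of Algorithms 1 and 2. The function names are illustrative, and the index grouping in stretch_ratio is our assumption about the notation of Eq. (9).

```python
import numpy as np

def dtw_path(X, Y):
    """Offline DTW between feature sequences X (T1 x d) and Y (T2 x d)
    using d[i,j] = ||x_i - y_j|| + min(d[i,j-1], d[i-1,j], d[i-1,j-1])
    as in Eq. (8), then backtracking the minimum-cost alignment path."""
    T1, T2 = len(X), len(Y)
    cost = np.full((T1, T2), np.inf)
    cost[0, 0] = np.linalg.norm(X[0] - Y[0])
    for i in range(T1):
        for j in range(T2):
            if i == 0 and j == 0:
                continue
            prev = min(cost[i, j - 1] if j > 0 else np.inf,
                       cost[i - 1, j] if i > 0 else np.inf,
                       cost[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            cost[i, j] = np.linalg.norm(X[i] - Y[j]) + prev
    # backtrack from the end; ties prefer the diagonal predecessor
    i, j = T1 - 1, T2 - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        cands = []
        if i > 0 and j > 0:
            cands.append((cost[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            cands.append((cost[i - 1, j], i - 1, j))
        if j > 0:
            cands.append((cost[i, j - 1], i, j - 1))
        _, i, j = min(cands)
        path.append((i, j))
    return path[::-1]

def stretch_ratio(path, k):
    """Stretch ratio at frame k of the first signal: path points carrying
    first-side index k, divided by path points carrying the matched
    second-side index (one reading of Eq. (9))."""
    on_i = [(i, j) for i, j in path if i == k]
    if not on_i:
        return 1.0
    j0 = on_i[0][1]
    on_j = [(i, j) for i, j in path if j == j0]
    return len(on_i) / len(on_j)
```

On identical sequences the path is the diagonal and every ratio is 1; if Y holds each frame of X twice, the ratios become 2, i.e., playback would be stretched to twice the length at those frames.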

The accompaniment is then time-stretched frame by frame according to the ratios R = {r_1, ..., r_T} using a phase vocoder [23].

4. Evaluation

The system was tested with four songs, including a TV-commercial (CM) song and a song by ASIAN KUNG-FU GENERATION, and evaluated subjectively with a four-item questionnaire.

5. Conclusion

This report presented a system that plays back the accompaniment of an existing recording in synchrony with a user's singing, combining singing voice separation by VB-RNMF with online audio-to-audio alignment by DTW and phase-vocoder time stretching.

Acknowledgments: This work was supported in part by JSPS KAKENHI Grant Numbers 26700020, 24220006, 26280089, 15K16654, 16H01744, and 16J05486, and by JST ACCEL Grant Number JPMJAC1602.

References

[1] Hamasaki, M. et al.: Songrium: Browsing and Listening Environment for Music Content Creation Community, Proc. SMC, pp. 23–30 (2015).
[2] Bando, Y. et al.: Variational Bayesian Multi-channel Robust NMF for Human-voice Enhancement with a Deformable and Partially-occluded Microphone Array, Proc. EUSIPCO, pp. 1018–1022 (2016).
[3] Tachibana, H. et al.: A Real-time Audio-to-audio Karaoke Generation System for Monaural Recordings Based on Singing Voice Suppression and Key Conversion Techniques, J. IPSJ, Vol. 24, No. 3, pp. 470–482 (2016).
[4] Inoue, W. et al.: Adaptive Karaoke System: Human Singing Accompaniment Based on Speech Recognition, Proc. ICMC, pp. 70–77 (1994).
[5] Dannenberg, R. B.: An On-Line Algorithm for Real-Time Accompaniment, Proc. ICMC, pp. 193–198 (1984).
[6] Vercoe, B.: The Synthetic Performer in The Context of Live Performance, Proc. ICMC, pp. 199–200 (1984).
[7] Raphael, C.: Automatic Segmentation of Acoustic Musical Signals Using Hidden Markov Models, IEEE Trans. on PAMI, Vol. 21, No. 4, pp. 360–370 (1999).
[8] Cont, A.: A Coupled Duration-focused Architecture for Realtime Music to Score Alignment, IEEE Trans. on PAMI, Vol. 32, No. 6, pp. 974–987 (2010).
[9] Nakamura, T. et al.: Real-Time Audio-to-Score Alignment of Music Performances Containing Errors and Arbitrary Repeats and Skips, IEEE/ACM TASLP, Vol. 24, No. 2, pp. 329–339 (2016).
[10] Montecchio, N. et al.: A Unified Approach to Real Time Audio-to-score and Audio-to-Audio Alignment Using Sequential Montecarlo Inference Techniques, Proc. ICASSP (2011).
[11] Gong, R. et al.: Real-time Audio-to-Score Alignment of Singing Voice Based on Melody and Lyric Information, Proc. Interspeech (2015).
[12] Fujihara, H. et al.: LyricSynchronizer: Automatic Synchronization System between Musical Audio Signals and Lyrics, IEEE Journal of Selected Topics in Signal Processing, pp. 1252–1261 (2011).
[13] Iskandar, D. et al.: Syllabic Level Automatic Synchronization of Music Signals and Text Lyrics, Proc. ACMMM, pp. 659–662 (2006).
[14] Wang, Y. et al.: LyricAlly: Automatic Synchronization of Textual Lyrics to Acoustic Music Signals, IEEE TASLP, Vol. 16, No. 2, pp. 338–349 (2008).
[15] Dzhambazov, G. et al.: Modeling of Phoneme Durations for Alignment between Polyphonic Audio and Lyrics, Proc. SMC, pp. 281–286 (2015).
[16] Huang, P.-S. et al.: Singing-Voice Separation from Monaural Recordings Using Robust Principal Component Analysis, Proc. IEEE ICASSP, pp. 57–60 (2012).
[17] Ikemiya, Y. et al.: Singing Voice Separation and Vocal F0 Estimation Based on Mutual Combination of Robust Principal Component Analysis and Subharmonic Summation, IEEE/ACM TASLP, Vol. 24, No. 11, pp. 2084–2095 (2016).
[18] Rafii, Z. et al.: Music/Voice Separation Using The Similarity Matrix, Proc. ISMIR, pp. 583–588 (2012).
[19] Yang, P.-K. et al.: Bayesian Singing-Voice Separation, Proc. ISMIR, pp. 507–512 (2014).
[20] Huang, P.-S. et al.: Singing-Voice Separation from Monaural Recordings Using Deep Recurrent Neural Networks, Proc. ISMIR, pp. 477–482 (2014).
[21] Dixon, S.: An On-Line Time Warping Algorithm for Tracking Musical Performances, Proc. 19th IJCAI, pp. 1727–1728 (2005).
[22] Hermes, D. J.: Measurement of Pitch by Subharmonic Summation, J. ASA, Vol. 83, No. 1, pp. 257–264 (1988).
[23] Flanagan, J. et al.: Phase Vocoder, Bell System Technical Journal, Vol. 45, pp. 1493–1509 (1966).