JAIST Reposi
https://dspace.j

Title: A Method of Note Counting and Pitch Extraction by Using Melody Rhythm Taps for Voice-to-MIDI System
Author(s): Naoki Itou; Kazushi Nishimoto
Citation: IEICE Transactions D (Japanese edition), J96-D(4): 965-977
Issue Date: 2013-04-01
Type: Journal Article
Text version: publisher
URL: http://hdl.handle.net/10119/11576
Rights: Copyright (C)2013 IEICE. Naoki Itou, Kazushi Nishimoto, IEICE Transactions D (Japanese edition), J96-D(4), 2013, 965-977. http://www.ieice.org/jpn/trans_onlin
Description: Japan Advanced Institute of Science and
A Method of Note Counting and Pitch Extraction by Using Melody Rhythm Taps for Voice-to-MIDI System

Naoki ITOU and Kazushi NISHIMOTO

School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi-shi, 923-1292 Japan
Research Center for Innovative Lifestyle Design, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi-shi, 923-1292 Japan

IEICE Transactions D, Vol. J96-D, No. 4, pp. 965-977, (c) 2013

MIDI Voice-to-MIDI 1 MIDI Voice-to-MIDI 3

1.
MIDI (Musical Instrument Digital Interface) [1]-[3] Voice-to-MIDI (VtoM) VtoM VtoM VtoM VtoM 1 2 1 3 F0 F0 4 F0 5 1 23 F0 F0 (2)
(3) F0 (4) 1 1 VtoM [4], [5] [6] [6] VtoM Voice-to-MIDI 1 1 Voice-to-MIDI TVM VtoM 2. 3. 4. 5. 6.

2.
VtoM [7]-[10] VtoM Query By Humming (QBH) [11]-[14] F0 [15], [16] VtoM [17] Step Entry [18]

3. Voice-to-MIDI
Voice-to-MIDI
Fig. 1 Score of "Aka tombo": composition by Kosaku Yamada, lyrics by Rofu Miki.
Fig. 2 Samples of segmentation mistakes with note binding and splitting.
Fig. 3 Samples of segmentation mistakes with extra notes.

3.1
VtoM VtoM VtoM [19] 1 2 2 1 1 3 VtoM 1

3.2
VtoM TVM TVM F0 PC
3.3
TVM D2-F5 (A4 = 440 Hz) MIDI 22050 Hz, 16 bit MIDI PC PC 2 1 2 1 1 F0 2 1 F0 3 1 F0 F0 (STFT = 2048 samples, 100 ms; = 128 samples, 6 ms) D2-F5 IFFT [20] cent F0 PC Keypress 1024 samples, 50 ms Keypress 1024 samples

3.4
2 1 4 1 4 2 1 2 F0 [21], [22] D2-F5 6 ms 3 1 2 3 200 ms 2 1 F0 F0 FFT 90%

Fig. 4 Two types of tapping manner.
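The analysis settings quoted in Section 3.3 (22050 Hz sampling, a 2048-sample STFT given as 100 ms, a 128-sample figure given as 6 ms, a 1024-sample key-press window given as 50 ms, and a D2-F5 range with A4 = 440 Hz) can be sanity-checked with a few lines of arithmetic. The following is only a sketch under stated assumptions: the 128-sample figure is taken to be the frame shift, MIDI numbering is assumed to be the common C4 = 60 convention (so A4 = 69, D2 = 38, F5 = 77), and all function and constant names are hypothetical rather than taken from the TVM implementation.

```python
import math

SAMPLE_RATE_HZ = 22050    # sampling rate quoted in the text
STFT_WINDOW = 2048        # STFT window length in samples (quoted as 100 ms)
FRAME_STEP = 128          # assumed to be the frame shift (quoted as 6 ms)
KEYPRESS_WINDOW = 1024    # window tied to a key press (quoted as 50 ms)
A4_HZ = 440.0             # tuning reference quoted in the text
A4_MIDI = 69              # assumption: standard MIDI numbering with C4 = 60
D2_MIDI, F5_MIDI = 38, 77  # assumed note numbers for the D2-F5 range


def samples_to_ms(n_samples, sample_rate=SAMPLE_RATE_HZ):
    """Convert a length in samples to milliseconds."""
    return 1000.0 * n_samples / sample_rate


def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def cents_between(f_hz, ref_hz):
    """Interval from ref_hz up to f_hz in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def hz_to_midi(f_hz):
    """Nearest MIDI note number for an F0 estimate in Hz."""
    return A4_MIDI + int(round(cents_between(f_hz, A4_HZ) / 100.0))


if __name__ == "__main__":
    for name, n in [("STFT window", STFT_WINDOW),
                    ("frame step", FRAME_STEP),
                    ("key-press window", KEYPRESS_WINDOW)]:
        print(f"{name}: {n} samples = {samples_to_ms(n):.1f} ms")
    print(f"D2 = {midi_to_hz(D2_MIDI):.1f} Hz, F5 = {midi_to_hz(F5_MIDI):.1f} Hz")
```

Under these assumptions the quoted durations are rounded values: the three lengths come out near 92.9 ms, 5.8 ms and 46.4 ms, and the D2-F5 range spans roughly 73 Hz to 698 Hz.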
/ PC 1

3.5
A4 = 440 Hz D2-F5 [23] FFT 6 ms 16 6 ms BPM=2500 BPM=250

4.

4.1
TVM 3.5 TVM 1 2 3 2 4 1 2 5.1-5.3 3 5.4 TVM 4 5.5 3

4.2
2 1 2 1 2 BPM=120

4.3
VtoM VtoM 3
1 CMP
2 RYN [10]
3 BP2 [25]
CMP: F0 TVM [24] 50 cent 70 ms 16 BPM=213
RYN: [10] Linux MIDI [9] Ryynanen Accent Signal FFT
BP2: KAWAI: Band Producer 2

4.4
TVM HP: 2710p PC Shure: SM87A 2 PC PC1 BP2 BP2 MIDI Wave BP2 PC2 2710p PC TVM PC1 BP2 MIDI TVM PC1 TVM PC2 PC1 PC1 PC2 CMP RYN BP2 Adobe: Audition 1.0

4.5
8 1 TVM 1 2 3 2 6 9 4 5 1 1 VtoM

4.6
1 VtoM
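The tempo figures quoted alongside the time resolutions (6 ms with BPM=2500 in Section 3.5, and 70 ms with BPM=213 for CMP in Section 4.3) are consistent with reading the bare "16" as a sixteenth note and taking the beat to be a quarter note. That reading is an assumption, since the surrounding wording is not preserved; the sketch below, with hypothetical function names, only shows the arithmetic.

```python
def sixteenth_note_ms(bpm):
    """Duration of a sixteenth note in ms, assuming one beat = one quarter note."""
    quarter_ms = 60000.0 / bpm   # one beat in milliseconds
    return quarter_ms / 4.0      # four sixteenth notes per quarter note


def bpm_for_sixteenth_ms(duration_ms):
    """Tempo at which a sixteenth note lasts duration_ms (inverse of the above)."""
    return 60000.0 / (4.0 * duration_ms)


if __name__ == "__main__":
    for bpm in (120, 213, 250, 2500):
        print(f"BPM={bpm}: sixteenth note = {sixteenth_note_ms(bpm):.2f} ms")
```

With this reading, a 6 ms sixteenth note corresponds exactly to BPM=2500, and a sixteenth note at BPM=213 lasts about 70.4 ms, which matches the 70 ms figure after rounding.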
Table 1 Results of pre-test and experiences of musical performing for each subject.
     1  2  2  3
A    6  0  1  5
B    3  0  0  2
C    6  1  0  5
D    3  1  0  6
E    0  1  0  6    1
F    5  0  0  5    2 3
G    6  0  0  6    2
H    6  0  4  6    3 5
I    6  5  1  6    10 1
A D 22

Table 2 2 Singing conditions for each song.
A BPM = 120 B 5 1 31 1 3 1 2 BPM=120 3 2 3 BPM=120 1 1 3 3

Table 3 List of subjects' self-selected songs.
A    Mr. Children Over
B
C    11 3
D
E    Acid Black Cherry
F
G    1
H    SMAP
I

4.7
BP2 1 1 1 2 Adobe: Audition 1.0 Ensoniq: MR-76 3 1 2 4 4 1 1 3
1 2 2 3 2 3 2 3 1 1 1 2 1 2 3 4 F0 1 4

Table 4 Categories for melody extracts.

1 1 31 1 3 = ++ F
(1) % = / 100
(2) % = / + + 100
(3) $F = \dfrac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}}$

5.

5.1
3 93 5 TVM F 5 100% F0 CMP RYN BP2 TVM CMP 95 58 RYN 42 23 4
Table 5 Results of "Aka tombo": [sung with own tempo, lyrics and taps].
1 * 6 2 F0 6 3 4 6

Table 6 Results of "Aka tombo": [sung with BPM=120, lyrics and taps].

CMP RYN BP2 3 3 4 TVM

5.2
BPM = 120 BPM = 120 3 93 6 RYN E TVM E 3
3 1 TVM F TVM 2 100%

5.3
7 7 TVM 3 F TVM A, E, F 1 A, F TVM F TVM TVM A E F

5.4
TVM A D F I 2 TVM E 98.7% 98.7% 99.7% 99.7% t 100% 5

Table 7 Results of self-selected songs: [sung with own tempo, lyrics and taps].
1 * 2 F0 3 4
Table 8 Differences of the addition of tapping in the total values of recall, precision and F-value of "Aka tombo" (sung with own tempo). Unit: %.

                CMP             RYN             BP2
Recall       85.6   85.4     87.2   88.9     92.7   94.7
Precision    84.0   83.3     79.5   75.4     88.7   86.4
F            84.8   84.3     83.2   81.6     90.6   90.4

Table 9 Differences of the addition of tapping in the total values of recall, precision and F-value of "Aka tombo" (sung with BPM=120). Unit: %.

                CMP             RYN             BP2
Recall       83.8   86.1     87.5   81.7     78.5   79.0
Precision    86.9   86.7     84.7   77.1     92.1   92.1
F            85.3   86.4     86.1   79.3     84.8   85.0

BPM=120 97.8% 96.6% 98.1% 98.1% t 100% 2 1 TVM 1 1 4 1 12 BPM=120

5.5
TVM 3 3 8 836 830 CMP, RYN, BP2 9 BPM=120 837 835 BP2 CMP RYN 87.5% 84.7% 81.7% 77.1% E 35 11 BPM=120

5.6
TVM
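The F-values in Tables 8 and 9 can be cross-checked against equation (3), the harmonic mean of recall and precision. The sketch below assumes that the two unlabeled rows above the F row are recall and precision, in the order the captions list them; because the formula is symmetric, the check does not depend on that order. The function name is hypothetical.

```python
def f_value(recall_pct, precision_pct):
    """Harmonic mean of recall and precision, both given in percent."""
    total = recall_pct + precision_pct
    if total == 0:
        return 0.0  # avoid division by zero when nothing was extracted
    return 2.0 * recall_pct * precision_pct / total


if __name__ == "__main__":
    # (recall, precision, reported F) for the first CMP and RYN columns of Table 8
    rows = [("CMP", 85.6, 84.0, 84.8), ("RYN", 87.2, 79.5, 83.2)]
    for name, recall, precision, reported in rows:
        print(f"{name}: computed F = {f_value(recall, precision):.1f}, reported F = {reported}")
```

Small differences in the last digit can occur for other columns, because the published recall and precision figures are themselves rounded.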
[22] [22] F0 65.9% 70.2% 24.0% 36.8% TVM

6.
Voice-to-MIDI VtoM MIDI Note No. Voice-to-MIDI (Voice-to-MusicalExpression)

[10] Matti Ryynanen Anssi Klapuri

[1] YAMAHA XGworks ST2003.
[2] INTERNET SingerSongWriter Lite6.0, 2008.
[3] MakeMusic Inc., Finale2010, USA, 2009.
[4] pp.109-118, 2005.
[5] pp.20-21, 2003.
[6] C. Oshima, N. Itou, K. Nishimoto, N. Hosoi, K. Yasuda, and K. Nakayama, "An accompaniment system for healing emotions of patients with dementia who repeat stereotypical utterances," Proc. 9th Int'l. Conf. Smart Homes and Health Telematics, 2011.
[7] vol.20, no.10, pp.68-73, 1984.
[8] C.C. Toh, B. Zhang, and Y. Wang, "Multiple-feature fusion based onset detection for solo singing voice," Proc. ISMIR 2008, 2008.
[9] M. Ryynanen and A. Klapuri, "Modelling of note events for singing transcription," Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio, 2004.
[10] M. Ryynanen and A. Klapuri, "Automatic transcription of melody, bass line, and chords in polyphonic music," Computer Music Journal, vol.32, no.3, pp.73-86, 2008.
[11] T. Kageyama, K. Mochizuki, and Y. Takashima, "Melody retrieval with humming," Proc. ICMC 1993, pp.349-351, 1993.
[12] A. Ghias, J. Logan, D. Chamberlin, and B.C. Smith, "Query by humming: Musical information retrieval in an audio database," Proc. ACM Multimedia '95, San Francisco, California, Nov. 1995.
[13] L. Prechelt and R. Typke, "An interface for melody input," ACM Trans. Computer-Human Interaction (TOCHI), vol.8, no.2, pp.133-149, 2001.
[14] N. Kosugi, Y. Nishihara, T. Sakata, M. Yamamuro, and K. Kushima, "A practical query-by-humming system for a large music database," Proc. 8th ACM Intl. Conf. Multimedia, pp.333-342, Marina del Rey, California, 2000.
[15] SLP-47, pp.71-76, 2003.
[16] 2006 1-2-23, 2006.
[17] Wildcat Canyon Software Inc., Autoscore 2.0, 1999.
[18] MUS-34, pp.21-26, 1999.
[19] p.68, 1994.
[20] pp.718-723, 1983.
[21] MIDI 2step 2006-EC-5, vol.2006, pp.43-48, 2006.
[22] N. Itou and K. Nishimoto, "A voice-to-midi system for singing melodies with lyrics," Proc. Intl. Conf. ACE '07, pp.183-189, Salzburg, Austria, 2007.
[23] p.439, 2004.
[24] vol.23, no.5, pp.95-100, 2004.
[25] Band Producer 2, 2008.

24 7 14 10 25

2011 ICOST2011 Best Multi-Disciplinary Paper Award GLOBAL HEALTH 2012 Best Paper Award

1987 1992 ATR 1995 ATR 1999 2007 2000 2003 21 1999 1999 ACM Multimedia 2004 Best Paper Award ICOST2011 Best Multi-Disciplinary Paper Award GLOBAL HEALTH 2012 Best Paper Award IEEE Computer Society ACM