STRAIGHT_Tutorial_Tianjin2016.key

1 Lecture 1: Introduction to speech signal processing in the STRAIGHT vocoder. Hideki Kawahara, Emeritus Professor, Wakayama University, Japan. Tianjin University, China, 9 December 2016

2 Collaborators: Roy D. Patterson, Masanori Morise, Hideki Banno, Toshio Irino, Ryuichi Nisimura, Verena G. Skuk, Stefan Schweinberger, Parham Zolfaghari, Ken-Ichi Sakakibara, Ikuyo Masuda-Katsuse, Alain de Cheveigne, Josh McDermott, Osamu Fujimura, Toru Takahashi, Tomoki Toda, and many others.

3 Matlab

4 Link to STRAIGHT resources: TANDEM-STRAIGHT and morphing. Username: Tianjin-Lecture, Password: STRAIGHT (valid on 9 December 2016)

5 Link to STRAIGHT resources: legacy-STRAIGHT. Username: Tianjin-Lecture, Password: STRAIGHT (valid on 9 December 2016)

7 Topics: Application, STRAIGHT, Background

8 Summary (Application, STRAIGHT, Background): Interference-free representations play important roles. Periodic excitation is an efficient and robust strategy for sampling and transmitting the information relevant to communication by voice. STRAIGHT is a collection of functions and applications. Extended morphing provides a unique research strategy for studying para- and non-linguistic aspects of speech.

10 Interference: Vocal tract shape information is mixed with an interfering structure caused by the repetitive structure of voiced sounds. Linear predictive analysis still suffers from estimation bias caused by this repetitive structure. This motivates interference-free representations.

11 Visualization: spectrogram, $S(\omega,t) = \left|\int w(\tau - t)\,s(\tau)\,e^{-j\omega\tau}\,d\tau\right|^2$, shown in wide-band and narrow-band versions
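
The wide-band and narrow-band spectrograms differ only in the length of the window $w$. The following NumPy sketch evaluates the formula above frame by frame on a synthetic harmonic signal; the Hann window, the 40 ms and 4 ms window lengths, the test signal, and the helper name `power_spectrogram` are illustrative assumptions, not STRAIGHT code.

```python
import numpy as np

def power_spectrogram(s, fs, win_ms, hop_ms, n_fft=1024):
    """|STFT|^2 with a Hann window; rows are frames, columns are frequency bins."""
    win_len = int(round(win_ms * 1e-3 * fs))
    hop = int(round(hop_ms * 1e-3 * fs))
    w = np.hanning(win_len)
    frames = [s[i:i + win_len] * w for i in range(0, len(s) - win_len + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), n_fft, axis=1)) ** 2

fs = 16000
f0 = 125.0
t = np.arange(fs) / fs                                          # 1 s of signal
s = sum(np.cos(2 * np.pi * k * f0 * t) for k in range(1, 20))   # harmonic test signal
narrow = power_spectrogram(s, fs, win_ms=40, hop_ms=5)          # long window: harmonics
wide = power_spectrogram(s, fs, win_ms=4, hop_ms=1)             # short window: pitch pulses
freqs = np.fft.rfftfreq(1024, 1 / fs)
print(freqs[np.argmax(narrow.mean(axis=0))])                    # lies on a multiple of f0
print(wide.sum(axis=1).max() / wide.sum(axis=1).min())          # frame energy swings with T0
```

With the long window each frame shows individual harmonics at multiples of $F_0$; with the short window the frame energy rises and falls with every pitch period. Both patterns are the interference that STRAIGHT sets out to remove.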

12 Movie

14 Matlab

15 Matlab

16 Matlab

20 In the case of speech: repetition. The voicing organ (voice source) generates a source wave with fundamental frequency F0, and the articulators act as a filter with a transfer function. The mixing of the two is the nightmare of signal processing.

22 STRAIGHT is a VOCODER. Analysis: input signal-1 → F0 analysis, spectral envelope analysis, and non-periodicity analysis, giving the signal parameters (physical attributes) F0, spectral envelope, and non-periodicity. Data processing: e.g., F0 modification. Synthesis: a periodic pulse generator and a non-periodic component generator feed a shaper and mixer, followed by the spectral envelope filter, to produce the output signal.

23 STRAIGHT: from legacy to TANDEM (spectrum, instantaneous frequency, group delay)

25 Interference-free representation of the power spectrum. legacy-STRAIGHT [Kawahara et al. 1999]: a complementary set of pitch-synchronized time windows, spline-based spectral smoothing, and inverse filtering (the original idea is by Kawahara (1997)). TANDEM-STRAIGHT [Kawahara et al. 2008]: an F0-adaptive set of time windows separated by half a pitch period, and F0-adaptive smoothing followed by digital filter compensation based on consistent sampling.

26 Interference: power spectrum (DVD)

27 Interference-free representation of the power spectrum. legacy-STRAIGHT [Kawahara et al. 1999]: a complementary set of pitch-synchronized time windows, spline-based spectral smoothing, and inverse filtering. TANDEM-STRAIGHT [Kawahara et al. 2008]: an F0-adaptive set of time windows separated by half a pitch period, and F0-adaptive smoothing followed by digital filter compensation based on consistent sampling (the original time-window idea is by Morise et al. (2007)).

28 TANDEM-STRAIGHT: power spectrum of a periodic pulse train (movie)

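32 TANDEM principle: a periodic signal model with fundamental period T0 (its coefficients are arbitrary real numbers) is analyzed with a windowing function, and the resulting power spectrum contains a temporally varying term. The TANDEM spectrum averages two power spectra whose windows are separated by T0/2, which cancels this temporally varying term.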
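
The averaging step can be checked numerically. This NumPy sketch (an illustration, not the TANDEM-STRAIGHT implementation) compares the temporal variation of single-window power spectra of a periodic pulse train with that of the average of two spectra whose windows are centred at t − T0/4 and t + T0/4. The Blackman window of length 2.5 T0, the 8 kHz sampling rate implied by T0 = 80 samples, and the helper names are assumptions chosen for the demonstration.

```python
import numpy as np

def power_spectrum(x, centre, window, n_fft):
    """|DFT|^2 of the frame of x centred (to within one sample) on `centre`."""
    start = centre - len(window) // 2
    frame = x[start:start + len(window)] * window
    return np.abs(np.fft.rfft(frame, n_fft)) ** 2

def tandem_spectrum(x, centre, t0, window, n_fft):
    """Average of two power spectra whose windows are T0/2 apart (centre -/+ T0/4).

    Assumes t0 (in samples) is a multiple of 4, so both shifts are whole samples.
    """
    q = t0 // 4
    return 0.5 * (power_spectrum(x, centre - q, window, n_fft)
                  + power_spectrum(x, centre + q, window, n_fft))

def temporal_variation(spectra):
    """Mean over frequency of (std over time) / (mean over time)."""
    s = np.asarray(spectra)
    return float(np.mean(s.std(axis=0) / (s.mean(axis=0) + 1e-12)))

t0 = 80                                   # fundamental period in samples
x = np.zeros(4000)
x[::t0] = 1.0                             # periodic pulse train
window = np.blackman(int(2.5 * t0))       # window length 2.5*T0 (an assumption)
centres = range(2000, 2000 + t0)          # window positions covering one full period
single = [power_spectrum(x, c, window, 1024) for c in centres]
tandem = [tandem_spectrum(x, c, t0, window, 1024) for c in centres]
print(temporal_variation(single), temporal_variation(tandem))
```

For a periodic signal, the parts of the temporal variation that oscillate at odd multiples of F0 change sign over a shift of T0/2, so they cancel in the average; with this window the remainder is small, and the second printed value is much smaller than the first.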

33 TANDEM-STRAIGHT: log-power spectrum of a synthetic vowel /a/ (movie)

34 Selection of the window: how to select the windowing function (averaged spectrum; temporal variation vs. normalized duration)

35 Temporal variation with a single window vs. normalized duration (re. T0)

36 Nuttall windows. Nuttall, A. H. (1981). Some windows with very good sidelobe behavior. IEEE Transactions on Acoustics, Speech, and Signal Processing, 29(1). Figure: gain (dB) vs. frequency (Hz) for the Hann, Blackman, and Nuttall windows.

37 Nuttall window #12 in Table II. Note: nuttallwin in Matlab is different.

38 Temporal variation with the TANDEM window vs. normalized duration (re. T0)

40 Interference-free representation of the power spectrum. legacy-STRAIGHT [Kawahara et al. 1999]: a complementary set of pitch-synchronized time windows, spline-based spectral smoothing, and inverse filtering. TANDEM-STRAIGHT [Kawahara et al. 2008, 2011]: an F0-adaptive set of time windows separated by half a pitch period, and F0-adaptive smoothing followed by digital filter compensation based on consistent sampling.

41 Consistent sampling: sampling theory aims at recovering the whole waveform, whereas consistent sampling requires recovery only at the sampled points [Unser 2000]; it can be implemented by simple filtering.

42 Consistent sampling: frequency-domain representation of the smoothing function, the smoothed spectrum, and the recovered spectrum

43 Consistent sampling: correlation and filter coefficients [Kawahara & Morise 2011b]

44 Implementation: the cepstral representation of the TANDEM spectrum, the lifter form of the rectangular frequency smoother, and truncated and adjusted coefficients [Kawahara & Morise 2011b]
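
Smoothing a log spectrum with a rectangle of width $f_0$ on the frequency axis is equivalent to multiplying its cepstrum by the lifter $\sin(\pi f_0 q)/(\pi f_0 q)$, where $q$ is quefrency. The NumPy sketch below shows only this lifter step; the consistent-sampling compensation (the adjusted coefficients) is omitted, and the test spectrum, FFT length, and function names are assumptions for illustration.

```python
import numpy as np

def rectangular_smoother_lifter(n_fft, fs, f0):
    """Lifter equal to the transform of a unit-area rectangle of width f0 (Hz)."""
    n = np.arange(n_fft)
    quefrency = np.minimum(n, n_fft - n) / fs     # seconds; symmetric for a real cepstrum
    return np.sinc(f0 * quefrency)                # np.sinc(x) = sin(pi x) / (pi x)

def smooth_power_spectrum(power, fs, f0):
    """Smooth the log power spectrum (bins 0..n_fft/2) with a width-f0 rectangle."""
    n_fft = 2 * (len(power) - 1)
    cepstrum = np.fft.irfft(np.log(power), n_fft)
    cepstrum *= rectangular_smoother_lifter(n_fft, fs, f0)
    return np.exp(np.fft.rfft(cepstrum, n_fft).real)

fs, f0, n_fft = 8000, 100.0, 960                  # n_fft is a multiple of fs/f0 = 80
freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
rippled = 1.0 + 0.5 * np.cos(2 * np.pi * freqs / f0)   # ripple repeating every f0
smoothed = smooth_power_spectrum(rippled, fs, f0)
print(smoothed.min(), smoothed.max())             # both close to (1 + sqrt(0.75)) / 2
```

Because the lifter is zero at every multiple of $1/f_0$, a ripple that repeats every $f_0$ along the frequency axis (the harmonic interference) is removed completely, while slowly varying envelope components pass almost unchanged.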

45 Test example: STRAIGHT spectrum vs. TANDEM spectrum [Kawahara & Morise 2011b]

46 TANDEM-STRAIGHT: natural speech (movie)

49 Another cause of interference: phase spectrogram and instantaneous frequency (DVD)

50 F0 extractors using instantaneous frequency: legacy STRAIGHT [Kawahara et al. 1999a], fundamental component selection using a constant-Q filter bank and AM-FM magnitude; the fixed point of the frequency-to-instantaneous-frequency mapping [Kawahara et al. 1999b]; refinement of initial F0 estimates using instantaneous frequency; a multi-source F0 extractor with intensive manual optimization of parameters [Kawahara et al. 2005]; XSX, an excitation source extractor based on an interference-free representation of power spectra [Kawahara et al. 2008][Fujimura et al. 2009]; YANGsaf [Kawahara et al. 2016].

51 Interference-free representation of instantaneous frequency: interference in the instantaneous frequency of periodic signals; the interference-free representation (animation); derivation of the interference-free representation

52 Movie

53 Movie: waveform and time windows, time and frequency resolution, phase spectrogram, viewer of the target representation

54 Movie

55 Movie

57 Movie

58 Movie

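60 Instantaneous frequency, the problem: by definition it is the time derivative of the phase, which has singularities.

61 Flanagan's equation (derivation 1): writing $X = a + jb$, $\omega_i = \frac{d}{dt}\arctan\frac{b}{a} = \frac{a\dot{b} - b\dot{a}}{a^2 + b^2}$, so there is no need to evaluate the inverse (arctangent) function.

62 Flanagan's equation (derivation 2): simplification and notation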
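
Flanagan's equation can be evaluated with two DFTs per frame. In this NumPy sketch the time derivative of the short-time spectrum is obtained from a second DFT that uses the time derivative of the window, which gives the equivalent form $\omega_i = \omega - (\mathcal{R}[X]\,\mathcal{I}[X_d] - \mathcal{I}[X]\,\mathcal{R}[X_d])/|X|^2$ (the same form appears in spectral reassignment). The Hann window, the frame length, the test tone, and the function name are assumptions; this is not the STRAIGHT implementation.

```python
import numpy as np

def instantaneous_frequency(x, centre, half_len, fs, n_fft):
    """Instantaneous frequency (Hz) of every DFT channel for the frame at `centre`.

    X uses a Hann window h and X_d uses dh/dn, so X_d supplies the time derivative
    of the short-time spectrum and no phase unwrapping is needed:
        omega_i = omega - (Re[X] Im[X_d] - Im[X] Re[X_d]) / |X|^2
    """
    n = np.arange(-half_len, half_len + 1)
    h = 0.5 + 0.5 * np.cos(np.pi * n / half_len)                   # Hann window
    dh = -0.5 * np.pi / half_len * np.sin(np.pi * n / half_len)    # dh/dn
    frame = x[centre - half_len:centre + half_len + 1]
    X = np.fft.rfft(frame * h, n_fft)
    X_d = np.fft.rfft(frame * dh, n_fft)
    ratio = (X.real * X_d.imag - X.imag * X_d.real) / np.maximum(np.abs(X) ** 2, 1e-300)
    omega = 2 * np.pi * np.arange(len(X)) / n_fft                  # rad/sample
    return (omega - ratio) * fs / (2 * np.pi)

fs = 16000
x = np.cos(2 * np.pi * 440.0 * np.arange(4000) / fs)
if_hz = instantaneous_frequency(x, centre=2000, half_len=256, fs=fs, n_fft=2048)
print(if_hz[53:60])   # channels from about 414 Hz to 461 Hz all report about 440 Hz
```

Every channel whose pass band contains the 440 Hz component reports that frequency, without computing or unwrapping any phase.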

63 Averaged instantaneous frequency (derivation 3): power-weighted average. Note that the denominator is the TANDEM spectrum.

64 Averaged instantaneous frequency (derivation 4): the numerator is the sum of the individual numerators.

65 Numerators (derivation 5): after substitution and simplification, the squared terms vanish.

66 TANDEM trick (derivation 6): a term independent of time remains; the term that should be eliminated is cancelled by the TANDEM trick.

67 F0 extractors using instantaneous frequency: legacy STRAIGHT [Kawahara et al. 1999a], fundamental component selection using a constant-Q filter bank and AM-FM magnitude; the fixed point of the frequency-to-instantaneous-frequency mapping [Kawahara et al. 1999b]; refinement of initial F0 estimates using instantaneous frequency; NDF, a multi-source F0 extractor with intensive manual optimization of parameters [Kawahara et al. 2005]; XSX, an excitation source extractor based on an interference-free representation of power spectra [Kawahara et al. 2008][Fujimura et al. 2009]; YANGsaf [Kawahara et al. 2016].

68 Periodicity detection by spectral division (movie)

69 TANDEM-STRAIGHT: F0-adaptive processing, spectral division, shaping

70 F0-adaptive processing: a contradiction, because no F0 information is available yet; the solution is multiple hypotheses and their integration.

71 Multiple hypotheses and integration: the signal feeds detector-1, detector-2, ..., detector-n, and their outputs are integrated into the F0 salience. Figure: response (blackman 2.5, npo: 3, std) vs. normalized lag in semitones (re. T0).

72 Spectral division: the TANDEM spectrum contains both the envelope and the periodic structure, while the F0-adaptive smoothed spectrum contains only the envelope. Dividing the TANDEM spectrum by the smoothed spectrum leaves a spectrum with only the periodic component.

73 Selecting the base band: spectral division, shaping

74 Integration of individual detectors: individual response, shaping, shaped response, integrated response

75 Integration of individual detectors: shaped response and integrated response

76 Integration of individual detectors (movie)

77 Alternating amplitude (movie)

78 Displacement of pulse timing (movie)

79 Analysis of Noh voice: Fujimura, O., Honda, K., Kawahara, H., Konparu, Y., Morise, M., & Williams, J. C. (2009). Noh voice quality. Logopedics Phoniatrics Vocology, 34(4).

80 F0 extractors using instantaneous frequency: legacy STRAIGHT [Kawahara et al. 1999a], fundamental component selection using a constant-Q filter bank and AM-FM magnitude; the fixed point of the frequency-to-instantaneous-frequency mapping [Kawahara et al. 1999b]; refinement of initial F0 estimates using instantaneous frequency; NDF, a multi-source F0 extractor with intensive manual optimization of parameters [Kawahara et al. 2005]; XSX, an excitation source extractor based on an interference-free representation of power spectra [Kawahara et al. 2008][Fujimura et al. 2009]; YANGsaf [Kawahara et al. 2016] (SSW9).

81 Using instantaneous frequency and aperiodicity detection to estimate F0 for high-quality speech synthesis. Hideki Kawahara, Yannis Agiomyrgiannakis, Heiga Zen; Wakayama University, Japan. ISCA SSW9, Sunnyvale, CA, USA, September 2016

84 Non-periodic component: distance between the lower and upper power spectrum envelopes, with calibration based on simulation [Kawahara et al. 2001]; residuals of pitch-scale linear prediction of frequency sub-bands, with sigmoidal spectral modeling [Kawahara et al. 2010a]; an interference-free group delay representation for estimating deviation from periodicity [Kawahara et al. 2014]; event extraction based on running kurtosis [Kawahara et al. 2010b]. The non-periodic component is the least successful component.

85 EGG is not almighty. Figure: EGG and speech waveforms vs. time (s).

86 Glottal closure instants (GCI), and a segment with no GCI (gender: f, talker ID 14, sentence ID 28, relative discrepancy 5.8 %). Figure: differentiated EGG signal, speech signal, and EGG signal (close/open) vs. time (s), with the mismatch marked.

87 Mismatch is not rare: 840 utterances were tested (30 sentences × 28 speakers), and only 77 of the 840 utterances have no mismatch. Figure: relative amount of mismatch (1 % and 10 % levels marked) and record count vs. sorted utterance ID.

88 F0 modulation in Noh voice. Figure: frequency (Hz) vs. time (ms), with 6 × F0 marked.

89 F0 modulation in Noh voice: 1:2 and 1:3 coupling, and possibly chaos. Figure: frequency (Hz) vs. time (ms), with 6 × F0 marked.

90 Power spectrum of F0 modulation: vibrato. Figure: relative rms level of the modulation (dB, semitone) vs. modulation frequency (Hz).

91 Power spectrum of a periodic impulse train using the previous windows: zeros appear, and the noise component develops (MAVEBA 2001)

92 Envelope calculation: cepstrum and lifter; original and smoothed spectra; upper and lower envelopes (MAVEBA 2001)

93 Multi-resolution analysis, example: speech waveform and estimated excitation (MAVEBA 2001)

94 Non-periodic component: distance between the lower and upper power spectrum envelopes, with calibration based on simulation [Kawahara et al. 2001]; residuals of pitch-scale linear prediction of frequency sub-bands, with sigmoidal spectral modeling [Kawahara et al. 2010a]; an interference-free group delay representation for estimating deviation from periodicity [Kawahara et al. 2014]; event extraction based on running kurtosis [Kawahara et al. 2010b]. The non-periodic component is the least successful component; YANGsaf may be the answer.

96 Broad-band colored noise (turbulence): random, described by a boundary frequency and a slope; converted pulse-to-noise ratio; square root of the pulse-to-noise ratio

97 Parameter estimation: logit conversion, weighted least-squares solution, weight update
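
Slide 97 lists the estimation steps without giving the formulas. A minimal NumPy sketch of the general technique, assuming a sigmoid of frequency described by a boundary frequency and a slope: the ratio is converted with the logit function, a straight line is fitted by weighted least squares, and the weights are updated from the current fit. The weight update rule $(\hat{r}(1-\hat{r}))^2$, the clipping constant, the synthetic data, and the function name are hypothetical choices, not the published STRAIGHT procedure.

```python
import numpy as np

def fit_sigmoid(freqs, ratio, iterations=5):
    """Fit ratio(f) = 1 / (1 + exp(-slope * (f - boundary))).

    logit(ratio) = slope * f - slope * boundary is a straight line, so each pass
    solves a weighted least-squares line fit; the weights are then updated from
    the current fit (the update rule is a hypothetical choice).
    """
    r = np.clip(ratio, 1e-6, 1 - 1e-6)                 # keep the logit finite
    z = np.log(r / (1 - r))                            # logit conversion
    A = np.column_stack([np.ones_like(freqs), freqs])
    weights = np.ones_like(freqs)
    for _ in range(iterations):
        sw = np.sqrt(weights)
        (intercept, slope), *_ = np.linalg.lstsq(A * sw[:, None], z * sw, rcond=None)
        fitted = 1.0 / (1.0 + np.exp(-(intercept + slope * freqs)))
        weights = (fitted * (1.0 - fitted)) ** 2 + 1e-12   # weight update
    return -intercept / slope, slope

freqs = np.linspace(0.0, 8000.0, 200)
ratio = 1.0 / (1.0 + np.exp(-0.002 * (freqs - 3000.0)))  # synthetic sigmoid
boundary, slope = fit_sigmoid(freqs, ratio)
print(boundary, slope)                                   # close to 3000.0 and 0.002
```

On noise-free synthetic data the fit recovers the generating parameters; with measured ratios, the weights decide how strongly the flat regions near 0 and 1 influence the fitted line.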

98 Fitting example

101 Event detection examples: original natural speech, synthesis without events, and synthesis with event detection

104 Outline: Introduction (TANDEM-STRAIGHT); non-periodic component in speech sounds; wide-band noise; acoustic events; discussion

105 Outline: Introduction (TANDEM-STRAIGHT); non-periodic component in speech sounds; wide-band noise; acoustic events; conclusion

106 Acoustic event detection: strongly distorted distributions are detected with the kurtosis, the (non-negative) ratio of the 4th moment to the squared 2nd moment, $\kappa(t) = \frac{m_4(t)}{m_2(t)^2}$ with $m_r(t) = \int w(\tau - t)\,x(\tau)^r\,d\tau$ for $r = 2, 4$ and a window normalized to unit area; the moments are implemented as filtering.

107 Acoustic event detection: local peaks of the running kurtosis of the filtered signal are picked as initial estimates; each event location is then adjusted to the closest centroid of the 4th power of the windowed waveform (a practical solution, compared with the theoretical solution).
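
Both steps reduce to a few array operations. In this NumPy sketch the moments are moving averages computed by convolution with a normalized Hann window, the global kurtosis maximum stands in for local peak picking, and the location is moved to the centroid of the windowed 4th power, repeated a few times. The window length, the number of refinement passes, the synthetic test signal, and the function names are assumptions.

```python
import numpy as np

def running_kurtosis(x, window):
    """Running kurtosis m4 / m2**2; both moments are computed by FIR filtering."""
    w = window / window.sum()                     # normalize the window to unit area
    m2 = np.convolve(x ** 2, w, mode="same")
    m4 = np.convolve(x ** 4, w, mode="same")
    return m4 / np.maximum(m2 ** 2, 1e-300)

def refine_event(x, peak, window, iterations=3):
    """Move an initial estimate to the centroid of the windowed 4th power."""
    half = len(window) // 2
    loc = peak
    for _ in range(iterations):
        lo, hi = max(loc - half, 0), min(loc + half + 1, len(x))
        idx = np.arange(lo, hi)
        weight = window[idx - loc + half] * x[lo:hi] ** 4
        loc = int(round(float(np.sum(idx * weight) / np.sum(weight))))
    return loc

rng = np.random.default_rng(0)
x = rng.standard_normal(4000)             # Gaussian background (kurtosis near 3)
event = 2500
x[event] += 30.0                          # one isolated acoustic event
window = np.hanning(65)
kurt = running_kurtosis(x, window)
initial = int(np.argmax(kurt))            # peak picking: initial estimate
print(initial, refine_event(x, initial, window), kurt.max())
```

A single large sample inside Gaussian noise raises the running kurtosis far above the Gaussian value of about 3, but the raw peak tends to sit near the edge of the window; the centroid step pulls the estimate back onto the event.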

108 Event detection example

109 Peak kurtosis distribution for 112 utterances: highly non-Gaussian

111 Another cause of interference: phase spectrogram and group delay (DVD)

112 Movie

113 Movie

114 Movie

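115 Group delay, Flanagan-like equation: $\tau_g = -\frac{d\,\mathcal{I}[\log X]}{d\omega} = -\mathcal{I}\left[\frac{1}{X}\frac{dX}{d\omega}\right] = -\frac{\mathcal{R}[X]\,\mathcal{I}\left[\frac{dX}{d\omega}\right] - \mathcal{I}[X]\,\mathcal{R}\left[\frac{dX}{d\omega}\right]}{|X|^2}$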

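116 Flanagan-like equation: $X(\omega,t) = \int w(\tau)\,x(\tau - t)\,e^{-j\omega\tau}\,d\tau$, $X_d(\omega,t) = \frac{dX(\omega,t)}{d\omega} = \int w(\tau)\,x(\tau - t)\,\frac{d\,e^{-j\omega\tau}}{d\omega}\,d\tau = -j\int \tau\,w(\tau)\,x(\tau - t)\,e^{-j\omega\tau}\,d\tau$, and $\tau_g(\omega,t) = -\frac{\mathcal{R}[X(\omega,t)]\,\mathcal{I}[X_d(\omega,t)] - \mathcal{I}[X(\omega,t)]\,\mathcal{R}[X_d(\omega,t)]}{|X(\omega,t)|^2}$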
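
The Flanagan-like equation needs only two DFTs: one of the windowed frame and one of the frame weighted by $\tau w(\tau)$. The NumPy sketch below measures time from the start of the frame; the Hann window, the sampling rate, the test impulse, and the function name are assumptions. For a single impulse the group delay equals the impulse position at every frequency, which makes a convenient check.

```python
import numpy as np

def group_delay(frame, window, fs):
    """Group delay (s) of one frame via the Flanagan-like equation.

    X   : DFT of w[n] x[n]
    X_d : dX/domega = -j * DFT of n w[n] x[n]   (n counted from the frame start)
    tau_g = -(Re[X] Im[X_d] - Im[X] Re[X_d]) / |X|^2
    """
    n = np.arange(len(frame))
    X = np.fft.rfft(window * frame)
    X_d = -1j * np.fft.rfft(n * window * frame)
    numerator = X.real * X_d.imag - X.imag * X_d.real
    return -numerator / np.maximum(np.abs(X) ** 2, 1e-300) / fs

fs = 8000
frame = np.zeros(128)
frame[37] = 1.0                          # a single impulse 37 samples into the frame
tau_g = group_delay(frame, np.hanning(128), fs)
print(tau_g[:3] * fs)                    # about 37 samples in every bin
```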

117 Interference model: $x(t) = \delta\left(t - \frac{T_0}{2}\right) + \alpha\,\delta\left(t + \frac{T_0}{2}\right)$; the time window covers temporally repeating events with amplitude modification.

118 Interference in the power spectrum: $P(\omega,t) = w\left(t - \frac{T_0}{2}\right)^2 + \alpha^2\,w\left(t + \frac{T_0}{2}\right)^2 + 2\alpha\,w\left(t - \frac{T_0}{2}\right)w\left(t + \frac{T_0}{2}\right)\cos\left(\frac{2\pi f}{f_0}\right)$; the last term is a periodic variation on the frequency axis.

119 Interference in the power spectrum: for the same $P(\omega,t)$, the TANDEM trick cancels the interference: $P_F(\omega,t) = \frac{P\left(\omega - \frac{\omega_0}{4},t\right) + P\left(\omega + \frac{\omega_0}{4},t\right)}{2}$
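
The cancellation follows from $\cos\left(\frac{2\pi f}{f_0} - \frac{\pi}{2}\right) + \cos\left(\frac{2\pi f}{f_0} + \frac{\pi}{2}\right) = 0$. The NumPy sketch below checks it on a discrete version of the two-event model; the Hann window, the 40-sample spacing, α = 0.7, and the FFT length (chosen so that $f_0/4$ is a whole number of bins) are assumptions for the demonstration.

```python
import numpy as np

frame_len, n_fft = 128, 1600
t0 = 40                                   # spacing T0 of the two events, in samples
alpha = 0.7                               # amplitude of the second event
w = np.hanning(frame_len)
x = np.zeros(frame_len)
x[40], x[40 + t0] = 1.0, alpha            # two events inside one window

P = np.abs(np.fft.rfft(w * x, n_fft)) ** 2            # ripples with period f0 in frequency
shift = n_fft // (4 * t0)                             # f0/4 in DFT bins (10 here)
P_F = 0.5 * (P[:-2 * shift] + P[2 * shift:])          # entry i is bin i + shift
expected = w[40] ** 2 + alpha ** 2 * w[40 + t0] ** 2  # the non-oscillating part of P
print(np.ptp(P), np.ptp(P_F))                         # large ripple vs. essentially none
```

Only the frequency-independent part of the power spectrum survives, which is why the same shift-and-average step is reused for the group delay numerators.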

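120 Power-weighted average: $\tau_{ga}(\omega,t) = \frac{\tau_{g1}(\omega,t)\,|S_1(\omega,t)|^2 + \tau_{g2}(\omega,t)\,|S_2(\omega,t)|^2}{|S_1(\omega,t)|^2 + |S_2(\omega,t)|^2}$, where the denominator plays the role of $P_F(\omega,t)$. Using T0/2 separation makes this sum interference-free.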

121 Numerators, with the two pulses at window positions $t_1$ and $t_2$ and $w_d(\tau) = \tau\,w(\tau)$:
$\mathcal{R}[S(\omega,t)]\,\mathcal{I}[S_d(\omega,t)] = -w_d(t_1)w(t_1)\cos^2(\omega t_1) - \alpha^2 w_d(t_2)w(t_2)\cos^2(\omega t_2) - \alpha\,w_d(t_2)w(t_1)\cos(\omega t_1)\cos(\omega t_2) - \alpha\,w_d(t_1)w(t_2)\cos(\omega t_1)\cos(\omega t_2)$
$\mathcal{I}[S(\omega,t)]\,\mathcal{R}[S_d(\omega,t)] = w_d(t_1)w(t_1)\sin^2(\omega t_1) + \alpha^2 w_d(t_2)w(t_2)\sin^2(\omega t_2) + \alpha\,w_d(t_2)w(t_1)\sin(\omega t_1)\sin(\omega t_2) + \alpha\,w_d(t_1)w(t_2)\sin(\omega t_1)\sin(\omega t_2)$

122 Numerators, simplification: using $\sin^2\theta + \cos^2\theta = 1$ and $\cos A\cos B + \sin A\sin B = \cos(A - B)$ with $|t_1 - t_2| = T_0$,
$\mathcal{R}[S(\omega,t)]\,\mathcal{I}[S_d(\omega,t)] - \mathcal{I}[S(\omega,t)]\,\mathcal{R}[S_d(\omega,t)] = -w_d(t_1)w(t_1) - \alpha^2 w_d(t_2)w(t_2) - \alpha\left(w_d(t_1)w(t_2) + w_d(t_2)w(t_1)\right)\cos\left(\frac{2\pi f}{f_0}\right)$ (28)
The last term is a periodic component on the frequency axis, of the same form as the interference removed in $P_F(\omega,t)$.

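123 Interference-free group delay: $\tau_{df}(\omega,t) = -\frac{1}{2P_F(\omega,t)}\Big[\mathcal{R}[S(\omega_1,t)]\,\mathcal{I}[S_d(\omega_1,t)] - \mathcal{I}[S(\omega_1,t)]\,\mathcal{R}[S_d(\omega_1,t)] + \mathcal{R}[S(\omega_2,t)]\,\mathcal{I}[S_d(\omega_2,t)] - \mathcal{I}[S(\omega_2,t)]\,\mathcal{R}[S_d(\omega_2,t)]\Big]$, where $\omega_1 = \omega - \frac{\omega_0}{4}$ and $\omega_2 = \omega + \frac{\omega_0}{4}$. The interference on the frequency axis is removed, but interference on the time axis remains.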

124 Interference-free group delay on the frequency axis

125 Interference-free group delay on both the time and frequency axes

126 Movie: waveform and time windows, modulation window, phase spectrogram, power spectrum, group delay

127 Movie

128 Movie

129 Movie

130 Synthesis: minimum-phase impulse response; pitch-synchronized overlap-add of the periodic and aperiodic components; mixed-mode excitation and an approximate time-varying filter

132 Synthesis: OLA. Analysis is the same as in the VOCODER (F0, spectral envelope, and non-periodicity analyses, followed by data processing such as F0 modification). In synthesis, the periodic pulse generator drives filter-1, a noise generator drives filter-2, and the two outputs are added to form the output signal.

133 Synthesis: approximate TVF (time-varying filter). The periodic pulse generator and the non-periodic component generator feed the shaper and mixer, followed by the spectral envelope filter, to form the output signal.

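134 FFT-based convolution: $x[n] = \sum_k x[k]\,\delta_{0,n-k}$, so a filter gives $y[n] = \sum_{k=0}^{M} x[k]\,h[n-k]$. For the cyclic version of a signal of length $N$, $x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\exp\left(\frac{2\pi jkn}{N}\right)$, where $X[k]$ are the DFT coefficients, and the cyclic convolution $\tilde{y}[n] = \sum_{k=0}^{N-1} h[(n-k) \bmod N]\,x[k]$ becomes $Y[k] = H[k]X[k]$.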

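135 FFT-based convolution with subdivision: the original signal is split into subdivided signals $s^{(k)}[n] = x[n]\,w[n - kL]$, so that $x[n] = \sum_{k=-\infty}^{\infty} s^{(k)}[n]$ under the constraint $\sum_{k=-\infty}^{\infty} w[n - kL] = 1$ on the subdivision window. Implementations: rectangular, $w_1[n] = 1$ for $n = 0, 1, \ldots, K-1$; raised cosine, $w_2[n] = \cos^2\left(\frac{n\pi}{2L}\right)$ for $n = -L, \ldots, L$, so that $K = 2L + 1$. The segment length is limited by the FFT length.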
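
The subdivision makes a time-varying filter practical: each windowed segment is convolved with its own impulse response by FFT, and the results are overlap-added. This NumPy sketch uses the raised-cosine window $w_2$ with hop $L$; the callback `response_at`, which supplies the impulse response for each segment, is hypothetical and stands in for the responses STRAIGHT would compute. When every segment gets the same response, the constraint $\sum_k w[n-kL] = 1$ makes the output identical to ordinary linear convolution, which gives a simple check.

```python
import numpy as np

def fft_convolve(x, h):
    """Linear convolution through a zero-padded (non-cyclic) FFT product."""
    n = len(x) + len(h) - 1
    n_fft = 1 << (n - 1).bit_length()              # next power of two >= n
    y = np.fft.irfft(np.fft.rfft(x, n_fft) * np.fft.rfft(h, n_fft), n_fft)
    return y[:n]

def raised_cosine(hop):
    """w2[n] = cos^2(n pi / (2 hop)), n = -hop..hop (K = 2*hop + 1 taps)."""
    n = np.arange(-hop, hop + 1)
    return np.cos(n * np.pi / (2 * hop)) ** 2

def approx_time_varying_filter(x, response_at, hop):
    """Filter raised-cosine segments centred at 0, hop, 2*hop, ... and overlap-add.

    response_at(centre) returns the impulse response for the segment centred on
    sample `centre` (a hypothetical callback).
    """
    x = np.asarray(x, dtype=float)
    w = raised_cosine(hop)
    n_seg = -(-(len(x) - 1) // hop) + 1          # last centre reaches len(x) - 1
    pieces = []
    for i in range(n_seg):
        centre = i * hop
        start = centre - hop                     # first sample covered by the segment
        seg = np.zeros(2 * hop + 1)
        lo, hi = max(start, 0), min(centre + hop + 1, len(x))
        seg[lo - start:hi - start] = x[lo:hi]
        pieces.append((start, fft_convolve(seg * w, response_at(centre))))
    y = np.zeros(max(s + len(p) for s, p in pieces) + hop)
    for s, p in pieces:
        y[s + hop:s + hop + len(p)] += p          # index j of y is time j - hop
    return y[hop:]

rng = np.random.default_rng(1)
x = rng.standard_normal(1010)
h = rng.standard_normal(33)
y = approx_time_varying_filter(x, lambda centre: h, hop=50)   # same response everywhere
print(np.max(np.abs(y[:1042] - np.convolve(x, h))))           # close to zero
```

For a genuinely time-varying filter, `response_at` would return a different response for each centre time; the raised-cosine overlap then cross-fades between neighbouring responses instead of switching abruptly.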

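136 Time-varying filter in STRAIGHT: $y[n] = \sum_{k=0}^{M} x[k]\,h[n-k; k]$, where the impulse response $h[\cdot\,; k]$ changes with time $k$. Minimum-phase impulse response: $h_{min}(t) = \frac{1}{2\pi}\int H_{min}(\omega)\,e^{j\omega t}\,d\omega$, evaluated as $\mathcal{R}\left[\frac{1}{2\pi}\int H_{min}(\omega)\,e^{j\omega t}\,d\omega\right]$, with $\mathcal{R}[\ln H_{min}(\omega)] + j\,\mathcal{I}[\ln H_{min}(\omega)] = \int c_{min}(q)\,e^{-j\omega q}\,dq$, $c(q) = \frac{1}{2\pi}\int \ln P(\omega)\,e^{j\omega q}\,d\omega$, and $c_{min}(q) = c(q)$ for $q > 0$, $c_{min}(0) = c(0)/2$, $c_{min}(q) = 0$ for $q < 0$.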
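
The folding rule for $c_{min}(q)$ can be implemented with FFTs. In this NumPy sketch the power spectrum is given on the non-negative DFT bins, the integrals become DFTs, and the value at the Nyquist quefrency is halved like $c(0)$ (a common discrete convention, assumed here). The two-tap test filter is illustrative only.

```python
import numpy as np

def minimum_phase_response(power):
    """Minimum-phase impulse response from a power spectrum P (bins 0..n_fft/2).

    c = cepstrum of ln P; c_min(0) = c(0)/2, c_min(q) = c(q) for q > 0, 0 for q < 0.
    """
    n_fft = 2 * (len(power) - 1)
    c = np.fft.irfft(np.log(power), n_fft)       # real cepstrum of ln P
    c_min = np.zeros(n_fft)
    c_min[0] = c[0] / 2
    c_min[1:n_fft // 2] = c[1:n_fft // 2]        # positive quefrencies kept as they are
    c_min[n_fft // 2] = c[n_fft // 2] / 2        # Nyquist quefrency shared (assumption)
    H_min = np.exp(np.fft.rfft(c_min))           # ln H_min = sum_q c_min(q) e^{-j w q}
    return np.fft.irfft(H_min, n_fft)

# A maximum-phase filter and its minimum-phase mirror share the same power spectrum.
power = np.abs(np.fft.rfft([-0.5, 1.0], 512)) ** 2
h = minimum_phase_response(power)
print(np.round(h[:4], 6))   # approximately [1, -0.5, 0, 0]
```

The input power spectrum comes from the maximum-phase filter [-0.5, 1]; the result is its minimum-phase counterpart [1, -0.5], which has the same magnitude response but concentrates its energy at the start of the response.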

137 Numerical examples

138 STRAIGHT spectrogram: a section

139 Minimum-phase responses

140 Time-invariant and time-varying responses, spectral views: rectangular vs. raised-cosine subdivision; time-invariant vs. time-varying filter

141 STRAIGHT is a VOCODER enabling flexible manipulation. Analysis: input signal-1 → F0 analysis, spectral envelope analysis, and non-periodicity analysis, giving the signal parameters (physical attributes). Data processing: e.g., F0 modification. Synthesis: a periodic pulse generator and a non-periodic component generator feed a shaper and mixer, followed by the spectral envelope filter, to produce the output signal.

142 STRAIGHT: summary. STRAIGHT decomposes input speech into an interference-free spectrum, the fundamental frequency (F0), and an aperiodic component, with virtually perfect removal of interferences. This allows flexible manipulation of speech without introducing quality degradation. STRAIGHT's component procedures provide building blocks for various applications, such as TTS systems, and STRAIGHT serves as a test-bed for new component algorithms.

144 Lecture 2: Hands-on tutorial of generalized speech morphing based on STRAIGHT. Hideki Kawahara, Emeritus Professor, Wakayama University, Japan. Tianjin University, China, 9 December 2016

146 Application: speech modification using STRAIGHT (modification using GUIs; modification by function calls); extended morphing for two voices; temporally variable, multi-aspect morphing of arbitrarily many voices (formulation; implementation; morphing using GUIs; morphing by scripting; morphing as a research tool)

149 STRAIGHT GUIs (Matlab; APSIPA DL talk)

150 Snapshot: F0 extraction (Matlab), a customizable F0 extractor

151 Snapshot: modification (Matlab) of duration, size, F0, and amplitude

153 Manipulation by function calls. Analysis functions: source analysis (fundamental frequency, aperiodic component) and filter analysis. Synthesis function: default OLA synthesis; optional approximate time-varying filter; optional sinusoidal synthesis.

155 Fundamental frequency
r = exf0candidateststraightgb(x,fs,paramsin)
returns a structure with the fields
    samplingfrequency:
    f0: [159x1 double]
    periodicitylevel: [159x1 double]
    ...
Post-processing:
rc = autof0tracking(r,x);
rc.vuv = refinevoicingdecision(x,rc);

156 Aperiodicity parameter
q = aperiodicityratiosigmoid(x,rc,sidemargin,exponent,displayon)
returns a structure with the fields
    samplingfrequency:
    f0: [159x1 double]
    vuv: [159x1 double]
    sigmoidparameter: [2x159 double]

158 Filter analysis
f = exspectrumtstraightgb(x,fs,sourceobj,paramsin)
returns a structure with the fields
    ElapsedTimeForSpectrum:
    temporalpositions: [1x159 double]
    spectrogramstraight: [1025x159 double]
    samplingfrequency:
    TANDEMSTRAIGHTconditions: [1x1 struct]
    spectrogramtandem: [1025x159 double]
    dateofspectrumestimation: 'DD-MM :00:58'

160 Default OLA synthesis
s = exgeneralstraightsynthesisr2(sourcestructure,filterstructure)
returns a structure with the fields
    synthesisout: [17106x1 double]
    samplingfrequency:
    elapsedtime:
It is built on a generalized framework:
generalstraightsynthesisframeworkr2(feedinghandle, responsehandle, deterministichandle, randomhandle, shifterhandle, datasubstrate, optionalparameters)

162 Modification by function calls: fundamental frequency manipulation (example)
Making the fundamental frequency 1.2 times higher:
rm = r;
rm.f0 = r.f0*1.2;
s = exgeneralstraightsynthesisr2(rm,f);
Making the fundamental frequency 50 Hz higher:
rm = r;
rm.f0 = r.f0+50;
s = exgeneralstraightsynthesisr2(rm,f);

163 Modification by function calls: speaking rate manipulation (example)
Making the total duration 2 times longer:
rm = r;
rm.temporalpositions = r.temporalpositions*2;
s = exgeneralstraightsynthesisr2(rm,f);

164 Modification by function calls: vocal tract length manipulation (example)
Making the vocal tract length 1.2 times longer (formants move down):
fftl = (size(f.spectrogramstraight,1)-1)*2;
fxoriginal = (0:fftl/2)/fftl*f.samplingfrequency;
fxtarget = fxoriginal*1.2;
fxtarget = min(f.samplingfrequency/2, fxtarget);  % keep target frequencies at or below Nyquist
fm = f;
fm.spectrogramstraight = interp1(fxoriginal,f.spectrogramstraight,fxtarget);  % replace the field itself, not fm.f.*
s = exgeneralstraightsynthesisr2(r,fm);

165 Modification by function calls: vocal tract length manipulation (example)
Making the vocal tract length 0.8 times the original (formants move up):
fftl = (size(f.spectrogramstraight,1)-1)*2;
fxoriginal = (0:fftl/2)/fftl*f.samplingfrequency;
fxtarget = fxoriginal*0.8;
fm = f;
fm.spectrogramstraight = interp1(fxoriginal,f.spectrogramstraight,fxtarget);
s = exgeneralstraightsynthesisr2(r,fm);
A nonlinear frequency axis modification is possible by designing fxtarget.

169 Matlab

172 Temporally variable, multi-aspect N-way morphing of voices and their attributes

173 Temporally variable, multi-aspect N-way morphing: input signals 1, ..., k, ..., n are each analyzed by STRAIGHT into physical attributes (F0, spectral envelope, non-periodicity); time-axis and frequency-axis alignment give the time-axis and frequency-axis mappings; the attributes are morphed with a set of indexed weights of physical attributes; synthesis (periodic pulse generator and non-periodic component generator, shaper and mixer, spectral envelope filter) produces the output signal.

177 Generalized morphing enabling extrapolation: attributes with no constraint (location, speed, ...) use the weighted sum of functions, w.sum(function); attributes with a positivity constraint (F0, power, ...) use exponent(w.sum(log(function))); attributes with a monotonicity constraint (time axis, frequency axis, ...) use integration(exponent(w.sum(log(derivative of function)))).

178 What is the problem? Interpolation

179 What is the problem? Interpolation works, but extrapolation breaks down, e.g., by producing a non-monotonic mapping.

180 Speech parameter constraints: (1) time increases monotonically; (2) frequency increases monotonically; (3) the time-frequency spectral representation is positive; (4) the fundamental frequency is positive. The morphing entity of example k, defined on abstract time τ and abstract frequency ν, is
$\Theta^{(k)}(\nu,\tau) = \left\{ f_0^{(k)}\left(t^{(k)}(\tau)\right),\ a^{(k)}\left(f^{(k)}(\nu), t^{(k)}(\tau)\right),\ P^{(k)}\left(f^{(k)}(\nu), t^{(k)}(\tau)\right),\ f^{(k)}(\nu),\ t^{(k)}(\tau) \right\}$ (1)

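186 No constraint case
The morphed parameter is a weighted sum of the parameters of the $N$ cases (examples), each read on its own time axis. Here $k$ is the index of the case and $w^{(k)}$ is its weight:
$$ g_{m1}\bigl(t_{m3}(\tau)\bigr) = \sum_{k=1}^{N} w^{(k)}\bigl(t^{(k)}(\tau)\bigr)\, g^{(k)}\bigl(t^{(k)}(\tau)\bigr) \tag{2} $$
The following normalization is common but not always necessary:
$$ \sum_{k=1}^{N} w^{(k)}\bigl(t^{(k)}(\tau)\bigr) = 1 $$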
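
A minimal Python/NumPy sketch of Eq. (2) follows. It assumes that each case's parameter has already been sampled at the points $t^{(k)}(\tau)$ of a common abstract-time grid. `morph_unconstrained` is a hypothetical name. The weights may be constant or may vary over time. Weights outside [0, 1] that still sum to one, such as (1.5, -0.5), extrapolate beyond the examples.

```python
import numpy as np

def morph_unconstrained(g, w):
    """Eq. (2): weighted sum of the case parameters (hypothetical helper).

    g : array (N, T), parameter of each of the N cases on the abstract-time grid
    w : array (N,) for constant weights, or (N, T) for time-varying weights
    """
    g = np.asarray(g, float)
    w = np.asarray(w, float)
    if w.ndim == 1:
        w = w[:, None]          # same weight at every time instant
    return np.sum(w * g, axis=0)
```

With `g = [[0, 1, 2], [2, 3, 4]]`, equal weights give the midpoint trajectory `[1, 2, 3]`. The weights `[1.5, -0.5]` give `[-1, 0, 1]`. A result can be negative here, which is acceptable only for parameters without a positivity constraint.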

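188 Positivity constraint
$$ g_{m2}\bigl(t_{m3}(\tau)\bigr) = \exp\!\left( \sum_{k=1}^{N} w^{(k)}\bigl(t^{(k)}(\tau)\bigr) \log g^{(k)}\bigl(t^{(k)}(\tau)\bigr) \right) = \prod_{k=1}^{N} \Bigl( g^{(k)}\bigl(t^{(k)}(\tau)\bigr) \Bigr)^{w^{(k)}(t^{(k)}(\tau))} \tag{4} $$
The exponential is always positive. Therefore $g_{m2}(t_{m3}(\tau)) > 0$ for any weights, including negative (extrapolating) ones.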
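
The positivity rule is a weighted geometric mean. The following Python/NumPy sketch of Eq. (4) uses the hypothetical name `morph_positive` and the same array layout as for Eq. (2).

```python
import numpy as np

def morph_positive(g, w):
    """Eq. (4): exp(sum_k w_k log g_k), a weighted geometric mean (hypothetical helper).

    g : array (N, T) of strictly positive values (e.g. F0 or power)
    w : array (N,) or (N, T) of weights
    """
    g = np.asarray(g, float)
    w = np.asarray(w, float)
    if w.ndim == 1:
        w = w[:, None]
    if np.any(g <= 0):
        raise ValueError("the positivity rule needs strictly positive inputs")
    return np.exp(np.sum(w * np.log(g), axis=0))
```

Take two cases with F0 of 100 Hz and 400 Hz and the extrapolating weights (1.5, -0.5). A plain weighted sum gives -50 Hz, while this rule gives 100^1.5 / 400^0.5 = 50 Hz. Equal weights give 200 Hz, the geometric mean.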

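190 Monotonicity constraint
The derivatives of the speech attributes are morphed with the positivity rule and then integrated over the abstract parameter ξ:
$$ g_{m3}(\tau) = \int_{0}^{\tau} \exp\!\left( \sum_{k=1}^{N} w^{(k)}(\xi) \log \frac{dg^{(k)}(\xi)}{d\xi} \right) d\xi = \int_{0}^{\tau} \prod_{k=1}^{N} \left( \frac{dg^{(k)}(\xi)}{d\xi} \right)^{w^{(k)}(\xi)} d\xi \tag{5} $$
The integrand is positive, so $dg_{m3}(\tau)/d\tau > 0$ and the morphed attribute increases monotonically.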
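
The following is a discrete Python/NumPy sketch of Eq. (5). It replaces the derivatives with finite differences and the integral with a cumulative sum. This discretization is an assumption made here, not necessarily the one STRAIGHT uses, and `morph_monotonic` is a hypothetical name.

```python
import numpy as np

def morph_monotonic(g, w, xi):
    """Discrete version of Eq. (5) (hypothetical helper).

    g  : array (N, M+1), strictly increasing functions of each case on the grid xi
    w  : array (N,) or (N, M), weights per case (per interval if 2-D)
    xi : array (M+1,), abstract-parameter grid; the result starts at 0 as in Eq. (5)
    """
    g = np.asarray(g, float)
    xi = np.asarray(xi, float)
    w = np.asarray(w, float)
    if w.ndim == 1:
        w = w[:, None]
    h = np.diff(xi)                         # interval widths
    dg = np.diff(g, axis=1) / h             # finite-difference derivatives
    if np.any(dg <= 0):
        raise ValueError("the monotonicity rule needs strictly increasing inputs")
    dg_m = np.exp(np.sum(w * np.log(dg), axis=0))   # morphed derivative, always > 0
    return np.concatenate(([0.0], np.cumsum(dg_m * h)))
```

Take two time axes with slopes 1 and 2 and the weights (3, -2). A weighted sum of slopes would give 3 - 4 = -1, which would make time run backwards. This rule gives the slope 2^-2 = 0.25, so the morphed axis still increases.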

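191 Generalized morphing
The morphing entity is obtained from the K exemplar entities and a weight set W:
$$ \Theta_m(\nu,\tau) = \mathcal{T}\Bigl( \Theta^{(1)}(\nu,\tau), \Theta^{(2)}(\nu,\tau), \ldots, \Theta^{(K)}(\nu,\tau);\ W \Bigr) \tag{6} $$
$$ W = \bigl\{ w_{F0}(\tau), w_{A}(\tau), w_{P}(\tau), w_{Fx}(\tau), w_{Tx}(\tau) \bigr\} \tag{7} $$
$$ w_X(\tau) = \bigl[ w_X^{(1)}(\tau), w_X^{(2)}(\tau), \ldots, w_X^{(K)}(\tau) \bigr]^{\top}, \qquad X \in \{F0, A, P, Fx, Tx\} $$
The aspects X are F0, aperiodicity (A), the time-frequency representation (P), the frequency coordinate (Fx) and the time coordinate (Tx). Each aspect has its own temporally variable weight vector.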
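
The weight set W of Eq. (7) can be held as one array per aspect, where row n is $w_X(\tau_n)^{\top}$ at anchor n. The following Python/NumPy sketch builds such a set in which only one aspect changes over time. It then applies the positivity rule to F0 anchor by anchor. The helper names and the anchor-wise layout are assumptions for illustration and do not reproduce STRAIGHT's internal data structures.

```python
import numpy as np

# Aspects X of Eq. (7): F0, aperiodicity, time-frequency representation,
# frequency coordinate and time coordinate.
ASPECTS = ("F0", "A", "P", "Fx", "Tx")

def sweep_weights(n_anchors, n_voices, aspect, src, dst):
    """Hypothetical helper: temporally variable weight set W.

    Returns {X: array (n_anchors, n_voices)}. Every aspect comes from voice
    `src`, except `aspect`, whose weight moves linearly from `src` to `dst`.
    """
    W = {}
    for x in ASPECTS:
        w = np.zeros((n_anchors, n_voices))
        w[:, src] = 1.0
        W[x] = w
    if src != dst:
        ramp = np.linspace(0.0, 1.0, n_anchors)
        W[aspect][:, src] = 1.0 - ramp
        W[aspect][:, dst] = ramp
    return W

def morph_f0_at_anchors(f0, w_f0):
    """Positivity rule at each anchor; f0 and w_f0 have shape (n_anchors, n_voices)."""
    return np.exp(np.sum(w_f0 * np.log(f0), axis=1))
```

For example, `sweep_weights(5, 3, "F0", 0, 2)` moves F0 from voice 0 to voice 2 over five anchors. All other aspects stay with voice 0.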

192 Application: speech modification using STRAIGHT (modification using GUIs; modification by function calls); extended morphing for two voices; temporally variable multi-aspect morphing of arbitrarily many voices (formulation; implementation; morphing using GUIs; morphing by scripting); morphing as a research tool

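193 Implementation: piece-wise linear function
The time axis of example k is piece-wise linear between anchors. For $\tau_n \le \tau < \tau_{n+1}$, where n is the ID of the anchor and $p^{(k)}(\tau_n)$ is the value of the time axis at that anchor:
$$ t^{(k)}(\tau) = \bigl(p^{(k)}(\tau_{n+1}) - p^{(k)}(\tau_n)\bigr)(\tau - \tau_n) + p^{(k)}(\tau_n) \tag{8} $$
This form interpolates between $p^{(k)}(\tau_n)$ and $p^{(k)}(\tau_{n+1})$ only when the anchors have unit spacing in abstract time, $\tau_{n+1} - \tau_n = 1$.
The morphed time axis has the same form, using the anchor values $p_m(\tau_n)$ at the morphed locations:
$$ t_{m3}(\tau) = \bigl(p_m(\tau_{n+1}) - p_m(\tau_n)\bigr)(\tau - \tau_n) + p_m(\tau_n) \tag{11} $$
$$ p_m(\tau_n) = \sum_{k=1}^{K} \bigl(p^{(k)}(\tau_n) - p^{(k)}(\tau_{n-1})\bigr)\, w_{Tx}^{(k)}(\tau_n) + p_m(\tau_{n-1}) \tag{12} $$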
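
Eqs. (8), (11) and (12) can be sketched in Python/NumPy as follows. The sketch assumes anchors at $\tau_n = n$, which gives the unit spacing the piece-wise linear form requires. The slide does not give the starting value $p_m(\tau_0)$, so it is taken here as a plain weighted sum; this is an assumption. Both function names are hypothetical.

```python
import numpy as np

def morphed_anchor_values(p, w_tx):
    """Eq. (12): accumulate weighted anchor-to-anchor intervals (hypothetical helper).

    p    : array (K, N), anchor values p^(k)(tau_n) of each example, increasing in n
    w_tx : array (K, N), time-axis weights w_Tx^(k)(tau_n)
    """
    p = np.asarray(p, float)
    w_tx = np.asarray(w_tx, float)
    pm = np.empty(p.shape[1])
    pm[0] = np.sum(w_tx[:, 0] * p[:, 0])    # assumption: start value not given on the slide
    for n in range(1, p.shape[1]):
        pm[n] = np.sum((p[:, n] - p[:, n - 1]) * w_tx[:, n]) + pm[n - 1]
    return pm

def piecewise_linear(p_anchor, tau):
    """Eqs. (8)/(11) with anchors at tau_n = n; values of tau outside the
    anchor range are extrapolated along the first or last segment."""
    p_anchor = np.asarray(p_anchor, float)
    tau = np.asarray(tau, float)
    n = np.clip(np.floor(tau).astype(int), 0, len(p_anchor) - 2)
    return (p_anchor[n + 1] - p_anchor[n]) * (tau - n) + p_anchor[n]
```

Take two examples with anchor times `[0, 0.1, 0.3, 0.6]` and `[0, 0.2, 0.4, 0.8]` seconds and equal weights. The morphed anchors are `[0, 0.15, 0.35, 0.7]`, and $t_{m3}(1.5) = 0.25$.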

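194 Matlab implementation of function inversion
A monotonic piece-wise linear function is inverted by swapping the roles of its sample points and values in interp1:
yi = interp1(x, y, xi, 'linear', 'extrap');  % y = f(x) evaluated at xi
xi = interp1(y, x, yi, 'linear', 'extrap');  % x = f^(-1)(y) evaluated at yi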
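
A rough NumPy counterpart of `interp1(x, y, xi, 'linear', 'extrap')` needs a small helper. `numpy.interp` clamps to the end values outside the data range instead of extrapolating linearly. The name `interp1_linear_extrap` is hypothetical. Unlike interp1, this sketch requires strictly increasing sample points. For the inversion, that means the mapping must be increasing.

```python
import numpy as np

def interp1_linear_extrap(x, y, xi):
    """Linear interpolation with linear extrapolation (x strictly increasing)."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    xi = np.asarray(xi, float)
    # Segment used for each query point; the end segments are reused
    # outside [x[0], x[-1]], which yields linear extrapolation.
    j = np.clip(np.searchsorted(x, xi) - 1, 0, len(x) - 2)
    slope = (y[j + 1] - y[j]) / (x[j + 1] - x[j])
    return y[j] + slope * (xi - x[j])

x = np.array([0.0, 1.0, 2.0])
y = np.array([0.0, 10.0, 30.0])                    # increasing, so invertible
yi = interp1_linear_extrap(x, y, [0.5, 1.5, 3.0])  # forward mapping
xi = interp1_linear_extrap(y, x, yi)               # inverse: swap x and y
```

Applying the inverse to the forward results recovers the query points, including the extrapolated point 3.0.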

195 Temporally variable multi-aspect N-way morphing (figure: voices × attributes)

196 Movie

198 GUI for generalized morphing preparation (Matlab)

199 Matlab November, 2013, APSIPA, Taiwan

201 Morphing by scripting
Matlab functions for temporally variable multi-aspect morphing of arbitrarily many voices:
morphedobject = tvariablenwaymorphingraw(objectbundle, contributionstructure, dispon);
synthstructure = generatemorphedsound(morphedobject);
Inputs (here, eight voices):
objectbundle =
    STRAIGHTobject: {1x8 cell}
contributionstructure =
    timeaxis: [47x8 double]
    fundamentalfrequency: [47x8 double]
    frequencyaxis: [47x8 double]
    aperiodicity: [47x8 double]
    spectrum: [47x8 double]

202 Morphing by scripting: output of tvariablenwaymorphingraw
morphedobject =
    morphedtimeanchors: [49x1 double]
    timemorphedframe: [522x8 double]
    morphedtargetf0:
    morphedf0: [1x522 double]
    f0listonmorphedtime: [522x8 double]
    frequencymappingatanchor: [1x1 struct]
    frameonmorphing: [522x1 double]
    morphedvuv: [1x522 double]
    contributionstructure: [1x1 struct]
    morphedspectrogram: [2049x522 double]
    morphedaperiodicity: [2x522 double]
    elapsedtime:
    cutofflistfix: [5x1 double]
    samplingfrequency:
    procedurename: 'tvariablenwaymorphing'
    tmpobj: [1x1 struct]

203 Morphing by scripting: flexible manipulation can be implemented by assigning appropriate weights to the fields of contributionstructure (timeaxis, fundamentalfrequency, frequencyaxis, aperiodicity, spectrum).
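
To illustrate what "assigning weights" involves, the following Python/NumPy sketch builds a dictionary that mirrors the shape of contributionstructure. It has one weight array per field, with one column per voice and rows matching the 47x8 shape shown on the slide. It does not call the Matlab function. The helper `make_contribution` is hypothetical, and what the 47 rows represent in TANDEM-STRAIGHT is not stated on the slide.

```python
import numpy as np

# Field names copied from the slide's contributionstructure.
FIELDS = ("timeaxis", "fundamentalfrequency", "frequencyaxis", "aperiodicity", "spectrum")

def make_contribution(n_rows, n_voices, base=None, overrides=None):
    """Hypothetical mirror of contributionstructure: {field: (n_rows, n_voices)}.

    base      : weight per voice used for every field (default: equal mix)
    overrides : {field: weight per voice} for fields that should differ
    """
    base = np.full(n_voices, 1.0 / n_voices) if base is None else np.asarray(base, float)
    if base.shape != (n_voices,):
        raise ValueError("base must have one weight per voice")
    cs = {f: np.tile(base, (n_rows, 1)) for f in FIELDS}
    for field, w in (overrides or {}).items():
        if field not in cs:
            raise KeyError(f"unknown field: {field}")
        cs[field] = np.tile(np.asarray(w, float), (n_rows, 1))
    return cs

# Equal mix of eight voices, except F0 taken entirely from the first voice.
cs = make_contribution(47, 8, overrides={"fundamentalfrequency": np.eye(8)[0]})
```

Because each field has its own array, one aspect can be taken from a single voice while the other aspects remain an average of all voices.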

206 Topic: Application, STRAIGHT, Background

208 Summary (Application, STRAIGHT, Background)
Interference-free representations play important roles.
Periodic excitation is an efficient and robust strategy for sampling and transmitting the information relevant to voice communication.
STRAIGHT is a collection of functions and applications.
Extended morphing provides a unique research strategy for studying para- and non-linguistic aspects of speech.

209 Thank you! Roy D. Patterson, Masanori Morise, Hideki Banno, Toshio Irino, Ryuichi Nisimura, Verena G. Skuk, Stefan Schweinberger, Parham Zolfaghari, Ken-Ichi Sakakibara, Ikuyo Masuda-Katsuse, Alain de Cheveigné, Josh McDermott, Osamu Fujimura, Toru Takahashi, Tomoki Toda and many others.

210 References

211 Reference: STRAIGHT
Kawahara, H., Morise, M., Toda, T., Banno, H., Nisimura, R., & Irino, T. (2014). Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensation. In Fifteenth Annual Conference of the International Speech Communication Association.
Kawahara, H., Morise, M., & Sakakibara, K. I. (2013d). Temporally fine F0 extractor applied for frequency modulation power spectral analysis of singing voices. Proc. MAVEBA.
Kawahara, H., Morise, M., Banno, H., & Skuk, V. G. (2013c). Temporally variable multi-aspect N-way morphing based on interference-free speech representations. In Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2013 Asia-Pacific (pp. 1-10). IEEE.
Kawahara, H., Morise, M., Toda, T., Nisimura, R., & Irino, T. (2013b). Beyond bandlimited sampling of speech spectral envelope imposed by the harmonic structure of voiced sounds. In INTERSPEECH.
Kawahara, H., Morise, M., Nisimura, R., & Irino, T. (2013a). Higher order waveform symmetry measure and its application to periodicity detectors for speech and singing with fine temporal resolution. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE.
Kawahara, H., Morise, M., Nisimura, R., & Irino, T. (2012b). Deviation measure of waveform symmetry and its application to high-speed and temporally-fine F0 extraction for vocal sound texture manipulation. In Interspeech.
Kawahara, H., & Morise, M. (2012a). Analysis and synthesis of strong vocal expressions: extension and application of audio texture features to singing voice. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on. IEEE.
Kawahara, H., & Morise, M. (2011b). Technical foundations of TANDEM-STRAIGHT, a speech analysis, modification and synthesis framework. Sadhana, 36(5).
Kawahara, H., Irino, T., & Morise, M. (2011a). An interference-free representation of instantaneous frequency of periodic signals and its application to F0 extraction. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on. IEEE.

212 Reference: STRAIGHT
Kawahara, H., Morise, M., Takahashi, T., Banno, H., Nisimura, R., & Irino, T. (2010b). Kurtosis-based acoustic event detection and its application to speech analysis, modification and synthesis systems. Spring Annual Meeting of the Acoustical Society of Japan [in Japanese].
Kawahara, H., Morise, M., Takahashi, T., Banno, H., Nisimura, R., & Irino, T. (2010a). Simplification and extension of non-periodic excitation source representations for high-quality speech manipulation systems. In Interspeech 2010.
Fujimura, O., Honda, K., Kawahara, H., Konparu, Y., Morise, M., & Williams, J. C. (2009). Noh voice quality. Logopedics Phoniatrics Vocology, 34(4).
Kawahara, H., Takahashi, T., Morise, M., & Banno, H. (2009b). Development of exploratory research tools based on TANDEM-STRAIGHT. In Proceedings: APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference, International Organizing Committee.
Kawahara, H., Nisimura, R., Irino, T., Morise, M., Takahashi, T., & Banno, H. (2009a). Temporally variable multiaspect auditory morphing enabling extrapolation without objective and perceptual breakdown. In Acoustics, Speech and Signal Processing, ICASSP IEEE International Conference on. IEEE.
Kawahara, H., Morise, M., Takahashi, T., Nisimura, R., Irino, T., & Banno, H. (2008, March). TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation. In Acoustics, Speech and Signal Processing, ICASSP IEEE International Conference on. IEEE.
Banno, H., Hata, H., Morise, M., Takahashi, T., Irino, T., & Kawahara, H. (2007). Implementation of realtime STRAIGHT speech manipulation system: Report on its first implementation. Acoustical science and technology, 28(3).
Kawahara, H. (2006). STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds. Acoustical science and technology, 27(6).

213 Reference: STRAIGHT
Kawahara, H., de Cheveigné, A., Banno, H., Takahashi, T., & Irino, T. (2005, September). Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT. In Interspeech.
Matsui, H., & Kawahara, H. (2003b). Investigation of emotionally morphed speech perception and its structure using a high quality speech manipulation system. In INTERSPEECH.
Kawahara, H., & Matsui, H. (2003a). Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation. In Acoustics, Speech, and Signal Processing, Proceedings (ICASSP'03) IEEE International Conference on (Vol. 1, pp. I-256). IEEE.
Kawahara, H., Estill, J., & Fujimura, O. (2001, September). Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT. In MAVEBA.
Kawahara, H., Atake, Y., & Zolfaghari, P. (2000). Accurate vocal event detection method based on a fixed-point analysis of mapping from time to weighted average group delay. In INTERSPEECH.
Kawahara, H., Katayose, H., de Cheveigné, A., & Patterson, R. D. (1999b). Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity. In EuroSpeech (Vol. 99, No. 6).
Kawahara, H., Masuda-Katsuse, I., & de Cheveigné, A. (1999a). Restructuring speech representations using a pitch-adaptive time frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds. Speech communication, 27(3).
Kawahara, H. (1997). Speech representation and transformation using adaptive interpolation of weighted spectrum: vocoder revisited. In Acoustics, Speech, and Signal Processing, ICASSP-97, 1997 IEEE International Conference on (Vol. 2). IEEE.

214 Reference: using STRAIGHT
Assmann, P. F., & Nearey, T. M. (2008). Identification of frequency-shifted vowels. The Journal of the Acoustical Society of America, 124(5).
Bruckert, L., Bestelmeyer, P., Latinus, M., Rouger, J., Charest, I., Rousselet, G. A., ... & Belin, P. (2010). Vocal attractiveness increases by averaging. Current Biology, 20(2).
d'Alessandro, C., Rilliard, A., & Le Beux, S. (2011). Chironomic stylization of intonation. The Journal of the Acoustical Society of America, 129(3).
Humes, L. E., Kewley-Port, D., Fogerty, D., & Kinney, D. (2010). Measures of hearing threshold and temporal processing across the adult lifespan. Hearing research, 264(1).
Ives, D. T., Smith, D. R., & Patterson, R. D. (2005). Discrimination of speaker size from syllable phrases. The Journal of the Acoustical Society of America, 118(6).
Kawahara, H., Kitamura, T., Takemoto, H., Nisimura, R., & Irino, T. (2014). Vocal tract length estimation based on vowels using a database consisting of 385 speakers and a database with MRI-based vocal tract shape information. In Fifteenth Annual Conference of the International Speech Communication Association.
Kawahara, H., Mizobuchi, S., Morise, M., Nisimura, R., & Irino, T. (2014). Realtime conversion of growl-type voice qualities based on modulation and approximate time-varying filtering driven by a non-linear oscillator: Formulation. IPSJ SIG Technical report, 2014-MUS-102(14), 1-6.
Liu, C., & Kewley-Port, D. (2004). Vowel formant discrimination for high-fidelity speech. The Journal of the Acoustical Society of America, 116(2).
Nguyen, P. C., Takao, O., & Akagi, M. (2003). Modified restricted temporal decomposition and its application to low rate speech coding. IEICE TRANSACTIONS on Information and Systems, 86(3).

215 Reference: using STRAIGHT
Saitou, T., Goto, M., Unoki, M., & Akagi, M. (2007, October). Speech-to-singing synthesis: Converting speaking voices to singing voices by controlling acoustic features unique to singing voices. In Applications of Signal Processing to Audio and Acoustics, 2007 IEEE Workshop on. IEEE.
Schweinberger, S. R., Walther, C., Zäske, R., & Kovács, G. (2011). Neural correlates of adaptation to voice identity. British Journal of psychology, 102(4).
Schweinberger, S. R., Casper, C., Hauthal, N., Kaufmann, J. M., Kawahara, H., Kloth, N., ... & Zäske, R. (2008). Auditory adaptation in voice perception. Current Biology, 18(9).
Skuk, V. G., & Schweinberger, S. R. (2014). Influences of Fundamental Frequency, Formant Frequencies, Aperiodicity, and Spectrum Level on the Perception of Voice Gender. Journal of Speech, Language, and Hearing Research, 57(1).
Skuk, V. G., & Schweinberger, S. R. (2013). Adaptation aftereffects in vocal emotion perception elicited by expressive faces and voices. PloS one, 8(11), e
Smith, D. R., Patterson, R. D., Turner, R., Kawahara, H., & Irino, T. (2005). The processing and perception of size information in speech sounds. The Journal of the Acoustical Society of America, 117(1), 305.
Toda, T., & Tokuda, K. (2007). A speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE TRANSACTIONS on Information and Systems, 90(5).
Toda, T., Saruwatari, H., & Shikano, K. (2001). Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum. In Acoustics, Speech, and Signal Processing, Proceedings (ICASSP'01) IEEE International Conference on (Vol. 2). IEEE.
Tsanas, A., Zañartu, M., Little, M.A., Fox, C., Ramig, L.O., & Clifford, G.D. (2014). Robust fundamental frequency estimation in sustained vowels: detailed algorithmic comparisons and information fusion with adaptive Kalman filtering. The Journal of the Acoustical Society of America, 135(5).
von Kriegstein, K., Smith, D. R., Patterson, R. D., Kiebel, S. J., & Griffiths, T. D. (2010). How the human brain recognizes speech in the context of changing speakers. The Journal of Neuroscience, 30(2).

216 Reference: using STRAIGHT
von Kriegstein, K., Smith, D. R., Patterson, R. D., Ives, D. T., & Griffiths, T. D. (2007). Neural representation of auditory size in the human voice and in sounds from other resonant sources. Current Biology, 17(13).
von Kriegstein, K., Warren, J. D., Ives, D. T., Patterson, R. D., & Griffiths, T. D. (2006). Processing the acoustic effect of size in speech sounds. Neuroimage, 32(1).
Yonezawa, T., Suzuki, N., Abe, S., Mase, K., & Kogure, K. (2007). Perceptual continuity and naturalness of expressive strength in singing voices based on speech morphing. EURASIP Journal on Audio, Speech, and Music Processing, 2007(3), 2.
Yu, K., & Young, S. (2011). Continuous F0 modeling for HMM based statistical parametric speech synthesis. Audio, Speech, and Language Processing, IEEE Transactions on, 19(5).
Zäske, R., Schweinberger, S. R., & Kawahara, H. (2010). Voice aftereffects of adaptation to speaker identity. Hearing research, 268(1).
Zäske, R., Schweinberger, S. R., Kaufmann, J. M., & Kawahara, H. (2009). In the ear of the beholder: neural correlates of adaptation to voice gender. European Journal of Neuroscience, 30(3).
Zen, H., Nose, T., Yamagishi, J., Sako, S., Masuko, T., Black, A. W., & Tokuda, K. (2007a). The HMM-based speech synthesis system (HTS) version 2.0. In SSW.
Zen, H., Toda, T., Nakamura, M., & Tokuda, K. (2007b). Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge. IEICE transactions on information and systems, 90(1).
Zen, H., Tokuda, K., & Black, A. W. (2009). Statistical parametric speech synthesis. Speech Communication, 51(11).
