THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE

Speech recognition using localized affine invariant features

Masayuki SUZUKI, Yu QIAO, Nobuaki MINEMATSU, and Keikichi HIROSE

Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
Graduate School of Information Science and Technology, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, Japan
E-mail: {suzuki,qiao,mine,hirose}@gavo.t.u-tokyo.ac.jp

Abstract This paper proposes localized affine invariant features (LAIFs) for speaker-independent automatic speech recognition. LAIFs can be calculated directly from data sequences. Because speaker variations can be approximated well by affine transforms in a cepstral space, LAIFs can provide features that are robust to those variations. We therefore expect LAIFs to improve recognition performance, especially when no training data is available for speaker normalization or adaptation. To verify this expectation, we apply LAIFs to isolated word recognition. The experimental results show that combining LAIFs with MFCC or MFCC+ΔMFCC leads to higher performance than MFCC or MFCC+ΔMFCC alone. In mismatched conditions in particular, MFCC+ΔMFCC+LAIFs reduce the error rate by 37% compared with MFCC+ΔMFCC only.

Key words Acoustic features, Speaker-independent ASR, Affine transform, Localized features, Speaker invariance

Speaker-independent (SI) recognition is generally less accurate than speaker-dependent (SD) recognition [1]. Speaker normalization techniques such as Vocal Tract Length Normalization (VTLN) [2] address this gap.

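Besides VTLN, speaker adaptation techniques such as Maximum Likelihood Linear Regression (MLLR) [3] are widely used. MLLR models speaker variation as an affine transform $x \to Ax + c$ and estimates $A$ and $c$ from speech of the target speaker; related work addresses adaptation from small amounts of data [4], [5]. In a purely SI setting, however, such data may not be available. This paper instead studies Localized Affine Invariant Features (LAIFs) [6], which are designed to be unaffected by the transform itself. Related work includes features invariant to frequency warping, such as the Warping Invariant Features (WIF) of Mertins et al. [7], [8] and the approach of Irino and Patterson [9], and the Invariant Structure Representation (ISR) [10], [11]. Unlike ISR, which describes a whole utterance as a structure, LAIFs are computed frame by frame and can be used directly as features for HMM-based recognition.

Let $X = [x_1, x_2, \ldots, x_T]$ be a sequence of $d$-dimensional vectors $x_t$, and consider the affine transform

$$x' = Ax + c, \qquad (1)$$

where $A$ is a $d \times d$ matrix and $c$ is a $d$-dimensional vector, which maps $X$ to $X' = [x'_1, x'_2, \ldots, x'_T]$. A LAIF is a function $F$ of the local window $X_{t-k_1:t+k_2} = [x_{t-k_1}, x_{t-k_1+1}, \ldots, x_t, \ldots, x_{t+k_2}]$ that satisfies

$$F(X'_{t-k_1:t+k_2}) = F(X_{t-k_1:t+k_2}), \qquad t = 1, \ldots, T. \qquad (2)$$

The delta coefficients, for example,

$$\Delta(X_{t-k:t+k}) = \frac{\sum_{\tau=1}^{k} \tau\,(x_{t+\tau} - x_{t-\tau})}{2\sum_{\tau=1}^{k} \tau^2} \qquad (3)$$

do not satisfy (2): the bias $c$ cancels, but $\Delta(X'_{t-k:t+k}) = A\,\Delta(X_{t-k:t+k}) \neq \Delta(X_{t-k:t+k})$ in general. As a LAIF, this paper uses the following function of the window:

$$F(X_{t-k_1:t+k_2}) = (\mu_b - \mu_a)^T (\Sigma_a + \Sigma_b)^{-1} (\mu_b - \mu_a). \qquad (4)$$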
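A minimal NumPy sketch makes the non-invariance of the delta coefficients in (3) concrete. It assumes frames are stored as rows of a (T, d) array, and the transform A, c and the random "cepstra" are hypothetical values chosen only for illustration. The check confirms that the transformed deltas equal A times the original deltas: c cancels, but A does not.

```python
import numpy as np


def delta(X: np.ndarray, t: int, k: int) -> np.ndarray:
    """Delta coefficients of Eq. (3) at frame t; X has shape (T, d)."""
    taus = np.arange(1, k + 1)
    num = sum(tau * (X[t + tau] - X[t - tau]) for tau in taus)
    return num / (2.0 * np.sum(taus ** 2))


# Hypothetical speaker transform x' = A x + c (Eq. (1)), hand-picked for illustration.
A = np.array([[1.2, 0.3, 0.0],
              [0.1, 0.9, 0.2],
              [0.0, -0.2, 1.1]])
c = np.array([0.5, -1.0, 2.0])

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))   # T = 20 frames of d = 3 stand-in cepstral coefficients
Xp = X @ A.T + c               # applies x'_t = A x_t + c to every row

t, k = 10, 2
d_orig, d_trans = delta(X, t, k), delta(Xp, t, k)
print(np.allclose(d_trans, A @ d_orig))  # True: c cancels, A remains
print(np.allclose(d_trans, d_orig))      # False: deltas are not affine invariant
```

Since A - I is invertible for this A, the second check can only succeed if the delta vector is zero, which does not happen for generic data.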

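Here $\mu$ and $\Sigma$ denote the mean vector and covariance matrix of segment $a = [t-k_1, \ldots, t-1]$ or segment $b = [t+1, \ldots, t+k_2]$. For segment $a$,

$$\mu_a = \frac{1}{k_1} \sum_{\tau=t-k_1}^{t-1} x_\tau, \qquad (5)$$

$$\Sigma_a = \frac{1}{k_1} \sum_{\tau=t-k_1}^{t-1} (x_\tau - \mu_a)(x_\tau - \mu_a)^T, \qquad (6)$$

which are maximum likelihood (ML) estimates; $\mu_b$ and $\Sigma_b$ are defined in the same way over segment $b$.

To see that (4) is a LAIF, note that under (1) the statistics of segment $a$ become

$$\mu'_a = A\mu_a + c, \qquad (7)$$

$$\Sigma'_a = A\Sigma_a A^T, \qquad (8)$$

and likewise for segment $b$. Hence, for invertible $A$,

$$\begin{aligned} F(X'_{t-k_1:t+k_2}) &= (\mu'_b - \mu'_a)^T (\Sigma'_a + \Sigma'_b)^{-1} (\mu'_b - \mu'_a) \\ &= (A\mu_b - A\mu_a)^T \left(A(\Sigma_a + \Sigma_b)A^T\right)^{-1} (A\mu_b - A\mu_a) \\ &= (\mu_b - \mu_a)^T A^T (A^T)^{-1} (\Sigma_a + \Sigma_b)^{-1} A^{-1} A (\mu_b - \mu_a) \\ &= F(X_{t-k_1:t+k_2}). \end{aligned} \qquad (9)$$

For speaker differences caused by vocal tract length, the matrix $A$ is not arbitrary: VTLN corresponds to a linear transform in the cepstral space [13] whose large elements lie near the diagonal. Suppose $x$ is split into sub-vectors $x^{(1)}$ and $x^{(2)}$ and $A$ is block-diagonal:

$$\begin{bmatrix} x'^{(1)} \\ x'^{(2)} \end{bmatrix} = \begin{bmatrix} A^{(1)} & 0 \\ 0 & A^{(2)} \end{bmatrix} \begin{bmatrix} x^{(1)} \\ x^{(2)} \end{bmatrix} + \begin{bmatrix} c^{(1)} \\ c^{(2)} \end{bmatrix},$$

that is,

$$x'^{(1)} = A^{(1)} x^{(1)} + c^{(1)}, \qquad (11)$$

$$x'^{(2)} = A^{(2)} x^{(2)} + c^{(2)}. \qquad (12)$$

Each sub-vector then undergoes its own affine transform, so a LAIF computed on each sub-vector separately remains invariant. Approximating $A$ as a band matrix of width $s$ leads to a multi-stream parameterization in which the $d$-dimensional cepstrum is divided into $d-s+1$ overlapping streams of $s$ consecutive dimensions:

stream 1: $(x(1), x(2), \ldots, x(s))$
stream 2: $(x(2), x(3), \ldots, x(s+1))$
...
stream $d-s+1$: $(x(d-s+1), x(d-s+2), \ldots, x(d))$

A LAIF is computed for each stream, giving the $(d-s+1)$-dimensional feature vector

$$\left[F^{(1)}(X_{t-k_1:t+k_2}), \ldots, F^{(d-s+1)}(X_{t-k_1:t+k_2})\right]^T.$$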
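The LAIF in (4), with the ML statistics (5)-(6), can be sketched in a few lines of NumPy, and the invariance argument (9) can be checked numerically. This is a minimal sketch, not the authors' implementation. It assumes segment a covers frames t-k1 to t-1 and segment b covers frames t+1 to t+k2. The transform A, c and the random data are hypothetical.

```python
import numpy as np


def laif(X: np.ndarray, t: int, k1: int, k2: int) -> float:
    """LAIF of Eq. (4) at frame t for a (T, d) feature array X.

    Assumes segment a = frames t-k1 .. t-1 and segment b = frames t+1 .. t+k2,
    with the ML (divide-by-N) mean and covariance of Eqs. (5)-(6).
    """
    a = X[t - k1:t]
    b = X[t + 1:t + k2 + 1]
    diff = b.mean(axis=0) - a.mean(axis=0)
    # bias=True gives the ML estimate; atleast_2d keeps the d = 1 case a matrix.
    cov_a = np.atleast_2d(np.cov(a, rowvar=False, bias=True))
    cov_b = np.atleast_2d(np.cov(b, rowvar=False, bias=True))
    # Solve (Sigma_a + Sigma_b) y = diff instead of forming the inverse explicitly.
    return float(diff @ np.linalg.solve(cov_a + cov_b, diff))


# Hypothetical, hand-picked invertible transform x' = A x + c for illustration.
A = np.array([[1.2, 0.3, 0.0],
              [0.1, 0.9, 0.2],
              [0.0, -0.2, 1.1]])
c = np.array([0.5, -1.0, 2.0])

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))   # stand-in for T = 30 frames of d = 3 cepstra
Xp = X @ A.T + c

t, k1, k2 = 15, 5, 5
print(laif(X, t, k1, k2), laif(Xp, t, k1, k2))  # equal up to rounding, as in Eq. (9)
```

One practical caveat: an ML covariance estimated from k frames has rank at most k - 1. Therefore $\Sigma_a + \Sigma_b$ can only be invertible when $(k_1 - 1) + (k_2 - 1) \ge d$. Short windows on high-dimensional cepstra would need regularization, which this sketch does not add.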

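Fig. 1 Calculation of LAIFs with multi-stream parameterization. (The figure shows a $d$-dimensional cepstrum sequence $x(1), \ldots, x(d)$ of length $T$ divided into streams $1, \ldots, d-s+1$, segments $a$ and $b$ around frame $t$ in each stream, and the per-stream LAIFs $F^{(1)}, \ldots, F^{(d-s+1)}$ computed as $(\mu_b - \mu_a)^T(\Sigma_a + \Sigma_b)^{-1}(\mu_b - \mu_a)$.)

With $s = 1$, each stream is a single cepstral dimension, which amounts to approximating $A$ in $x \to Ax + c$ as diagonal. The LAIF of each stream then reduces to

$$f(X_{t-k_1:t+k_2}) = (\mu_a - \mu_b)^T (\sigma_a + \sigma_b)^{-1} (\mu_a - \mu_b) = \frac{(\mu_a - \mu_b)^2}{\sigma_a + \sigma_b}, \qquad (13)$$

where, for $s = 1$, $\mu$ and $\sigma$ are the scalar mean and variance of each segment. When $k_1 = k_2 = k$, the mean difference can be written as

$$\mu_b - \mu_a = \sum_{\tau=1}^{k} w_\tau (x_{t+\tau} - x_{t-\tau}), \qquad (14)$$

with $w_\tau = 1/k$ for all $\tau$. The delta coefficients (3) have the same form, but with weights that grow with $\tau$:

$$w_\tau = \frac{\tau}{2\sum_{\tau'=1}^{k} \tau'^2}. \qquad (15)$$

The numerator of (13) is therefore a delta-like quantity, and the $s = 1$ LAIF can be viewed as a squared delta feature normalized by the local variance. LAIFs are also related to Spectro-Temporal Features (STF) and to the work of Muroi et al. [15].

In the experiments, the window lengths $k_1$ and $k_2$ were set with reference to the modulation-frequency range (in Hz) that Kanedera et al. [17] found important for speech recognition. MFCCs and LAIFs for two block sizes $s$ were compared on /aiueo/ utterances processed with STRAIGHT [18]. Recognition experiments used the Hidden Markov Model Toolkit (HTK) [1].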
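The multi-stream parameterization and the $s = 1$ relations (13)-(14) can be sketched as follows. As in the definition of (5)-(6), this sketch assumes that segment a covers frames t-k1 to t-1 and that segment b covers frames t+1 to t+k2. The diagonal transform `A_diag` and the bias are hypothetical. They show exact per-stream invariance only for s = 1. For a banded A with s > 1, a stream of s consecutive dimensions also receives contributions from dimensions outside the stream, so the invariance is only approximate.

```python
import numpy as np


def laif(X: np.ndarray, t: int, k1: int, k2: int) -> float:
    """LAIF of Eq. (4); segments a = t-k1..t-1 and b = t+1..t+k2 (assumed)."""
    a, b = X[t - k1:t], X[t + 1:t + k2 + 1]
    diff = b.mean(axis=0) - a.mean(axis=0)
    cov = (np.atleast_2d(np.cov(a, rowvar=False, bias=True))
           + np.atleast_2d(np.cov(b, rowvar=False, bias=True)))
    return float(diff @ np.linalg.solve(cov, diff))


def multistream_laif(X: np.ndarray, t: int, k1: int, k2: int, s: int) -> np.ndarray:
    """Stack the LAIFs of the d - s + 1 streams (x(i), ..., x(i+s-1))."""
    d = X.shape[1]
    return np.array([laif(X[:, i:i + s], t, k1, k2) for i in range(d - s + 1)])


rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))   # stand-in for T = 40 frames, d = 4
t, k = 20, 6

# s = 1: each stream is scalar, so Eq. (13) is (mu_b - mu_a)^2 / (var_a + var_b).
f1 = multistream_laif(X, t, k, k, s=1)
a, b = X[t - k:t], X[t + 1:t + k + 1]
ref = (b.mean(0) - a.mean(0)) ** 2 / (a.var(0) + b.var(0))
print(np.allclose(f1, ref))  # True

# Eq. (14): with k1 = k2 = k, mu_b - mu_a = (1/k) * sum_tau (x_{t+tau} - x_{t-tau}).
taus = np.arange(1, k + 1)
print(np.allclose(b.mean(0) - a.mean(0), (X[t + taus] - X[t - taus]).sum(0) / k))  # True

# A hypothetical diagonal A leaves every s = 1 stream exactly invariant.
A_diag = np.diag([1.3, 0.8, 1.1, 0.9])
c = np.array([0.2, -0.4, 1.0, 0.0])
print(np.allclose(multistream_laif(X @ A_diag.T + c, t, k, k, s=1), f1))  # True
```

With s = d there is a single stream, and the output reduces to the full-dimensional LAIF (4). Intermediate s trades robustness to off-diagonal terms of A against the amount of cross-dimension information each stream can use.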

Fig. 2 Comparison between outputs of mel filter bank, MFCCs and LAIFs for /aiueo/: (a) mel filter bank output, (b) MFCC, (c) LAIF (s = ...), (d) LAIF (s = ...).

Acoustic analysis: ... bit / ... kHz sampling; frame length ... msec, frame shift ... msec; pre-emphasis $1 - 0.97z^{-1}$; MFCCs obtained by DCT. Speech data: see [19].

Table 1 Acoustic features used for the experiment
- MFCC
- MFCC + LAIF (s = ...)
- MFCC + LAIF (s = ...)
- MFCC + ΔMFCC
- MFCC + ΔMFCC + LAIF (s = ...)
- MFCC + ΔMFCC + LAIF (s = ...)

Acoustic models were left-to-right HMMs.

Table 2 Recognition results. M, Δ, and L denote MFCC, delta coefficients of MFCC, and LAIF, respectively; s is the block size for multi-stream parameterization. The table compares six methods (M; M+L, s = ...; M+L, s = ...; M+Δ; M+Δ+L, s = ...; M+Δ+L, s = ...) under three conditions: matched condition, male training/female testing, and female training/male testing.

Adding LAIFs to MFCC, and to MFCC+ΔMFCC, improved recognition performance. In the mismatched conditions, MFCC+ΔMFCC+LAIF reduced the error rate by 37% relative to MFCC+ΔMFCC.

This paper proposed LAIFs for SI speech recognition. In isolated word recognition experiments, LAIF+MFCC+ΔMFCC reduced the error rate by 37% compared with MFCC+ΔMFCC in mismatched conditions.

[1] S. Young et al., The HTK Book (for HTK Version 3).
[2] E. Eide and H. Gish, "A parametric approach to vocal tract length normalization," Proc. Int. Conf. Acoustics, Speech, and Signal Processing, vol. ..., pp. ..., 1996.
[3] C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech and Language, vol. 9, pp. 171-185, 1995.
[4] ... (in Japanese), vol. J7...-D-II, no. ..., pp. ...
[5] R. Gomez, T. Toda, H. Saruwatari, and K. Shikano, "Techniques in rapid unsupervised speaker adaptation based on HMM-sufficient statistics," Speech Communication, vol. ..., pp. ...
[6] Y. Qiao, M. Suzuki, and N. Minematsu, "Affine invariant features and its application to speech recognition," Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 2009 (submitted).
[7] A. Mertins and J. Rademacher, "Frequency-warping invariant features for automatic speech recognition," Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. ..., pp. ...
[8] J. Rademacher, M. Wachter, and A. Mertins, "Improved warping-invariant features for automatic speech recognition," Proc. Int. Conf. Acoustics, Speech, and Signal Processing, pp. ...
[9] T. Irino and R. D. Patterson, "Segregating information about the size and shape of the vocal tract using a time-domain auditory model: The stabilised wavelet-Mellin transform," Speech Communication, vol. ..., pp. ...
[10] N. Minematsu, "Mathematical evidence of the acoustic universal structure in speech," Proc. Int. Conf. Acoustics, Speech, and Signal Processing, pp. 889-892, 2005.
[11] S. Asakawa, N. Minematsu, and K. Hirose, "Multi-stream parameterization for structural speech recognition," Proc. Int. Conf. Acoustics, Speech, and Signal Processing, pp. 4097-4100, 2008.
[12] ... (in Japanese), SP..., pp. ...
[13] M. Pitz and H. Ney, "Vocal tract normalization equals linear transformation in cepstral space," IEEE Trans. Speech and Audio Processing, vol. 13, pp. 930-944, 2005.
[14] ... (in Japanese), vol. J...-D-II, no. ..., pp. ...
[15] T. Muroi, T. Takiguchi, and Y. Ariki, "Speaker independent phoneme recognition based on Fisher weight map," International Journal of Hybrid Information Technology, vol. ..., no. 3, 2009.
[16] ... (in Japanese), SP..., pp. ...
[17] N. Kanedera, T. Arai, H. Hermansky, and M. Pavel, "On the relative importance of various components of the modulation spectrum for automatic speech recognition," Speech Communication, vol. 28, no. 1, pp. 43-55, 1999.
[18] H. Kawahara, "STRAIGHT, exploration of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds," Acoustical Science and Technology, vol. 27, no. 6.
[19] ... (in Japanese), vol. ..., no. ..., pp. ...