
Person Recognition for News Videos through Multimodal Interaction
Masakiyo Fujimoto, Yasuo Ariki and Shuji Doshita
Vol. 59, No. 6 (2003), pp. 1-11
PACS number: 43.72.Kb
Contact: masakiyo.fujimoto@atr.jp (ATR)

1. ...
... [1] ...

... 150 ... 93.33% ... 76.19% ...

2. ...
... [2] ... IPv6 [3] ... [4] ... [5] ... NHK BS [6] ... December 1, 2000 ... 2007 ...

... [7] Put-that-there ... [14] ... Electronic Program Guide (EPG) ... Put-that-there ... VTR ... EPG ... Put-that-there ... [8] ... [8] ... [9] ... [10] ... [10] ... 97% ...

3. ...
... [8] ... [13] ... [11] ... [12] ...

Fig. 1 (block labels): User, "Who is he?", Pointing, Speech input, Action input, Speech recog, Action recog, Face extraction, Face recognition, Web, Information retrieval, Information presentation.

4. ...
... MLLR (Maximum Likelihood Linear Regression) [15] ...

Fig. 2 (block labels): Observed signal, Speaker direction estimation, Hands-free speech input, Beam forming, User utterance section detection, Acoustic model adaptation, Speech recognition.

4.1 ...
... [12] ... [16] ... y_i(t) ... M ... \theta ... x(t) ... \tau ... d ...

Fig. 3 (labels): inputs y_1(t), ...; delay elements Delay \tau through Delay (M-1)\tau; output x(t) = M y_1(t).

4.2 ...
CSP (Cross-power Spectrum Phase analysis) [11] ... i, j ... y_i(t), y_j(t) ... CSP_{i,j}(k) ...

\[ CSP_{i,j}(k) = F^{-1}\left[ \frac{F[y_i(t)]\, F[y_j(t)]^{*}}{\left|F[y_i(t)]\right| \left|F[y_j(t)]\right|} \right] \tag{1} \]

Here F and F^{-1} are the Fourier transform and its inverse, and * denotes the complex conjugate.

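The delay \tau is the lag k that maximizes the CSP coefficients:

\[ \tau = \arg\max_{k} \, CSP_{i,j}(k) \tag{2} \]

The direction \theta then follows from \tau, with c, f and d as in the source. Since \tau is a lag in samples, c \tau / f is a path-length difference and d a microphone spacing:

\[ \theta = \cos^{-1}\left( \frac{c \, \tau / f}{d} \right) \tag{3} \]

4.3 ...
Fig. 4 (axes and labels): time (s), DOA (deg.) (Direction Of Arrival: DOA), User utterance section.

4.4 ...
... 4.1 ... 4.3 ... MLLR [15] [17] ... MLLR ... MLLR [17] ...

Fig. 5 (labels): DOA, DOA stability section, News sound + user utterance.

4.5 ...
... DOA ...

Figure (room layout labels): PC x6, PC x2, PC, Loudspeaker (TV sound), Screen, News, Microphone array, Speaker (User), Digital projector; dimensions 10.0m, 8.0m, 5.0m, 4.9m, 2.85m, 2.6m, 2.4m, 2.0m, 1.2m, 0.8m.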
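The CSP delay estimate and its conversion to a direction in Eqs. (1)-(3) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation. The function names, the speed of sound c = 340 m/s, and the small constant guarding the division are assumptions. The lag is wrapped to a signed value so that negative delays are representable, and the cosine argument is clipped so that rounding cannot push it outside [-1, 1].

```python
import numpy as np


def csp(y_i, y_j):
    """CSP coefficients of Eq. (1): inverse FFT of the phase-only cross spectrum."""
    Y_i = np.fft.fft(y_i)
    Y_j = np.fft.fft(y_j)
    cross = Y_i * np.conj(Y_j)
    # |cross| equals |F[y_i]| * |F[y_j]|; the floor (an assumption) avoids 0/0.
    cross = cross / np.maximum(np.abs(cross), 1e-12)
    return np.real(np.fft.ifft(cross))


def estimate_delay(y_i, y_j):
    """Eq. (2): lag (in samples) of the CSP peak, wrapped to a signed value."""
    coeffs = csp(y_i, y_j)
    k = int(np.argmax(coeffs))
    n = len(coeffs)
    return k if k <= n // 2 else k - n


def doa_degrees(tau, d, c=340.0, f=16000.0):
    """Eq. (3): direction in degrees from a lag tau (samples).

    d: spacing (m) of the microphone pair, c: speed of sound (m/s, assumed),
    f: sampling frequency (Hz).
    """
    cos_theta = np.clip(c * tau / f / d, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))
```

With these assumed values, one sample of lag corresponds to about 2.1 cm of path difference, so d here should be the spacing of the microphone pair actually used for the estimate.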

5. ...

5.1 ...
... LED ... LED ... [18] ... LED ... LED ... 15cm ... 7cm ... LED ... PC ... PC ...

5.2 ...
... 4.5 ...

6. ...
... [19] ...

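... 1/n ...

6.1 ...
... 100 ... 150 ... 10 ... 10 ... 150 ... x ... [20] ... Eq. (7) ... m (= 150) ... {x_t} (t = 1, ..., N) ... \mu ... \Sigma ...

\[ \mu = \frac{1}{N} \sum_{t=1}^{N} x_t \tag{4} \]

\[ \Sigma = \frac{1}{N} \sum_{t=1}^{N} (x_t - \mu)(x_t - \mu)^{T} \tag{5} \]

\[ \Sigma = V \Lambda V^{T} \tag{6} \]

... \Lambda ... \Sigma ... \lambda_d (d = 1, ..., k, ..., m) ... V ... \Sigma ... \phi_d (d = 1, ..., k, ..., m) ...

\[ PD = \| x - \mu \|^{2} - \sum_{d=1}^{k} (x - \mu, \phi_d)^{2} \tag{7} \]

Fig. 9 (labels): Eigenface space, Observation space, \phi_1, \phi_2, \phi_3, \mu, x, x - \mu, PD, \lambda_1 > \lambda_2 > \lambda_3.

6.2 ...
... n ... n-1 ... n ...

Fig. 8 (labels): Search window, Input image.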
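Eqs. (4)-(7) amount to fitting a mean and covariance, keeping the k leading eigenvectors, and measuring how much of x - \mu falls outside the subspace they span. The sketch below is one hedged reading of those equations in NumPy. The function names `fit_eigenspace` and `projection_distance` are hypothetical, and the use of `numpy.linalg.eigh` is an implementation choice that the source does not specify.

```python
import numpy as np


def fit_eigenspace(X, k):
    """Fit the eigenspace from training vectors X of shape (N, m).

    Returns the mean (Eq. 4), the k leading eigenvectors as columns of an
    (m, k) matrix, and their eigenvalues in descending order (Eq. 6).
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)                      # Eq. (4)
    centered = X - mu
    sigma = centered.T @ centered / len(X)   # Eq. (5)
    lam, V = np.linalg.eigh(sigma)           # Eq. (6); eigenvalues ascending
    order = np.argsort(lam)[::-1][:k]        # keep the k largest
    return mu, V[:, order], lam[order]


def projection_distance(x, mu, phi):
    """Eq. (7): squared norm of x - mu minus its energy inside the eigenspace."""
    diff = np.asarray(x, dtype=float) - mu
    coeffs = phi.T @ diff                    # inner products (x - mu, phi_d)
    return float(diff @ diff - coeffs @ coeffs)
```

A small PD means the input is well explained by the k leading eigenvectors, while a large PD means most of x - \mu lies outside the eigenspace.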

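7. ...
... 16kHz (16bit) ... CSP ... 4096 ... 256ms ... 256ms ... Hamming Window ... NHK ... 10 ... 150 ...

Acoustic analysis conditions (flattened table):
  Sampling: 16kHz (16bit)
  Pre-emphasis: 1 - 0.97z^{-1}
  Features: 13 MFCC (0th to 12th) + \Delta + \Delta\Delta (39 dimensions)
  Frame length / shift: 20ms / 10ms
  Window: Hamming Window

PC specifications (flattened table): Intel Xeon 1.7GHz, Memory 512MByte; Intel Pentium4 1.7GHz, Memory 256MByte; Intel Pentium4 1.7GHz, Memory 256MByte. ... PC ... TCP/IP ...

7.1 ...
... 10m ... 8m ... 3 ... 1.2m ... 16 ... 1.5 ... d = 2cm ... 2m ... 58dB(A) ... 40dB(A) ... T_60 = 0.3 [sec] ... monophone HMM (5, 3, 12, 41) ... HMM [21] ...

Table 4 (%):
  Beam Forming: 67.33 (101/150)
  Beam Forming + 2-level MLLR: 93.33 (140/150)

7.2 ...
... 100% ... 150 ... 137 ... 21782 ... 150 ...

7.3 ...
... 4.4 ... MLLR ... 20 ... 93.33% ... 100% ... 93.33% (140/150) ... 4.4 ...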
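The front-end values listed in Section 7 can be illustrated with a short sketch, assuming they denote a 16 kHz sampling rate, a pre-emphasis filter 1 - 0.97z^{-1}, and 20 ms Hamming-windowed frames with a 10 ms shift. That reading of the extracted numbers is an assumption, and the function names are hypothetical. The MFCC computation itself is omitted.

```python
import numpy as np


def preemphasize(x, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1]; the first sample is passed through."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - coeff * x[:-1]
    return y


def frame_signal(x, fs=16000, frame_ms=20, shift_ms=10):
    """Cut x into overlapping frames and apply a Hamming window to each."""
    x = np.asarray(x, dtype=float)
    frame_len = fs * frame_ms // 1000   # 320 samples at 16 kHz
    shift = fs * shift_ms // 1000       # 160 samples at 16 kHz
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // shift
    # Row r of idx holds the sample indices of frame r.
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return x[idx] * np.hamming(frame_len)
```

Under these assumptions, one second of audio yields 99 frames of 320 samples each. A cepstral analysis with delta and delta-delta terms would then be applied per frame to obtain the 39-dimensional vectors.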

... [22] ... 10 ... 60 ... 150 ... 150 ... 300 ... 300 ... 10 ... 20 ...
... 60.00% (84/140) ... 60.00% ... 6.1 ... 6.1 ... 76.19% (64/84) ... 20 ... 150 ... 64 ... 42.67% ... 42.67% ... 93.33% ...
1) ... 2) ... 3) ... 4) ...
1) ... 2) ... 3) ...

Table 5: 93.33%, 100.00%, 60.00%, 76.19%, 42.67%

8. ...
... [23] ... [24] ...

References
[1] ..., 9 (2000).
[2] ... TV ..., http://www.jiten.com/dicmi/docs/k15/18379s.htm
[3] IPv6 ..., http://www.iij.ad.jp/ipv6/
[4] ..., http://plusd.itmedia.co.jp/broadband/rbb/0203/13/ rbb 0313 10.html
[5] NHK dnhk ..., http://www.nhk.or.jp/data/
[6] NHK ... /digital, http://www.nhk.or.jp/digital/
[7] R. A. Bolt, "Put-that-there": Voice and gesture at the graphics interface, ACM Computer Graphics, Vol. 14, No. 3, 262-270 (1980).
[8] N. Krahnstoever, S. Kettebekov, M. Yeasin, and R. Sharma, A Real-Time Framework for Natural Multimodal Interaction with Large Screen Displays, Proc. ICMI '02, 349-354 (2002).
[9] R. Sharma, M. Yeasin, N. Krahnstoever, I. Rauschert, G. Cai, I. Brewer, A. M. MacEachren, and K. Sengupta, Speech-Gesture Driven Multimodal Interfaces for Crisis Management, Proc. IEEE, Vol. 91, No. 9, 1327-1354 (2003).
[10] R. Sharma, J. Cai, S. Chakravarthy, I. Poddar, and Y. Sethi, Exploiting Speech/Gesture Cooccurrence for Improving Continuous Gesture Recognition in Weather Narration, Proc. FG '00, 422-427 (2000).
[11] M. Omologo and P. Svaizer, Acoustic Event Localization Using a Crosspower-Spectrum Phase Based Technique, Proc. ICASSP '94, I, 273-276 (1994).
[12] ..., SP95-62, 1-8 (1995).
[13] M. Kaneko and O. Hasegawa, Processing of Face Images and Its Applications, IEICE Transactions on Information and Systems, Vol. E82-D, No. 3, 535-544 (2005).
[14] Y. Ariki, N. Ishikawa, and Y. Sugiyama, Face Indexing on Video Data: Extraction, Recognition, Tracking and Modeling, Proc. FG '98, 62-69 (1998).
[15] C. L. Leggetter and P. C. Woodland, Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models, Computer Speech and Language, 9, 171-185 (1995).
[16] J. L. Flanagan, J. D. Johnston, R. Zahn and G. W. Elko, Computer-Steered Microphone Arrays for Sound Transduction in Large Rooms, J. Acoust. Soc. Am., 78(5), 1508-1518 (1985).
[17] M. Fujimoto, Y. Ariki and S. Doshita, Hands-Free Speech Recognition in Real Environments Using Microphone Array and 2-Levels MLLR Adaptation as a Front-End System for Conversational TV, Acoustical Science and Technology, 24(6), 379-381 (2003).
[18] Visualeyez USER'S MANUAL, PhoeniX Technologies Incorporated.
[19] ... (..., 1986).
[20] ..., 24(1), 106-112 (1983).
[21] http://www.milab.is.tsukuba.ac.jp/jnas/
[22] ..., http://www.hoip.jp/web catalog/top.html
[23] ... TV ..., FIT '03, K-039, 507-508 (2003).
[24] ... S-tgif ..., SP96-32, 89-96 (1996).