Extracting Spatio-temporal Local Features Considering Consecutiveness of Motions

Akitsugu NOGUCHI and Keiji YANAI
The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu-shi, Tokyo 182-8585
E-mail: noguchi-a@mm.cs.uec.ac.jp, yanai@cs.uec.ac.jp
(MIRU2009), July 2009

Abstract: Recently, spatio-temporal local features have been proposed as image features for recognizing events and human actions in videos. In this paper, we propose a novel spatio-temporal local feature that is applicable to large amounts of video data. Our method consists of two parts: extracting visual features and extracting motion features. First, we select candidate points with the SURF detector, which is very fast. Next, we calculate motion features at each point over local temporal units, dividing the frame sequence so as to take the consecutiveness of motions into account. Since the proposed feature is intended to be robust to rotation, we rotate the optical flow vectors to the dominant direction of the extracted SURF features. In the experiments, we evaluate the proposed spatio-temporal local feature on a common dataset containing six kinds of simple human actions. As a result, the accuracy reaches 85%, which is almost equivalent to the state of the art. In addition, we conduct experiments on classifying large amounts of Web video clips downloaded from YouTube.

Key words: video recognition, action recognition, spatio-temporal local feature

1. Web cuboid cuboid cuboid [1] [2] cuboid HoG HoF

Fig. 1 KTH
Fig. 2 cuboid

SURF [3] 1 KTH walking, running, jogging, boxing, hand waving, hand clapping 85% Web YouTube 100. 2. 3 4. 5.

2. [4] [6] Cuboid Dollar Cuboid [1] visual word visual word Laptev STIP (spatio-temporal interest point) [2] 3 Cuboid HoG HoF Alireza Cuboid [7] cuboid cuboid SURF

3. 2 SURF (Speeded-Up Robust Features) [3] ( ) ( ) ( ) ( )

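Fig. 4: y, xy ( ) ( )
Fig. 3: SURF

3.1 SURF
SURF 3 SURF SIFT [8] SURF SIFT SURF

3.1.1 SURF
SURF SURF For a point X = (x, y)^T, the integral image I_Σ(X) is the sum of all pixels of the input image I inside the rectangle spanned by the origin and X:

$$ I_\Sigma(X) = \sum_{i=0}^{i \le x} \sum_{j=0}^{j \le y} I(i, j) \qquad (1) $$

Fig. 5: Haar wavelet, x direction (dx), y direction (dy)
Fig. 6: I(x, y) (x, y) (x, y) (x, y)

SURF Given a point X = (x, y) in an image I, the Hessian matrix H(X, σ) at scale σ is defined as:

$$ H(X, \sigma) = \begin{bmatrix} L_{xx}(X, \sigma) & L_{xy}(X, \sigma) \\ L_{xy}(X, \sigma) & L_{yy}(X, \sigma) \end{bmatrix} \qquad (2) $$

Fig. 7: non-maximum suppression

Here L_xx(X, σ) is the convolution of the Gaussian second-order derivative ∂²/∂x² g(σ) with the image I at point X, and L_yy(X, σ) and L_xy(X, σ) are defined in the same way. As in Fig. 4, L_yy and L_xy are approximated by box filters. Writing the approximations as D_xx, D_yy, and D_xy, the determinant of the approximated Hessian is:

$$ \det(H_{approx}) = D_{xx} D_{yy} - (0.9\, D_{xy})^2 \qquad (3) $$

SIFT 6 9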
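As an illustration of Eqs. (1) and (3), the sketch below builds an integral image and uses it to sum any rectangle with at most four lookups, which is what makes box-filter responses cheap to evaluate. It is a minimal pure-Python sketch and not the authors' implementation: the function names `integral_image`, `box_sum`, and `det_hessian_approx` are hypothetical, images are assumed to be lists of rows, and the box-filter responses Dxx, Dyy, Dxy are assumed to be computed elsewhere.

```python
def integral_image(img):
    """Eq. (1): ii[y][x] is the sum of img[j][i] over all i <= x and j <= y."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0  # running sum of the current row up to column x
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of pixels in the inclusive rectangle [x0, x1] x [y0, y1]."""
    total = ii[y1][x1]
    if x0 > 0:
        total -= ii[y1][x0 - 1]
    if y0 > 0:
        total -= ii[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += ii[y0 - 1][x0 - 1]
    return total


def det_hessian_approx(dxx, dyy, dxy):
    """Eq. (3): determinant of the box-filter approximation of the Hessian."""
    return dxx * dyy - (0.9 * dxy) ** 2


if __name__ == "__main__":
    image = [[1, 2, 3],
             [4, 5, 6],
             [7, 8, 9]]
    ii = integral_image(image)
    print(ii[2][2])                  # sum of the whole image
    print(box_sum(ii, 1, 1, 2, 2))   # sum of the lower-right 2 x 2 block
```

Because each rectangle sum costs a constant number of lookups regardless of its size, larger box filters can be applied to the same integral image without resampling the input.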

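Fig. 10
Fig. 9
Fig. 8: Dominant rotation ( ) ( )

6 27 15 6 2 51 7 3 × 3 × 3 non-maximum suppression

3.1.2 SURF
Haar Wavelet 5 dx dy dx dy 8 8 Dominant rotation The region is divided into 4 × 4 subregions, and each subregion is described by (Σdx, Σdy, Σ|dx|, Σ|dy|), which gives a 4 × 4 × 4 = 64-dimensional descriptor.

3.2
3.2.1
9 ( ) SURF N N N = 5 N/2 Lucas-Kanade [9]

3.2.2
9 ( ) N M M N M − 1 N = 5, M 5 9 ( ) N x+ x− y+ y− 5 x+ x− x x 5 1 (M − 1) × 5 dominant rotate 10 (x_1, y_1) (x_2, y_2) SURF dominant rotate θ (x′, y′) (4):

$$ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta & x_2 \\ -\sin\theta & \cos\theta & y_2 \end{bmatrix} \begin{bmatrix} x_1 - x_2 \\ y_1 - y_2 \\ 1 \end{bmatrix} \qquad (4) $$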
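The rotation of Eq. (4) and the 5 × (M − 1) motion feature can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' code: the sign convention (rotating by −θ so that the dominant direction maps onto the x axis), the five-value layout per temporal unit (x+, x−, y+, y−, and a no-motion flag), the threshold `eps`, and all function names are hypothetical. The flow vectors themselves are assumed to come from a Lucas-Kanade tracker run elsewhere.

```python
import math


def rotate_about(p1, p2, theta):
    """Rotate point p1 about point p2 by -theta (assumed sign convention)."""
    dx, dy = p1[0] - p2[0], p1[1] - p2[1]
    c, s = math.cos(theta), math.sin(theta)
    return (c * dx + s * dy + p2[0], -s * dx + c * dy + p2[1])


def motion_bins(flow, eps=1e-6):
    """Five values for one flow vector: x+, x-, y+, y-, no-motion flag."""
    fx, fy = flow
    if math.hypot(fx, fy) < eps:
        return [0.0, 0.0, 0.0, 0.0, 1.0]
    return [max(fx, 0.0), max(-fx, 0.0), max(fy, 0.0), max(-fy, 0.0), 0.0]


def motion_feature(flows):
    """Concatenate the five values of each of the M - 1 local temporal units."""
    feature = []
    for flow in flows:
        feature.extend(motion_bins(flow))
    return feature


if __name__ == "__main__":
    # M = 5 gives M - 1 = 4 temporal units, hence a 20-dimensional motion part.
    flows = [(1.0, 0.0), (0.5, -0.5), (0.0, 0.0), (-2.0, 3.0)]
    print(len(motion_feature(flows)))
```

Keeping the positive and negative components in separate slots preserves the direction of motion along each axis, while the extra flag distinguishes a stationary point from one with small but nonzero flow.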

11

3.3
visual motion The visual feature has 64 dimensions and the motion feature has 5 × (M − 1) dimensions; combined with a weight, the feature has 64 + 5 × (M − 1) dimensions in total. With M = 5, this is 84 dimensions.

4.
12 Web

4.1
Dollar [1] visual word bag-of-video-words SVM 11 k-means bag-of-video-words (BoVW) BoVW bag-of-features (BoF) [10] BoF support vector machine (SVM) SVM RBF

4.1.1
KTH 13 codebook walking, running, jogging, boxing, hand clapping, hand waving 6 25 4 100 5 5-fold cross validation validation KTH 20 4000.

4.1.2
weight k. 12 weight. k 700. 13. weight 2.5. weight = 2.5 k = 1500.. 4
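The dimension count 64 + 5 × (M − 1) and the bag-of-video-words representation can be illustrated with a short sketch. It is a simplified sketch under assumptions, not the authors' pipeline: how the weight enters the combined vector (here, by scaling the motion part) is an assumption, the codebook is taken as given rather than learned with k-means, and the names `combine` and `bovw_histogram` are hypothetical. The default weight of 2.5 is the value reported in the text.

```python
def combine(visual, motion, weight=2.5):
    """Concatenate a 64-dim visual part with a weighted 5*(M-1)-dim motion part."""
    return list(visual) + [weight * m for m in motion]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bovw_histogram(descriptors, codebook):
    """Assign each descriptor to its nearest codeword and return a normalized histogram."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)),
                      key=lambda k: squared_distance(d, codebook[k]))
        hist[nearest] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist


if __name__ == "__main__":
    M = 5
    feature = combine([0.0] * 64, [1.0] * (5 * (M - 1)))
    print(len(feature))  # 64 + 5 * (M - 1)
    toy_codebook = [(0.0, 0.0), (5.0, 5.0)]
    toy_descriptors = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
    print(bovw_histogram(toy_descriptors, toy_codebook))
```

The resulting fixed-length histogram is what a classifier such as an SVM with an RBF kernel would consume, regardless of how many local features a video yields.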

(1) VMR, (2) VM, (3) V, (4) M

14 VMR VM running jogging walking hand waving boxing hand clapping 1 boxing hand clapping hand waving walking running jogging 1 4 walking running jogging 1 2 3 walking running jogging boxing waving clapping 4 1 waving 85.5% 83.3%.

Dollar [1] Alireza [7] Laptev [2] 15. 85%, Dollar 82.3%, Alireza 91.5%, Laptev 91.8% 16 [1] [1] KTH 600 15 16 14 2/3

4.2 Web
Web.. 17. Web.. HSV. χ² (

1 2 3 4 ). 17 Web ( ). bag-of-video-words ( ). ( ) ( ) k-means. k = 50

4.2.1
YouTube Web. 100.

4.2.2
18.. 19... 1 20.....

5.
SURF SURF dominant rotation KTH 6 85%

Fig. 18 Clustering result for a single video: shots taken from far away (top), relatively close shots (middle), close-up shots of people (bottom)

Fig. 19 Clustering result for all video shots: shots filmed from far away (top), shots taken from nearby (middle), a cluster mixing various things (bottom)

... retrieval, video summarization, and automatic surveillance systems are conceivable.

References

[1] P. Dollar, G. Cottrell, and S. Belongie. Behavior recognition via sparse spatio-temporal features. In Proc. of Surveillance and Performance Evaluation of Tracking and Surveillance, pp. 65–72, 2005.
[2] I. Laptev and T. Lindeberg. Local descriptors for spatio-temporal recognition. In Proc. of IEEE International Conference on Computer Vision, 2003.
[3] B. Herbert, E. Andreas, T. Tinne, and G. Luc. SURF: Speeded up robust features. In CVIU, pp. 346–359, 2008.
[4] C. Fanti and P. Perona. Hybrid models for human motion recognition. In Proc. of IEEE Computer Vision and Pattern Recognition, 2005.
[5] C. Rao, A. Yilmaz, and M. Shah. View-invariant representation and recognition of actions. Int J Comput Vision, Vol. 50(2), pp. 203–226, 2002.
[6] Y. Yacoob and M.J. Black. Parameterized modeling and recognition of activities. Comput Vis Image Und, Vol. 72(2), pp. 203–226, 2002.
[7] F. Alireza and M. Greg. Action recognition by learning mid-level feature. In Proc. of IEEE Computer Vision and Pattern Recognition, 2008.
[8] D. Lowe. Distinctive image features from scale-invariant keypoints. In International Journal of Computer Vision, pp. 91–110, 2004.
[9] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proc. of International Joint Conference on Artificial Intelligence, pp. 674–679, 1981.
[10] G. Csurka, C. Bray, C. Dance, and L. Fan. Visual categorization with bags of keypoints. In Proc. of ECCV Workshop on Statistical Learning in Computer Vision, pp. 1–22, 2004.
[11] S. Konrad and G. Luc. Action snippets: How many frames does human action recognition require? In Proc. of IEEE Computer Vision and Pattern Recognition, 2008.