
2007

[Abstract — surviving fragments: Graph Cuts, t-links, Interactive Graph Cuts, 4.7%, Mean Shift Segmentation]

[Table of contents — surviving entries: Chapter 5 (SIFT, Bag of Keypoints); Appendix A: Mean Shift (A.1 Kernel Density Estimation, A.2 Density Gradient Estimation); Appendix B: Scale-Invariant Feature Transform (B.1.1 LoG, B.1.2 Difference-of-Gaussian, B.1.3 σ, B.1.4 k, B.1.5 DoG); Appendix C]

[List of figures — surviving captions: s-t cut; Lazy Snapping (from [1]); GrabCut (from [2]); n-links, t-links, λ; Bag of Keypoints; affine-invariant keypoints and SIFT (from [3]); GMM; Mean Shift (from [4]); LoG; DoG; σ; DoG pyramid for s = 2; SIFT]

[List of tables — surviving entries: error rates [%]; processing times [s]; comparison (from [5]); recognition rates (α = 0.1)]

1  Introduction

Approaches to object segmentation include Snakes [6], Level Sets [7], and Graph Cuts [1, 2, 8, 9, 10, 11]. Among these, Boykov's Interactive Graph Cuts [9, 10] formulates segmentation as energy minimization solved with a minimum cut/maximum flow algorithm, and has been extended by Lazy Snapping [1] and GrabCut [2]. In the Graph Cuts framework, n-links encode smoothness between neighboring pixels and t-links encode each pixel's affinity to the object and background models.

This thesis builds on that framework, combining Graph Cuts with Mean Shift and SIFT-based components in the following chapters.

2  Graph Cuts

2.1  Energy minimization

Boykov et al. studied minimum cut/maximum flow algorithms for energy minimization in vision [8]; a minimum cut of a graph can be computed with a maximum flow algorithm. Such formulations have been applied to image restoration [12, 13, 14, 15], stereo and motion [16, 17, 18, 19], image segmentation [9, 20, 21], and multi-camera reconstruction [22]. Given pixels p ∈ P and a labeling L assigning each pixel a label L_p (e.g., a depth or region label), the energy to be minimized is

    E(L) = Σ_{p∈P} D_p(L_p) + Σ_{(p,q)∈N} V_{(p,q)}(L_p, L_q)        (2.1)

where N is the set of neighboring pixel pairs, D_p(L_p) is the data term for assigning L_p to p, and V_{(p,q)}(L_p, L_q) is the smoothness term between neighbors (Table 2.1 summarizes the notation: N, D_p(L_p), V_{(p,q)}(L_p, L_q), E). Two common choices of the interaction term are the Potts Interaction Energy Model and the Linear Interaction Energy Model.

Potts Interaction Energy Model.  With observed intensities I* = {I*_p | p ∈ P} and restored intensities I = {I_p | p ∈ P},

    E(I) = Σ_{p∈P} |I_p − I*_p| + Σ_{(p,q)∈N} K_{(p,q)} · T(I_p ≠ I_q)   (2.2)

where K_{(p,q)} is the penalty for assigning different labels to neighbors p and q, and T(·) is 1 if its argument holds and 0 otherwise. Minimizing the Potts energy for more than two labels is NP-hard; for two labels it can be solved exactly with max flow.
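As a concrete illustration (not code from the thesis), the Potts energy of Eq. (2.2) can be evaluated directly for a 1-D signal: the data term penalizes deviation from the observed intensities, and the interaction term charges K per label change between neighbors. All values below are invented.

```python
# Hypothetical sketch: evaluating the Potts energy of Eq. (2.2) on a 1-D signal.
# `observed` is I*, `labels` is the restored signal I; K is a uniform penalty here.
def potts_energy(observed, labels, K=1.0):
    data = sum(abs(i - o) for i, o in zip(labels, observed))       # sum_p |I_p - I*_p|
    smooth = sum(K for a, b in zip(labels, labels[1:]) if a != b)  # K * T(I_p != I_q)
    return data + smooth

observed = [0, 0, 5, 0, 0]
print(potts_energy(observed, [0, 0, 0, 0, 0], K=2.0))  # flatten the spike
print(potts_energy(observed, observed, K=2.0))         # keep the spike
```

With K = 2, flattening the spike costs 5 (pure data cost) while keeping it costs 4 (two label changes), so the minimizer keeps the spike; raising K reverses the decision, which is exactly the smoothing/data trade-off the model encodes.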

Linear Interaction Energy Model.  The Linear Interaction Energy Model replaces the Potts term with a linear penalty:

    E(I) = Σ_{p∈P} |I_p − I*_p| + Σ_{(p,q)∈N} A_{(p,q)} · |I_p − I_q|     (2.3)

where A_{(p,q)}, like K_{(p,q)} in the Potts model, weights the interaction between neighbors p and q; unlike the Potts energy, the penalty grows with the difference between neighboring labels.

2.2  Minimizing E with a min-cut/max-flow algorithm

The Graph Cuts algorithm minimizes such energies with a min-cut/max-flow algorithm on a graph G = (V, E) (Fig. 2.2), where V is the set of nodes and E the set of edges. Besides one node per pixel, the graph has two terminal nodes, the source s ∈ V and the sink t ∈ V. Edges between neighboring pixel nodes are called n-links; edges connecting a pixel node to s or t are called t-links.

[Figure 2.2: graph construction]

The t-links correspond to the data term D_p and the n-links to the smoothness term V_{(p,q)} of Eq. (2.1) (Fig. 2.3(a)).

[Figure 2.3: correspondence between energy terms and edges]

An s-t cut partitions the nodes into two disjoint sets S and T = V \ S with s ∈ S and t ∈ T (Fig. 2.4); the cost of the cut is the sum of the weights of the severed edges, and the minimum cut corresponds to the maximum flow from s to t.

[Figure 2.4: s-t cut]

Representative max-flow algorithms are the Ford-Fulkerson method [23], which starts from zero flow and repeatedly augments flow along paths from s to t, and the Push-Relabel method [24], which pushes excess flow out from s through the graph.

For binary segmentation with the Potts interaction model, the energy is

    E(L) = λ · R(L) + B(L)                                   (2.4)
    R(L) = Σ_{p∈P} R_p(L_p)                                  (2.5)
    B(L) = Σ_{{p,q}∈N} B_{p,q} · δ(L_p, L_q)                 (2.6)
    δ(L_p, L_q) = 1 if L_p ≠ L_q, 0 otherwise                (2.7)

where R(L) is the region (data) term, B(L) the boundary term, and λ balances the two. With O and B the sets of pixels marked by the user as object and background seeds, the edge weights are:

    edge      weight (cost)        for
    {p, q}    B_{p,q}              {p, q} ∈ N
    {p, S}    λ · R_p("bkg")       p ∈ P, p ∉ O ∪ B
              K                    p ∈ O
              0                    p ∈ B
    {p, T}    λ · R_p("obj")       p ∈ P, p ∉ O ∪ B
              0                    p ∈ O
              K                    p ∈ B

    R_p("obj") = −ln Pr(I_p | O)                             (2.8)
    R_p("bkg") = −ln Pr(I_p | B)                             (2.9)

    B_{p,q} ∝ exp(−(I_p − I_q)² / (2σ²)) · 1 / dist(p, q)    (2.10)
    K = 1 + max_{p∈P} Σ_{q:{p,q}∈N} B_{p,q}                  (2.11)

Here O and B are the object and background seeds given by the user, I_p is the color of pixel p, the likelihoods Pr(I_p | O) and Pr(I_p | B) used in the t-links are estimated from the seed colors, and dist(p, q) is the distance between pixels p and q. The labeling is then obtained with a min-cut/max-flow algorithm.

Li et al. proposed Lazy Snapping [1], which lets the user refine the segmentation boundary interactively (Fig. 2.5), and Rother et al. proposed GrabCut [2], which fits GMM color models and iterates Graph Cuts from user input (Fig. 2.6).

[Figure 2.5: Lazy Snapping (from [1])]
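Since the binary case reduces to a single s-t min cut, the construction above can be sketched end to end. The following is an illustrative implementation (Edmonds-Karp augmenting paths, not the solver used in the thesis) on a 5-pixel line with t-link and n-link weights laid out as in the table above; all weights are invented.

```python
from collections import deque

# Hedged sketch: tiny Edmonds-Karp max-flow/min-cut on the graph of Sec. 2.2.
# Pixels 0..4 form a line; pixel 0 is an object seed and pixel 4 a background
# seed, so their t-links get the hard weights K and 0 from the table.
def max_flow_min_cut(n_nodes, edges, s, t):
    cap = [[0.0] * n_nodes for _ in range(n_nodes)]
    for u, v, c in edges:
        cap[u][v] += c
    flow = 0.0
    while True:
        parent = [-1] * n_nodes          # BFS for a shortest augmenting path
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n_nodes):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            break
        path, v = [], t                  # recover the path, then augment it
        while v != s:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
        flow += bottleneck
    reach = {s}                          # S side of the min cut = "object"
    q = deque([s])
    while q:
        u = q.popleft()
        for v in range(n_nodes):
            if v not in reach and cap[u][v] > 1e-12:
                reach.add(v)
                q.append(v)
    return flow, reach

S, T, K = 5, 6, 100.0
edges = [(S, 0, K), (4, T, K),                    # hard t-links at the seeds
         (S, 1, 2.0), (S, 2, 1.0), (S, 3, 0.5),   # lambda*R_p("bkg") toward S
         (1, T, 0.5), (2, T, 1.0), (3, T, 2.0),   # lambda*R_p("obj") toward T
         (0, 1, 3.0), (1, 0, 3.0), (1, 2, 3.0), (2, 1, 3.0),
         (2, 3, 0.2), (3, 2, 0.2), (3, 4, 3.0), (4, 3, 3.0)]  # n-links B_{p,q}
flow, obj = max_flow_min_cut(7, edges, S, T)
print(sorted(p for p in obj if p < 5))   # pixels on the object side
```

The weak n-link between pixels 2 and 3 is where the minimum cut separates object from background, so pixels 0-2 end up on the source side, mirroring how a low-contrast boundary attracts the cut in the full image graph.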

[Figure 2.6: segmentation by GrabCut (from [2])]

Problems with the conventional methods

In Interactive Graph Cuts [9], [20], the t-link edge cost of a pixel not marked as a seed is either computed from the likelihood under the color distributions of the seeds or set to 0. When likelihoods are used and the t-links become large relative to the n-links, the influence of the color distributions grows and sporadic misdetections can increase. Suppressing such misdetections requires strengthening the influence of the n-links through λ, but strong n-links make the result heavily dependent on the edge information in the image. Consequently, as shown in Fig. 2.7, when an image contains complex edges, Interactive Graph Cuts has difficulty segmenting across local edges.

[Figure 2.7: Graph Cuts segmentation on an image containing complex edges]


3  Image segmentation using iterated Graph Cuts based on multi-scale smoothing

The proposed method iterates Graph Cuts over a sequence of progressively less smoothed images. Strong Gaussian smoothing with a large standard deviation σ suppresses the complex local edges that trap conventional Graph Cuts; the segmentation obtained on the smoothed image is then used to update the color models (GMMs) and the t-links for the next iteration on a less smoothed image. The procedure is:

    Step 1. Give object/background seeds.
    Step 2. Set the initial smoothing parameter σ.
    Step 3. Smooth the input image with a Gaussian of parameter σ.
    Step 4. Run Graph Cuts on the smoothed image.
    Step 5. Update the object/background models from the result.
    Step 6. If σ < 1, set σ = 0 (use the unsmoothed image for the final pass);
            otherwise set σ = α · σ with 0 < α < 1, and return to Step 3.

[Figure 3.1: flow of the proposed method]
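The loop in Steps 1-6 can be sketched as follows. This is an assumed reading of the schedule (initial σ, the update σ ← α·σ, and a final pass at σ = 0); a simple threshold stands in for the actual Graph Cuts step, and all parameter values are illustrative.

```python
import math

# Hedged sketch of the multi-scale iteration: segment a heavily smoothed signal
# first, then shrink sigma by alpha each pass until sigma < 1, finishing with an
# unsmoothed pass (sigma = 0). The threshold labeling is a stand-in for graph cuts.
def gaussian_blur_1d(signal, sigma):
    if sigma <= 0:
        return list(signal)
    radius = max(1, int(3 * sigma))
    kernel = [math.exp(-t * t / (2 * sigma * sigma)) for t in range(-radius, radius + 1)]
    norm = sum(kernel)
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), len(signal) - 1)  # clamp at borders
            acc += w * signal[j]
        out.append(acc / norm)
    return out

def iterated_segmentation(image, sigma0=8.0, alpha=0.5):
    sigmas, sigma, labels = [], sigma0, None
    while True:
        sigmas.append(sigma)
        smoothed = gaussian_blur_1d(image, sigma)
        labels = [1 if v > 0.5 else 0 for v in smoothed]      # stand-in for Step 4
        if sigma == 0:
            break
        sigma = sigma * alpha                                 # Step 6
        if sigma < 1:
            sigma = 0
    return sigmas, labels

sigmas, labels = iterated_segmentation([0] * 10 + [1] * 10)
print(sigmas)  # [8.0, 4.0, 2.0, 1.0, 0]
```

The geometric schedule means only a handful of Graph Cuts passes are needed even for a large initial σ, which is consistent with the iteration counts reported later in Table 3.3.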

3.2  Multi-scale smoothing

Let I be the input image and G(σ) a Gaussian filter; the smoothed image is

    L(σ) = G(σ) * I                                          (3.1)

Smoothing with a large σ is costly, so the image is processed at reduced resolution: if I_2 denotes I downsampled to half size, smoothing the half-resolution image is approximately equivalent to smoothing the original with twice the standard deviation,

    L_1(2σ) ≈ L_2(σ)                                         (3.2)

so large-σ smoothing can be replaced by smaller-σ smoothing at reduced resolution.

[Figure 3.2: multi-scale smoothing]

3.3  t-links from posterior probabilities

Conventional Graph Cuts sets the t-links from the likelihoods (2.8), (2.9). The proposed method instead uses posterior probabilities:

    R'_p("obj") = −ln Pr(O | I_p)                            (3.3)
    R'_p("bkg") = −ln Pr(B | I_p)                            (3.4)

where the posteriors Pr(O | I_p) and Pr(B | I_p) follow from Bayes' rule:

    Pr(O | I_p) = Pr(O) Pr(I_p | O) / Pr(I_p)                (3.5)
    Pr(B | I_p) = Pr(B) Pr(I_p | B) / Pr(I_p)                (3.6)

Here Pr(I_p) is the evidence, Pr(O) and Pr(B) are the priors, and Pr(I_p | O), Pr(I_p | B) are the likelihoods; through the priors Pr(O) and Pr(B), information from the previous segmentation enters the t-links.
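A minimal numeric sketch of Eqs. (3.3)-(3.6), with invented likelihoods and priors:

```python
import math

# Hedged sketch of Eqs. (3.3)-(3.6): turning likelihoods and priors into
# posterior-based t-link costs. All numbers are made up for illustration.
def posterior_costs(lik_obj, lik_bkg, prior_obj):
    prior_bkg = 1.0 - prior_obj
    evidence = prior_obj * lik_obj + prior_bkg * lik_bkg      # Pr(I_p)
    p_obj = prior_obj * lik_obj / evidence                    # Pr(O|I_p), Eq. (3.5)
    p_bkg = prior_bkg * lik_bkg / evidence                    # Pr(B|I_p), Eq. (3.6)
    return -math.log(p_obj), -math.log(p_bkg)                 # Eqs. (3.3), (3.4)

r_obj, r_bkg = posterior_costs(lik_obj=0.8, lik_bkg=0.2, prior_obj=0.5)
```

Because the two posteriors sum to one, a pixel that is cheap to connect to the object terminal is automatically expensive to connect to the background terminal, which is the normalization the raw likelihoods (2.8), (2.9) lack.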

The likelihoods Pr(I_p | O) and Pr(I_p | B) are modeled with Gaussian mixture models (GMMs) [25] over the 3-dimensional RGB color:

    Pr(I_p) = p(I_p | µ, Σ) = Σ_{i=1}^{K} α_i p_i(I_p | µ_i, Σ_i)          (3.7)

    p_i(I_p | µ_i, Σ_i) = 1 / ((2π)^{3/2} |Σ_i|^{1/2})
                          · exp(−(1/2)(I_p − µ_i)^T Σ_i^{−1} (I_p − µ_i))  (3.8)

The GMM parameters are estimated with the EM algorithm [26], and Eq. (3.7) evaluated with the object and background GMMs gives Pr(I_p | O) and Pr(I_p | B). The priors Pr(O), Pr(B) are computed from the previous Graph Cuts result using distances d_obj and d_bkg to the object and background regions:

    Pr(O) = d_obj       if d_obj ≥ d_bkg
            1 − d_bkg   if d_obj < d_bkg                     (3.9)
    Pr(B) = 1 − Pr(O)                                        (3.10)

3.4  Iterated Graph Cuts

At each iteration, the GMMs give Pr(I_p | O) and Pr(I_p | B), the previous result gives Pr(O) and Pr(B), and the posteriors are assigned to the t-links: Pr(O | I_p) of Eq. (3.3) to the {p, T} t-link and Pr(B | I_p) of Eq. (3.4) to the {p, S} t-link, after which Graph Cuts is run again (Fig. 3.2).
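Evaluating Eq. (3.7) for a pixel reduces to summing weighted Gaussian densities. The sketch below uses diagonal covariances for brevity (the thesis fits full covariances with EM), and all parameter values are invented.

```python
import math

# Sketch of Eqs. (3.7)-(3.8): a Gaussian mixture density for an RGB pixel.
# Diagonal covariances stand in for the full covariances of the thesis.
def gaussian_pdf(x, mu, var):
    p = 1.0                                # product of per-channel 1-D normals
    for xi, mi, vi in zip(x, mu, var):
        p *= math.exp(-(xi - mi) ** 2 / (2 * vi)) / math.sqrt(2 * math.pi * vi)
    return p

def gmm_pdf(x, weights, means, variances):
    return sum(a * gaussian_pdf(x, m, v)
               for a, m, v in zip(weights, means, variances))

pixel = (0.8, 0.1, 0.1)                    # a reddish pixel, channels in [0, 1]
p_obj = gmm_pdf(pixel, [0.6, 0.4],
                [(0.9, 0.1, 0.1), (0.2, 0.2, 0.8)],
                [(0.01,) * 3, (0.01,) * 3])
```

In the full method one such mixture is fit to the object seeds and one to the background seeds, and their values at each pixel supply the two likelihoods in Eqs. (3.5), (3.6).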

3.5  Experiments

[Figure 3.5: effect of the t-link design — (b) Interactive Graph Cuts [9]; (c) n-links only (σ = 0); (d), (e) results combining Pr(O), Pr(B) with Pr(I_p | O), Pr(I_p | B) in the t-links]

The method is compared against Interactive Graph Cuts [9] and GrabCut [2] on 50 images with identical seeds. With ground-truth object labels O = {O_1, O_2, ..., O_p, ..., O_P} and background labels B = {B_1, B_2, ..., B_p, ..., B_P}, and resulting labels L = {L_1, L_2, ..., L_p, ..., L_P}, the error rates are

    over seg.  = Σ_{p∈P} δ(L_p, B_p) / |P|                   (3.11)
    under seg. = Σ_{p∈P} δ(L_p, O_p) / |P|                   (3.12)

measuring disagreement with the background and object ground truth, respectively. The λ of Interactive Graph Cuts was set to λ = 0.005, chosen from the seeds and t-links; the GrabCut implementation and its λ follow the publicly available code (i3l/segmentation/grabcut.htm).
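The error measures (3.11), (3.12) amount to counting disagreements with the ground truth; a minimal sketch with invented labels:

```python
# Sketch of Eqs. (3.11)-(3.12): over-segmentation counts background pixels
# labeled object; under-segmentation counts object pixels labeled background.
# Labels are 1 for object and 0 for background; data below is invented.
def seg_errors(labels, truth):
    n = len(labels)
    over = sum(1 for l, t in zip(labels, truth) if l == 1 and t == 0) / n
    under = sum(1 for l, t in zip(labels, truth) if l == 0 and t == 1) / n
    return over, under

truth  = [1, 1, 1, 1, 0, 0, 0, 0]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
over, under = seg_errors(labels, truth)
```

The total error rate used in the tables is then the sum of the two rates.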

[Figure 3.5: segmentation results and error rates for the n-link-only and t-link variants]

[Table 3.1: error rates [%] for the proposed method, Interactive Graph Cuts [9], and GrabCut [2] — over seg., under seg., total (26 and 24 images)]

[Table 3.2: error rates [%] by image set — over seg., under seg., total]

The total error rate (err) combines the over-segmentation (3.11) and under-segmentation (3.12) rates. The parameter λ of Eq. (2.1) balances the t-links against the n-links. For Interactive Graph Cuts [9], λ = 0.005 gave the best results; with larger λ the t-links dominate and Interactive Graph Cuts becomes unstable, while the proposed method remains stable as λ varies.

[Figure 3.6: segmentation examples and error rates]

Figure 3.6 shows that the conventional methods produce many misdetected regions, while the proposed method extracts the object region stably. In the conventional methods only color information is used in the t-links, so background regions whose color resembles the object color are misdetected. In the proposed method, the previous Graph Cuts result in the iteration yields t-links that capture the rough shape of the object and background regions, so sporadic misdetections are suppressed even when λ is made large. The proposed method therefore gives stable segmentation results even as λ is varied.

Processing time

The processing times of the conventional and proposed methods were measured on images of three sizes, 150x113, 300x225, and 600x450 pixels, on a PC with an Intel(R) Xeon 2.66 GHz (x8) and 4.0 GB of memory. Table 3.3 lists the processing time and number of iterations for each method. Because the proposed method iterates, its processing time increases substantially. Possible remedies are reducing the number of iterations and speeding up the graph cut by building the graph over superpixels, as in Lazy Snapping [1].

[Figure 3.7: segmentation results while varying λ]

[Table 3.3: processing time [s] and iteration counts for Interactive Graph Cuts [9], GrabCut [2], and the proposed method at 150x113, 300x225, and 600x450 pixels (surviving values include 9.81 s and iteration counts between 3 and 12)]

In summary, a total error rate of 4.79% is reported for the proposed iterated Graph Cuts, which also remained robust to the choice of λ.

4  Graph Cuts using region-based t-links

4.1  Mean Shift Segmentation

To stabilize the t-links of Interactive Graph Cuts [9] and reduce the cost of the pixel-wise graph, the image is first over-segmented with Mean Shift Segmentation [4]. Each pixel is represented by a joint feature vector x_i = {x_i^s, x_i^t, x_i^r}, where the superscripts s, t, r denote the different feature components (spatial and range among them), with convergence point z_i and region label L_i. The mean shift iteration moves a point y_j to

    y_{j+1} = Σ_{i=1}^{n} x_i g(||(x − x_i)/h||²)
              / Σ_{i=1}^{n} g(||(x − x_i)/h||²)              (4.1)

with the multivariate profile

    g(x) = C / (h_s² h_t h_r) · k(||x^s/h_s||²) k(||x^t/h_t||²) k(||x^r/h_r||²)   (4.2)

where h_s, h_t, h_r are the bandwidths of the respective components, C is a normalization constant, and k(x) is the kernel profile (Table 4.1 lists h_s, h_t, h_r, C, and k(x)). Mean Shift Segmentation then proceeds as:

    1. Run mean shift from every pixel until convergence: z_i = y_{i,c}.
    2. Cluster the convergence points z_i into groups {C_p}, p = 1, ..., m.
    3. Assign region labels L_i = {p | z_i ∈ C_p}.
    4. Eliminate regions containing fewer than M pixels.
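Steps 1-3 above can be illustrated in one dimension: run the mean-shift iteration of Eq. (4.1) from every point and group points whose convergence values z_i coincide. A Gaussian profile and made-up data are used here; the thesis operates on joint spatial-range vectors.

```python
import math

# Hedged 1-D sketch of mean-shift mode seeking (Eq. 4.1): iterate the weighted
# mean until it stops moving; points sharing a mode form one region.
def mean_shift_mode(x, points, h=1.0, iters=50):
    for _ in range(iters):
        weights = [math.exp(-((x - p) / h) ** 2) for p in points]  # g(||.||^2)
        x = sum(w * p for w, p in zip(weights, points)) / sum(weights)
    return x

points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
modes = [round(mean_shift_mode(p, points), 3) for p in points]
clusters = sorted(set(modes))            # one entry per detected mode
```

Each of the two well-separated groups collapses onto its own mode, so grouping by converged value reproduces the intended two-region segmentation without fixing the number of clusters in advance.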

[Figure 4.2: region-based graph construction]
[Figure 4.3: segmentation examples]

4.3  Experiments

Segmentation with seeds was evaluated on 10 images, comparing pixel-based Graph Cuts with the proposed region-based Graph Cuts over Mean Shift regions.

[Table: error rates [%] — over segmentation, under segmentation, total, for the pixel-based method and the Mean Shift Segmentation-based method]

[Figure 4.4: segmentation examples with Mean Shift Segmentation regions]

5  Object category recognition using SIFT

This chapter applies SIFT (Scale-Invariant Feature Transform) features to category recognition.

5.1  Bag of Keypoints

Bag of Keypoints [3] represents an image as a histogram of quantized local features, by analogy with the Bag of Words model for documents: local descriptors are quantized into visual words (also called keypoints, visual words, or visual terms), and an image is described by the frequencies of those words (Fig. 5.1).

[Figure 5.1: the Bag of Keypoints representation]

Sampling strategies for SIFT descriptors differ between methods. With an interest point detector, Quelhas et al. [27] use the DoG detector of SIFT [28], while Csurka et al. [3] compute SIFT descriptors at affine-invariant keypoints [29].

[Figure: (a) affine-invariant keypoints, (b), (c) (from [3])]

With a regular grid, Fei-Fei et al. [5] compare grid, random, and DoG sampling of SIFT descriptors on 13 scene categories.

[Table 5.1: classification rates [%] (from [5]) — descriptors: grid (11x11 pixel), random, DoG; N/A; 128-dim SIFT]

Classifiers used with Bag-of-Word histograms include SVMs, Naive Bayes, pLSA, and LDA.

Naive Bayes.  For visual words w = {w_1, w_2, ..., w_n} and class c, the decision rule is

    c* = arg max_c p(c | w) ∝ p(c) p(w | c) = p(c) Π_n p(w_n | c)     (5.1)

pLSA.  Probabilistic Latent Semantic Analysis [30] introduces latent topics between two observed variables, documents d ∈ D = {d_1, ..., d_N} and words w ∈ W = {w_1, ..., w_M}, with topics z ∈ Z = {z_1, ..., z_K}:

    p(d, w) = p(d) Σ_{z∈Z} p(w | z) p(z | d)                 (5.2)
    p(d, w) = Σ_{z∈Z} p(z) p(d | z) p(w | z)                 (5.3)

LDA places priors on the pLSA topic model; Sivic et al. [31] apply pLSA to visual words, and a Translation and Scale invariant pLSA (TSI-pLSA) has also been proposed.
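Eq. (5.1) in log form, with an invented vocabulary of three visual words and two classes:

```python
import math

# Sketch of Eq. (5.1): Naive Bayes over a bag of visual words. Classes, priors,
# and per-word probabilities are invented; logs avoid numerical underflow.
def nb_classify(word_ids, class_priors, word_probs):
    best, best_score = None, -math.inf
    for c, prior in class_priors.items():
        score = math.log(prior) + sum(math.log(word_probs[c][w]) for w in word_ids)
        if score > best_score:
            best, best_score = c, score
    return best

class_priors = {"car": 0.5, "face": 0.5}
word_probs = {"car":  {0: 0.7, 1: 0.2, 2: 0.1},
              "face": {0: 0.1, 1: 0.2, 2: 0.7}}
label = nb_classify([0, 0, 1], class_priors, word_probs)
```

The product over word likelihoods is exactly the independence assumption of Eq. (5.1); pLSA relaxes it by routing words through the latent topics z.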

[Figure 5.3: sampling points]

5.2  Feature extraction by GMM clustering

Following [32, 33], each pixel is represented by its position (u, v) and intensity I as a three-dimensional vector x_i = (u_i, v_i, I_i)^T, and a mixture of c Gaussians with parameters Φ = {α_j, φ_j = (µ_j, Σ_j)}_{j=1}^{c} is fit to these vectors x. The maximum-likelihood parameters

    Φ_ML = arg max_Φ Π_x Σ_{j=1}^{c} (α_j p_j(x | µ_j, Σ_j))^β

    p(x | µ_j, Σ_j) = 1 / √((2π)³ |Σ_j|)
                      · exp{−(1/2)(x − µ_j)^T Σ_j^{−1} (x − µ_j)}     (5.4)

are estimated with the Deterministic Annealing EM algorithm (DAEM) [34], where p_j is the j-th component with mean µ_j and covariance Σ_j (φ_j = {µ_j, Σ_j}), β is the annealing parameter that DAEM gradually raises toward the standard EM setting, and the mixing weights satisfy α_j > 0 and Σ_{j=1}^{c} α_j = 1.

[Figure 5.4: GMM components fitted to the 3-dimensional (u, v, I) vectors]

Each vector x is then assigned to the component maximizing its density,

    C_i = arg max_i p_i(x | φ_i)                             (5.5)

which partitions the image into regions (Fig. 5.4(c)); Mean-Shift [4] is applied for comparison (Fig. 5.5) on the same three-dimensional vectors x_i = (u_i, v_i, I_i)^T.

[Figure 5.5: Mean-Shift clustering result]

The SIFT descriptor [28] is computed at each sample point. For the smoothed image L(x, y), the gradient magnitude m(x, y) and orientation θ(x, y) are

    m(x, y) = √(f_x(x, y)² + f_y(x, y)²)                     (5.6)
    θ(x, y) = tan⁻¹(f_y(x, y) / f_x(x, y))                   (5.7)
    f_x(x, y) = L(x + 1, y) − L(x − 1, y)                    (5.8)
    f_y(x, y) = L(x, y + 1) − L(x, y − 1)                    (5.9)

From m and θ, a Gaussian-weighted magnitude

    w(x, y) = G(x, y, σ) · m(x, y)                           (5.10)

is accumulated into an orientation histogram

    h_θ = Σ_x Σ_y w(x, y) · δ[θ, θ(x, y)]                    (5.11)

where G(x, y, σ) is a Gaussian window and δ selects the bin of orientation θ.
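Eqs. (5.6)-(5.11) can be sketched directly. The Gaussian weighting G(x, y, σ) of Eq. (5.10) is omitted here for brevity (raw magnitudes serve as weights), and the image and bin count are illustrative.

```python
import math

# Sketch of Eqs. (5.6)-(5.11): central-difference gradients, then an
# orientation histogram weighted by gradient magnitude.
def grad(img, x, y):
    fx = img[y][x + 1] - img[y][x - 1]        # Eq. (5.8)
    fy = img[y + 1][x] - img[y - 1][x]        # Eq. (5.9)
    m = math.hypot(fx, fy)                    # Eq. (5.6)
    theta = math.atan2(fy, fx)                # Eq. (5.7), in radians
    return m, theta

def orientation_histogram(img, bins=8):
    hist = [0.0] * bins
    h, w = len(img), len(img[0])
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            m, theta = grad(img, x, y)
            b = int(((theta + math.pi) / (2 * math.pi)) * bins) % bins
            hist[b] += m                      # weight: magnitude (Gaussian omitted)
    return hist

img = [[c for c in range(5)] for _ in range(5)]   # ramp increasing to the right
hist = orientation_histogram(img)
```

For the horizontal ramp every interior gradient points the same way, so all the mass lands in a single bin, the kind of dominant peak the 80%-of-maximum rule in Appendix B exploits.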

[Figures 5.6, 5.7: descriptor grids]

The region around each sample point is divided into 4x4 blocks with 8 orientation bins each, giving a 4x4x8 = 128-dimensional SIFT descriptor. The SIFT descriptors are quantized into visual words with the LBG algorithm.

43 5.8: 5.3 k-nn :

[Figure 5.10: structure features]

Each model is described by node features N = {n_1, ..., n_4}^T and edge features E = {e_1, ..., e_6}^T, combined into T = {N_1, ..., N_4, e_1, ..., e_6}^T. For an input feature vector X, the cost is

    cost(T, X) = Σ_{i=1}^{4} |n_i^t − n_i^x| + Σ_{j=1}^{6} |e_j^t − e_j^x|   (5.12)

and over the candidate features X_i of an input,

    Cost(T, X) = min_i {cost(T, X_i)}                        (5.13)

A local cost Cost_l and a global cost Cost_g are blended with a weight α:

    Cost = α · Cost_l + (1 − α) · Cost_g,  (0 ≤ α ≤ 1)       (5.14)

and the input is classified with k-NN on this combined cost.
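A minimal sketch of Eqs. (5.12)-(5.14) with invented structure vectors; the node/edge split of Eq. (5.12) is flattened into one vector here, which is equivalent for the L1 cost.

```python
# Sketch of Eqs. (5.12)-(5.14): L1 cost between structure vectors, minimum over
# an input's candidate features, and a local/global blend before nearest-neighbor
# classification. Vector contents and alpha are invented for illustration.
def cost(t, x):                              # Eq. (5.12): sum of absolute differences
    return sum(abs(a - b) for a, b in zip(t, x))

def min_cost(t, candidates):                 # Eq. (5.13)
    return min(cost(t, x) for x in candidates)

def combined_cost(cost_local, cost_global, alpha=0.1):   # Eq. (5.14)
    return alpha * cost_local + (1 - alpha) * cost_global

templates = {"A": [1, 1, 0, 0], "B": [0, 0, 1, 1]}
query = [[1, 1, 0, 1]]                       # one candidate feature for the input
nearest = min(templates, key=lambda c: min_cost(templates[c], query))
```

With α = 0.1, the global cost dominates the blend, matching the best setting reported in the experiments below.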

5.4  Experiments

Four object classes are used: (SH), (HG), (BK), and (VH). The weight α of Eq. (5.14) was varied; α = 0.1 gave the best recognition rate, while α = 0 corresponds to using the global cost only ((a)(b), (e)(f) vs. (c)(d), (g)(h)).

[Table 5.2: recognition rates [%] while varying α, for classes SH, HG, BK, VH]

[Table 5.3: confusion matrix (α = 0.1) and correct rates [%] for SH, HG, BK, VH]

On the Caltech database (Datasets/Caltech256/), the proposed GMM-based method is compared against bok1, Bag of Keypoints with SIFT [3], and bok2 [35]. Improvements of 17.6% over bok1 and 5.6% over bok2 are reported; Figure 5.16 shows examples where the GMM-based features help relative to bok2.

5.5  Summary

With the proposed SIFT/GMM-based representation, the experiments report improvements of 3.2% and of 17.6% over Bag of Keypoints.

[Figures 5.12-5.16: recognition examples]

6  Conclusion

This thesis proposed (i) an iterated Graph Cuts segmentation based on multi-scale smoothing that remains stable as λ varies, (ii) a region-based variant built on Mean Shift Segmentation, for which Chapter 4 reports a total error rate of 4.23%, and (iii) a SIFT/GMM-based category recognition method, for which Chapter 5 reports improvements of 3.2% and of 17.6% over Bag of Keypoints. Further speed-ups of the iterated Graph Cuts and refinements of the Mean Shift Segmentation-based components remain as future work.

[Acknowledgments]

References

[1] Y. Li, J. Sun, C.-K. Tang and H.-Y. Shum: Lazy snapping, ACM Trans. Graph., 23, 3, pp (2004).
[2] C. Rother, V. Kolmogorov and A. Blake: "GrabCut": interactive foreground extraction using iterated graph cuts, ACM Trans. Graph., 23, 3, pp (2004).
[3] C. Dance, J. Willamowski, L. Fan, C. Bray and G. Csurka: Visual categorization with bags of keypoints, ECCV International Workshop on Statistical Learning in Computer Vision (2004).
[4] D. Comaniciu and P. Meer: Mean shift: A robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 5, pp (2002).
[5] F.-F. Li and P. Perona: A bayesian hierarchical model for learning natural scene categories, CVPR 05: Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 05) - Volume 2, Washington, DC, USA, IEEE Computer Society, pp (2005).
[6] M. Kass, A. Witkin and D. Terzopoulos: Snakes: Active contour models, Int. J. Computer Vision, 1, 4, pp (1988).
[7] M. Sussman, P. Smereka and S. Osher: A level set approach for computing solutions to incompressible two-phase flow, J. Comput. Phys., 114, 1, pp (1994).
[8] Y. Boykov and V. Kolmogorov: An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26, 9, pp (2004).
[9] Y. Boykov and M.-P. Jolly: Interactive graph cuts for optimal boundary & region segmentation of objects in n-d images, ICCV2001, 01, p. 105 (2001).
[10] Y. Boykov and G. Funka-Lea: Graph cuts and efficient n-d image segmentation, Int. J. Comput. Vision, 70, 2, pp (2006).

[11] (in Japanese): IPSJ SIG Technical Report CVIM, 31, pp (2007).
[12] D. Greig, B. Porteous and A. Seheult: Exact maximum a posteriori estimation for binary images, J. Royal Statistical Soc., Series B, 51, 2, pp (1989).
[13] Y. Boykov, O. Veksler and R. Zabih: Fast approximate energy minimization via graph cuts, IEEE Trans. Pattern Analysis and Machine Intelligence, 23, 11, pp (2001).
[14] Y. Boykov, O. Veksler and R. Zabih: Markov random fields with efficient approximations, Technical Report TR (1997).
[15] H. Ishikawa and D. Geiger: Segmentation by grouping junctions, CVPR 98: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, IEEE Computer Society, p. 125 (1998).
[16] H. Ishikawa and D. Geiger: Occlusions, discontinuities, and epipolar lines in stereo, ECCV 98: Proceedings of the 5th European Conference on Computer Vision - Volume I, London, UK, Springer-Verlag, pp (1998).
[17] J. Kim, V. Kolmogorov and R. Zabih: Visual correspondence using energy minimization and mutual information, ICCV 03: Proceedings of the Ninth IEEE International Conference on Computer Vision, Washington, DC, USA, IEEE Computer Society, p (2003).
[18] V. Kolmogorov and R. Zabih: Computing visual correspondence with occlusions via graph cuts, ICCV, pp (2001).
[19] M. H. Lin and C. Tomasi: Surfaces with occlusions from layered stereo, CVPR, 01, p. 710 (2003).
[20] Y. Boykov and G. Funka-Lea: Graph cuts and efficient n-d image segmentation, Int. J. Comput. Vision, 70, 2, pp (2006).
[21] Y. Boykov and V. Kolmogorov: Computing geodesics and minimal surfaces via graph cuts, ICCV 03: Proceedings of the Ninth IEEE International Conference on Computer Vision, Washington, DC, USA, IEEE Computer Society, p. 26 (2003).
[22] V. Kolmogorov and R. Zabih: Multi-camera scene reconstruction via graph cuts, ECCV 02: Proceedings of the 7th European Conference on Computer Vision - Part III, London, UK, Springer-Verlag, pp (2002).

[23] L. Ford and D. Fulkerson: Flows in Networks (1962).
[24] A. V. Goldberg and R. E. Tarjan: A new approach to the maximum flow problem, STOC 86: Proceedings of the eighteenth annual ACM symposium on Theory of computing, New York, NY, USA, ACM, pp (1986).
[25] C. Stauffer and W. E. L. Grimson: Adaptive background mixture models for real-time tracking, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR-99), Los Alamitos, IEEE, pp (1999).
[26] A. P. Dempster, N. M. Laird and D. B. Rubin: Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (Methodological), 39, 1, pp (1977).
[27] P. Quelhas, F. Monay, J.-M. Odobez, D. Gatica-Perez, T. Tuytelaars and L. V. Gool: Modeling scenes with local descriptors and latent aspects, ICCV 05: Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV 05) Volume 1, Washington, DC, USA, IEEE Computer Society, pp (2005).
[28] D. G. Lowe: Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vision, 60, 2, pp (2004).
[29] K. Mikolajczyk and C. Schmid: An affine invariant interest point detector, ECCV (1), pp (2002).
[30] T. Hofmann: Unsupervised learning by probabilistic latent semantic analysis, Mach. Learn., 42, 1/2, pp (2001).
[31] J. Sivic and A. Zisserman: Video Google: A text retrieval approach to object matching in videos, Proceedings of the International Conference on Computer Vision, Vol. 2, pp (2003).
[32] M. Seki, K. Sumi, H. Taniguchi and M. Hashimoto: Gaussian mixture model for object recognition, MIRU2004, 1, pp (2004).
[33] H. Nami, S. Makito, O. Haruhisa and H. Manabu: Vehicle detection using gaussian mixture model from ir image, Technical report of IEICE, PRMU, 105, 62, pp (2005).
[34] N. Ueda and R. Nakano: Deterministic annealing em algorithm, Neural Netw., 11, 2, pp (1998).

[35] S. Lazebnik, C. Schmid and J. Ponce: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, CVPR 06: Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, IEEE Computer Society, pp (2006).
[36] J. J. Koenderink: The structure of images, Proc. of Biological Cybernetics, 50, pp (1984).
[37] T. Lindeberg: Scale-space theory: A basic tool for analysing structures at different scales, J. of Applied Statistics, 21(2), pp (1994).
[38] D. G. Lowe: Object recognition from local scale-invariant features, Proc. of the International Conference on Computer Vision ICCV, Corfu, pp (1999).

Publications

Journal papers
[1] (in Japanese), Vol. 22, 2008.

International conferences
[1] S. Shimizu, T. Nagahashi, and H. Fujiyoshi. Robust and Accurate Detection of Object Orientation and ID without Color Segmentation, Proc. on ROBOCUP2005 SYMPOSIUM.
[2] Tomoyuki Nagahashi, Hironobu Fujiyoshi, and Takeo Kanade. Object Type Classification Using Structure-based Feature Representation, MVA2007 IAPR Conference on Machine Vision Applications, pp , May 2007.
[3] Tomoyuki Nagahashi, Hironobu Fujiyoshi, and Takeo Kanade. Image Segmentation Using Iterated Graph Cuts Based on Multi-scale Smoothing, Asian Conference on Computer Vision 2007, Part II, LNCS 4844, pp , 2007.

Domestic conferences (in Japanese)
[1] ID, 21st SIG-Challenge, pp , May.
[2] CVIM 154, pp .
[3] SIFT, SC-07-8, pp. 39-44, Jan.
[4] 10th Meeting on Image Recognition and Understanding (MIRU2007), pp , Jul.

Other presentations (in Japanese)
[1] O-455, Sep.

Awards
[1] MIRU2007
[2]

A  Mean Shift

A.1  Kernel Density Estimation

Given n data points {x_i}_{i=1,...,n} in d-dimensional space, the kernel (Parzen) density estimate is

    f̂_{h,K}(x) = 1/(n h^d) Σ_{i=1}^{n} K((x − x_i)/h)        (A.1)

where K is the kernel and h the bandwidth. Radially symmetric kernels can be written with a profile k:

    K(x) = c_{k,d} · k(||x||²)                               (A.2)

where c_{k,d} is a normalization constant for K(x). Two common kernels are the normal kernel,

    k_N(x) = exp(−x/2),  x ≥ 0                               (A.3)
    K_N(x) = (2π)^{−d/2} exp(−||x||²/2)                      (A.4)

and the Epanechnikov kernel,

    k_E(x) = 1 − x   (0 ≤ x ≤ 1),   0   (x > 1)              (A.5)
    K_E(x) = (1/2) c_d^{−1} (d + 2)(1 − ||x||²)  (||x|| ≤ 1),  0 otherwise   (A.6)

Substituting (A.2) into (A.1) gives

    f̂_{h,K}(x) = c_{k,d}/(n h^d) Σ_{i=1}^{n} k(||(x − x_i)/h||²)      (A.7)

A.2  Density Gradient Estimation

Modes of the density estimate f̂(x) are points where the gradient of f(x) vanishes. Differentiating (A.7),

    ∇f̂_{h,K}(x) = 2 c_{k,d}/(n h^{d+2}) Σ_{i=1}^{n} (x_i − x) g(||(x − x_i)/h||²)   (A.8)

with the profile defined from the derivative k′ of k:

    g(x) = −k′(x)                                            (A.9)
    G(x) = c_{g,d} · g(||x||²)                               (A.10)

where c_{g,d} normalizes G. Substituting (A.9) into (A.8) and factoring,

    ∇f̂_{h,K}(x) = 2 c_{k,d}/(n h^{d+2})
                  [Σ_{i=1}^{n} g(||(x − x_i)/h||²)]
                  [Σ_{i=1}^{n} x_i g(||(x − x_i)/h||²) / Σ_{i=1}^{n} g(||(x − x_i)/h||²) − x]   (A.11)

As in (A.7), the first bracket is, up to constants, the density estimate with kernel G,

    f̂_{h,G}(x) = c_{g,d}/(n h^d) Σ_{i=1}^{n} g(||(x − x_i)/h||²)      (A.12)

and the second bracket is the Mean Shift Vector

    m_{h,G}(x) = Σ_{i=1}^{n} x_i g(||(x − x_i)/h||²) / Σ_{i=1}^{n} g(||(x − x_i)/h||²) − x   (A.13)

Substituting (A.12) and (A.13) into (A.11),

    ∇f̂_{h,K}(x) = f̂_{h,G}(x) · 2 c_{k,d}/(h² c_{g,d}) · m_{h,G}(x)    (A.14)

    m_{h,G}(x) = (1/2) h² c · ∇f̂_{h,K}(x) / f̂_{h,G}(x)       (A.15)

with c = c_{g,d}/c_{k,d}. By (A.15), the Mean Shift Vector computed with kernel G is proportional to the normalized density gradient estimated with kernel K: it always points in the direction of maximum increase of the density. Iterating from a starting point with

    y_{j+1} = Σ_{i=1}^{n} x_i g(||(y_j − x_i)/h||²) / Σ_{i=1}^{n} g(||(y_j − x_i)/h||²)   (A.16)

moves y_j by the Mean Shift Vector at each step and converges to a mode (Fig. A.1, Fig. A.2).

[Figure A.1: the Mean Shift Vector of (A.13)]
[Figure A.2: Mean Shift mode seeking (from [4])]

B  Scale-Invariant Feature Transform

SIFT [28] detects keypoints and describes the region around them. Processing is split into (1) keypoint detection, (2) orientation assignment, and (3) descriptor computation, with detection based on the DoG.

B.1  Keypoint detection

B.1.1  LoG

Scale-space theory goes back to Koenderink [36] and Lindeberg [37]. Lindeberg used the scale-normalized Laplacian-of-Gaussian (LoG) for scale selection; the LoG filter at scale σ is

    LoG = f(σ) = (x² + y² − 2σ²) / (2πσ⁶) · exp(−(x² + y²) / (2σ²))   (B.1)

where σ is the scale and (x, y) the filter coordinates (Fig. B.1).

[Figure B.1: LoG filters]

LoG extrema across scale are costly to compute, so Lowe [38] approximates the LoG with the Difference-of-Gaussian (DoG). Using the heat-equation relation for the Gaussian G,

    ∂G/∂σ = σ ∇²G                                            (B.2)

and the finite difference

    ∂G/∂σ ≈ (G(x, y, kσ) − G(x, y, σ)) / (kσ − σ)            (B.3)

it follows from (B.2) and (B.3) that

    σ ∇²G ≈ (G(x, y, kσ) − G(x, y, σ)) / (kσ − σ)            (B.4)
    (k − 1) σ² ∇²G ≈ G(x, y, kσ) − G(x, y, σ)                (B.5)

Since σ²∇²G is the scale-normalized LoG, (B.5) shows that the DoG approximates the LoG up to the constant factor (k − 1); SIFT therefore uses the DoG.

B.1.2  Difference-of-Gaussian

Smoothing the input image I(u, v) with a Gaussian G(x, y, σ) gives

    L(u, v, σ) = G(x, y, σ) * I(u, v)                        (B.6)
    G(x, y, σ) = 1 / (2πσ²) · exp(−(x² + y²) / (2σ²))        (B.7)

and the DoG image is

    D(u, v, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(u, v)
               = L(u, v, kσ) − L(u, v, σ)                    (B.8)

DoG images are built from an initial scale σ_0 by repeatedly multiplying the scale by k (Fig. B.2).

[Figure B.2: DoG computation]

B.1.3  Octaves

Rather than letting σ grow without bound, the images are organized into octaves (Fig. B.3): starting from L_1(σ_0) the scale is multiplied by k until it reaches 2σ_0; the image L_1(2σ_0) is then resampled to 1/2 resolution to give L_2(σ_0), since

    L_1(2σ_0) ≈ L_2(σ_0)                                     (B.9)

and the process repeats in the next octave, keeping the smoothing cost bounded.

[Figure B.3: octave structure over σ]

B.1.4  Choice of k

With s scale samples per octave (from σ_0 to 2σ_0), the factor is k = 2^(1/s); Figure B.4 shows the case s = 2, i.e., k = 2^(1/2) = √2. Detecting extrema over s scales requires s + 2 DoG images, hence s + 3 smoothed images per octave. In [28], s = 3 is used.

[Figure B.4: DoG pyramid for s = 2]

B.1.5  DoG extrema

Candidate keypoints are extrema of the DoG across space and scale: each sample is compared with its 26 neighbors in the 3x3 regions of the current and the two adjacent DoG images (Fig. B.5), and is kept only if it is larger (or smaller) than all of them. The scale of the detected extremum gives the keypoint scale (Fig. B.6): a structure of characteristic size produces a DoG extremum at the corresponding σ_1 (Fig. B.6(a)), and the same structure enlarged by a factor of 2 produces its extremum at σ_2 = 2σ_1 (Fig. B.6(b)), which is what makes the detection scale invariant.

[Figure B.5: extremum detection over 26 neighbors]
[Figure B.6: DoG responses across scale]

B.2  Keypoint localization

Candidates from B.1 are filtered to remove low-contrast points and edge responses, as follows.

Edge responses are removed with the Hessian of the DoG image,

    H = [ D_xx  D_xy ]
        [ D_xy  D_yy ]                                       (B.10)

whose eigenvalues α and β (α > β) are the principal curvatures of the DoG surface. With

    Tr(H) = D_xx + D_yy = α + β                              (B.11)
    Det(H) = D_xx D_yy − (D_xy)² = αβ                        (B.12)

and the curvature ratio γ defined by α = γβ,

    Tr(H)² / Det(H) = (α + β)² / (αβ) = (γβ + β)² / (γβ²) = (γ + 1)² / γ   (B.13)
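The edge test of Eqs. (B.10)-(B.14) needs only the three second derivatives at the keypoint. In this sketch the value γ_th = 10 is an assumption following Lowe [28] (the value is elided in this excerpt), and the derivative values are invented.

```python
# Sketch of Eqs. (B.10)-(B.14): reject keypoints whose principal-curvature
# ratio Tr(H)^2/Det(H) meets or exceeds (gamma_th+1)^2/gamma_th.
# gamma_th = 10 is assumed here, following Lowe [28].
def is_edge_like(dxx, dyy, dxy, gamma_th=10.0):
    tr = dxx + dyy                    # Eq. (B.11)
    det = dxx * dyy - dxy * dxy       # Eq. (B.12)
    if det <= 0:                      # curvatures of opposite sign: reject
        return True
    return tr * tr / det >= (gamma_th + 1) ** 2 / gamma_th

blob_like = is_edge_like(10.0, 10.0, 0.0)     # alpha = beta: ratio 4, kept
edge_like = is_edge_like(100.0, 1.0, 0.0)     # alpha >> beta: rejected
```

Equal curvatures give the minimum possible ratio of 4, so the threshold only ever removes elongated, edge-like responses.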

Since (γ + 1)²/γ grows with γ, edge-like points (one large and one small curvature) are rejected by keeping only points with

    Tr(H)² / Det(H) < (γ_th + 1)² / γ_th                     (B.14)

for a threshold γ_th given in [28] (Fig. B.7(a), (b)).

Keypoint positions are then refined to subpixel and subscale accuracy. Around a candidate, with offset x = (x, y, σ)^T, the DoG function D(x) is expanded as

    D(x) = D + (∂D/∂x)^T x + (1/2) x^T (∂²D/∂x²) x           (B.15)

Setting the derivative of (B.15) to zero,

    ∂D/∂x + (∂²D/∂x²) x̂ = 0                                  (B.16)

so the offset is

    x̂ = −(∂²D/∂x²)^{−1} (∂D/∂x)                              (B.17)

Written out with the 3x3 Hessian in (x, y, σ),

    [ ∂²D/∂x²   ∂²D/∂x∂y  ∂²D/∂x∂σ ] [ x ]     [ ∂D/∂x ]
    [ ∂²D/∂x∂y  ∂²D/∂y²   ∂²D/∂y∂σ ] [ y ]  = −[ ∂D/∂y ]     (B.18)
    [ ∂²D/∂x∂σ  ∂²D/∂y∂σ  ∂²D/∂σ²  ] [ σ ]     [ ∂D/∂σ ]

    [ x ]     [ ∂²D/∂x²   ∂²D/∂x∂y  ∂²D/∂x∂σ ]⁻¹ [ ∂D/∂x ]
    [ y ]  = −[ ∂²D/∂x∂y  ∂²D/∂y²   ∂²D/∂y∂σ ]   [ ∂D/∂y ]   (B.19)
    [ σ ]     [ ∂²D/∂x∂σ  ∂²D/∂y∂σ  ∂²D/∂σ²  ]   [ ∂D/∂σ ]

which gives the refined keypoint x̂ = (x, y, σ).

B.2.3  Eliminating low-contrast keypoints

Substituting the offset (B.20), i.e. x̂ = −(∂²D/∂x²)⁻¹(∂D/∂x), into (B.15) gives the DoG value at the refined location,

    D(x̂) = D + (1/2) (∂D/∂x)^T x̂                             (B.21)

Keypoints whose DoG value |D(x̂)| is small are low-contrast and unstable; in [28] they are discarded when the value is below 0.03 (Fig. B.7(c)).

B.3  Orientation assignment

For the smoothed image L(u, v), the gradient magnitude m(u, v) and orientation θ(u, v) are

    m(u, v) = √(f_u(u, v)² + f_v(u, v)²)                     (B.22)
    θ(u, v) = tan⁻¹(f_v(u, v) / f_u(u, v))                   (B.23)
    f_u(u, v) = L(u + 1, v) − L(u − 1, v)
    f_v(u, v) = L(u, v + 1) − L(u, v − 1)                    (B.24)

From m(x, y) and θ(x, y) an orientation histogram is built (Fig. B.8):

    h_θ = Σ_x Σ_y w(x, y) · δ[θ, θ(x, y)]                    (B.25)
    w(x, y) = G(x, y, σ) · m(x, y)                           (B.26)

The histogram h_θ has 36 bins of 10° each; w(x, y) weights the magnitude m(x, y) at (x, y) by the Gaussian window G(x, y, σ), and δ is the Kronecker delta selecting the bin of θ(x, y). The keypoint orientation is taken from the peak of the histogram, and any bin above 80% of the peak yields an additional keypoint with that orientation (Fig. B.8).

[Figure B.8: orientation histogram]

B.4  SIFT descriptor

The region around the keypoint is rotated to the assigned orientation (Fig. B.9) and divided into 4x4 = 16 blocks (Fig. B.10); in each block an 8-bin (45° per bin) orientation histogram is accumulated (Fig. B.11), giving a 4 x 4 x 8 = 128-dimensional descriptor.

[Figure B.9: rotation to the assigned orientation]
[Figure B.10: 4x4 block layout]
[Figure B.11: the SIFT descriptor]

B.5  Properties of the SIFT descriptor

SIFT descriptors are robust to rotation, scale, and illumination changes and to JPEG compression. Figure B.12 shows matching with the 128-dimensional SIFT descriptor on image pairs transformed in these ways (B.12(b), (c), (d), (e), including a JPEG-compressed pair); for stronger viewpoint changes (B.12(f)), affine-invariant detectors such as Mikolajczyk's [29] are more appropriate.

[Figure B.12: SIFT matching examples]

C  Publications

(The list of publications is given in the Publications section above.)


More information

Real AdaBoost HOG 2009 3 A Graduation Thesis of College of Engineering, Chubu University Efficient Reducing Method of HOG Features for Human Detection based on Real AdaBoost Chika Matsushima ITS Graphics

More information

2007/8 Vol. J90 D No. 8 Stauffer [7] 2 2 I 1 I 2 2 (I 1(x),I 2(x)) 2 [13] I 2 = CI 1 (C >0) (I 1,I 2) (I 1,I 2) Field Monitoring Server

2007/8 Vol. J90 D No. 8 Stauffer [7] 2 2 I 1 I 2 2 (I 1(x),I 2(x)) 2 [13] I 2 = CI 1 (C >0) (I 1,I 2) (I 1,I 2) Field Monitoring Server a) Change Detection Using Joint Intensity Histogram Yasuyo KITA a) 2 (0 255) (I 1 (x),i 2 (x)) I 2 = CI 1 (C>0) (I 1,I 2 ) (I 1,I 2 ) 2 1. [1] 2 [2] [3] [5] [6] [8] Intelligent Systems Research Institute,

More information

IPSJ SIG Technical Report Vol.2011-CVIM-177 No /5/ TRECVID2010 SURF Bag-of-Features 1 TRECVID SVM 700% MKL-SVM 883% TRECVID2010 MKL-SVM A

IPSJ SIG Technical Report Vol.2011-CVIM-177 No /5/ TRECVID2010 SURF Bag-of-Features 1 TRECVID SVM 700% MKL-SVM 883% TRECVID2010 MKL-SVM A 1 1 TRECVID2010 SURF Bag-of-Features 1 TRECVID SVM 700% MKL-SVM 883% TRECVID2010 MKL-SVM Analysis of video data recognition using multi-frame Kazuya Hidume 1 and Keiji Yanai 1 In this study, we aim to

More information

Vol. 44 No. SIG 9(CVIM 7) ) 2) 1) 1 2) 3 7) 1) 2) 3 3) 4) 5) (a) (d) (g) (b) (e) (h) No Convergence? End (f) (c) Yes * ** * ** 1

Vol. 44 No. SIG 9(CVIM 7) ) 2) 1) 1 2) 3 7) 1) 2) 3 3) 4) 5) (a) (d) (g) (b) (e) (h) No Convergence? End (f) (c) Yes * ** * ** 1 Vol. 44 No. SIG 9(CVIM 7) July 2003, Robby T. Tan, 1 Estimating Illumination Position, Color and Surface Reflectance Properties from a Single Image Kenji Hara,, Robby T. Tan, Ko Nishino, Atsushi Nakazawa,

More information

18 2 20 W/C W/C W/C 4-4-1 0.05 1.0 1000 1. 1 1.1 1 1.2 3 2. 4 2.1 4 (1) 4 (2) 4 2.2 5 (1) 5 (2) 5 2.3 7 3. 8 3.1 8 3.2 ( ) 11 3.3 11 (1) 12 (2) 12 4. 14 4.1 14 4.2 14 (1) 15 (2) 16 (3) 17 4.3 17 5. 19

More information

IPSJ SIG Technical Report iphone iphone,,., OpenGl ES 2.0 GLSL(OpenGL Shading Language), iphone GPGPU(General-Purpose Computing on Graphics Proc

IPSJ SIG Technical Report iphone iphone,,., OpenGl ES 2.0 GLSL(OpenGL Shading Language), iphone GPGPU(General-Purpose Computing on Graphics Proc iphone 1 1 1 iphone,,., OpenGl ES 2.0 GLSL(OpenGL Shading Language), iphone GPGPU(General-Purpose Computing on Graphics Processing Unit)., AR Realtime Natural Feature Tracking Library for iphone Makoto

More information

28 Horizontal angle correction using straight line detection in an equirectangular image

28 Horizontal angle correction using straight line detection in an equirectangular image 28 Horizontal angle correction using straight line detection in an equirectangular image 1170283 2017 3 1 2 i Abstract Horizontal angle correction using straight line detection in an equirectangular image

More information

(a) (b) (c) Canny (d) 1 ( x α, y α ) 3 (x α, y α ) (a) A 2 + B 2 + C 2 + D 2 + E 2 + F 2 = 1 (3) u ξ α u (A, B, C, D, E, F ) (4) ξ α (x 2 α, 2x α y α,

(a) (b) (c) Canny (d) 1 ( x α, y α ) 3 (x α, y α ) (a) A 2 + B 2 + C 2 + D 2 + E 2 + F 2 = 1 (3) u ξ α u (A, B, C, D, E, F ) (4) ξ α (x 2 α, 2x α y α, [II] Optimization Computation for 3-D Understanding of Images [II]: Ellipse Fitting 1. (1) 2. (2) (edge detection) (edge) (zero-crossing) Canny (Canny operator) (3) 1(a) [I] [II] [III] [IV ] E-mail sugaya@iim.ics.tut.ac.jp

More information

2 Fig D human model. 1 Fig. 1 The flow of proposed method )9)10) 2.2 3)4)7) 5)11)12)13)14) TOF 1 3 TOF 3 2 c 2011 Information

2 Fig D human model. 1 Fig. 1 The flow of proposed method )9)10) 2.2 3)4)7) 5)11)12)13)14) TOF 1 3 TOF 3 2 c 2011 Information 1 1 2 TOF 2 (D-HOG HOG) Recall D-HOG 0.07 HOG 0.16 Pose Estimation by Regression Analysis with Depth Information Yoshiki Agata 1 and Hironobu Fujiyoshi 1 A method for estimating the pose of a human from

More information

2 Poisson Image Editing DC DC 2 Poisson Image Editing Agarwala 3 4 Agarwala Poisson Image Editing Poisson Image Editing f(u) u 2 u = (x

2 Poisson Image Editing DC DC 2 Poisson Image Editing Agarwala 3 4 Agarwala Poisson Image Editing Poisson Image Editing f(u) u 2 u = (x 1 Poisson Image Editing Poisson Image Editing Stabilization of Poisson Equation for Gradient-Based Image Composing Ryo Kamio Masayuki Tanaka Masatoshi Okutomi Poisson Image Editing is the image composing

More information

IPSJ SIG Technical Report GPS LAN GPS LAN GPS LAN Location Identification by sphere image and hybrid sensing Takayuki Katahira, 1 Yoshio Iwai 1

IPSJ SIG Technical Report GPS LAN GPS LAN GPS LAN Location Identification by sphere image and hybrid sensing Takayuki Katahira, 1 Yoshio Iwai 1 1 1 1 GPS LAN GPS LAN GPS LAN Location Identification by sphere image and hybrid sensing Takayuki Katahira, 1 Yoshio Iwai 1 and Hiroshi Ishiguro 1 Self-location is very informative for wearable systems.

More information

22_04.dvi

22_04.dvi Vol. 1 No. 2 32 40 (July 2008) 1, 2 1 Speaker Segmentation Using Audiovisual Correlation Yuyu Liu 1, 2 and Yoichi Sato 1 Audiovisual correlation has been used successfully for audio source localization.

More information

3 2 2 (1) (2) (3) (4) 4 4 AdaBoost 2. [11] Onishi&Yoda [8] Iwashita&Stoica [5] 4 [3] 3. 3 (1) (2) (3)

3 2 2 (1) (2) (3) (4) 4 4 AdaBoost 2. [11] Onishi&Yoda [8] Iwashita&Stoica [5] 4 [3] 3. 3 (1) (2) (3) (MIRU2012) 2012 8 820-8502 680-4 E-mail: {d kouno,shimada,endo}@pluto.ai.kyutech.ac.jp (1) (2) (3) (4) 4 AdaBoost 1. Kanade [6] CLAFIC [12] EigenFace [10] 1 1 2 1 [7] 3 2 2 (1) (2) (3) (4) 4 4 AdaBoost

More information

一般社団法人電子情報通信学会 THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGIN

一般社団法人電子情報通信学会 THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGIN 一般社団法人電子情報通信学会 THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS 信学技報 IEICE Technical Report PRMU2017-36,SP2017-12(2017-06)

More information

Google Goggles [1] Google Goggles Android iphone web Google Goggles Lee [2] Lee iphone () [3] [4] [5] [6] [7] [8] [9] [10] :

Google Goggles [1] Google Goggles Android iphone web Google Goggles Lee [2] Lee iphone () [3] [4] [5] [6] [7] [8] [9] [10] : THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE.,, 182-8585 1-5-1 E-mail: {maruya-t,akiyama-m}@mm.inf.uec.ac.jp, yanai@cs.uec.ac.jp SURF Bag-of-Features

More information

% 2 3 [1] Semantic Texton Forests STFs [1] ( ) STFs STFs ColorSelf-Simlarity CSS [2] ii

% 2 3 [1] Semantic Texton Forests STFs [1] ( ) STFs STFs ColorSelf-Simlarity CSS [2] ii 2012 3 A Graduation Thesis of College of Engineering, Chubu University High Accurate Semantic Segmentation Using Re-labeling Besed on Color Self Similarity Yuko KAKIMI 2400 90% 2 3 [1] Semantic Texton

More information

IPSJ SIG Technical Report Vol.2013-CVIM-187 No /5/30 1,a) 1,b), 1,,,,,,, (DNN),,,, 2 (CNN),, 1.,,,,,,,,,,,,,,,,,, [1], [6], [7], [12], [13]., [

IPSJ SIG Technical Report Vol.2013-CVIM-187 No /5/30 1,a) 1,b), 1,,,,,,, (DNN),,,, 2 (CNN),, 1.,,,,,,,,,,,,,,,,,, [1], [6], [7], [12], [13]., [ ,a),b),,,,,,,, (DNN),,,, (CNN),,.,,,,,,,,,,,,,,,,,, [], [6], [7], [], [3]., [8], [0], [7],,,, Tohoku University a) omokawa@vision.is.tohoku.ac.jp b) okatani@vision.is.tohoku.ac.jp, [3],, (DNN), DNN, [3],

More information

1 Kinect for Windows M = [X Y Z] T M = [X Y Z ] T f (u,v) w 3.2 [11] [7] u = f X +u Z 0 δ u (X,Y,Z ) (5) v = f Y Z +v 0 δ v (X,Y,Z ) (6) w = Z +

1 Kinect for Windows M = [X Y Z] T M = [X Y Z ] T f (u,v) w 3.2 [11] [7] u = f X +u Z 0 δ u (X,Y,Z ) (5) v = f Y Z +v 0 δ v (X,Y,Z ) (6) w = Z + 3 3D 1,a) 1 1 Kinect (X, Y) 3D 3D 1. 2010 Microsoft Kinect for Windows SDK( (Kinect) SDK ) 3D [1], [2] [3] [4] [5] [10] 30fps [10] 3 Kinect 3 Kinect Kinect for Windows SDK 3 Microsoft 3 Kinect for Windows

More information

11) 13) 11),12) 13) Y c Z c Image plane Y m iy O m Z m Marker coordinate system T, d X m f O c X c Camera coordinate system 1 Coordinates and problem

11) 13) 11),12) 13) Y c Z c Image plane Y m iy O m Z m Marker coordinate system T, d X m f O c X c Camera coordinate system 1 Coordinates and problem 1 1 1 Posture Esimation by Using 2-D Fourier Transform Yuya Ono, 1 Yoshio Iwai 1 and Hiroshi Ishiguro 1 Recently, research fields of augmented reality and robot navigation are actively investigated. Estimating

More information

paper.dvi

paper.dvi 23 Study on character extraction from a picture using a gradient-based feature 1120227 2012 3 1 Google Street View Google Street View SIFT 3 SIFT 3 y -80 80-50 30 SIFT i Abstract Study on character extraction

More information

3: 2: 2. 2 Semi-supervised learning Semi-supervised learning [5,6] Semi-supervised learning Self-training [13] [14] Self-training Self-training Semi-s

3: 2: 2. 2 Semi-supervised learning Semi-supervised learning [5,6] Semi-supervised learning Self-training [13] [14] Self-training Self-training Semi-s THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE. 599-8531 1-1 E-mail: tsukada@m.cs.osakafu-u.ac.jp, {masa,kise}@cs.osakafu-u.ac.jp Semi-supervised learning

More information

258 5) GPS 1 GPS 6) GPS DP 7) 8) 10) GPS GPS 2 3 4 5 2. 2.1 3 1) GPS Global Positioning System

258 5) GPS 1 GPS 6) GPS DP 7) 8) 10) GPS GPS 2 3 4 5 2. 2.1 3 1) GPS Global Positioning System Vol. 52 No. 1 257 268 (Jan. 2011) 1 2, 1 1 measurement. In this paper, a dynamic road map making system is proposed. The proposition system uses probe-cars which has an in-vehicle camera and a GPS receiver.

More information

三石貴志.indd

三石貴志.indd 流通科学大学論集 - 経済 情報 政策編 - 第 21 巻第 1 号,23-33(2012) SIRMs SIRMs Fuzzy fuzzyapproximate approximatereasoning reasoningusing using Lukasiewicz Łukasiewicz logical Logical operations Operations Takashi Mitsuishi

More information

IPSJ SIG Technical Report Taubin Ellipse Fitting by Hyperaccurate Least Squares Yuuki Iwamoto, 1 Prasanna Rangarajan 2 and Kenichi Kanatani

IPSJ SIG Technical Report Taubin Ellipse Fitting by Hyperaccurate Least Squares Yuuki Iwamoto, 1 Prasanna Rangarajan 2 and Kenichi Kanatani 1 2 1 2 Taubin Ellipse Fitting by Hyperaccurate Least Squares Yuuki Iwamoto, 1 Prasanna Rangarajan 2 and Kenichi Kanatani 1 This paper presents a new method for fitting an ellipse to a point sequence extracted

More information

Microsoft PowerPoint - cvim_harada pptx

Microsoft PowerPoint - cvim_harada pptx 1 2 Flickr reaches 6 billion photos on 1 Aug, 2011. http://www.flickr.com/photos/eon60/6000000000/ 3 4 http://www.dpchallenge.com/image.php?image_id=997702 5 6 http://www.image-net.org/challenges/lsvrc/2011/pascal_ilsvrc_2011.pptx

More information

& 3 3 ' ' (., (Pixel), (Light Intensity) (Random Variable). (Joint Probability). V., V = {,,, V }. i x i x = (x, x,, x V ) T. x i i (State Variable),

& 3 3 ' ' (., (Pixel), (Light Intensity) (Random Variable). (Joint Probability). V., V = {,,, V }. i x i x = (x, x,, x V ) T. x i i (State Variable), .... Deeping and Expansion of Large-Scale Random Fields and Probabilistic Image Processing Kazuyuki Tanaka The mathematical frameworks of probabilistic image processing are formulated by means of Markov

More information

untitled

untitled (Robot Vision) Vision ( (computer) Machine VisionComputer Vision ( ) ( ) ( ) ( ) ( ) 1 DTV 2 DTV D 3 ( ( ( ( ( DTV D 4 () 5 A B C D E F G H I A B C D E F G H I I = A + D + G - C - F - I J = A + B + C -

More information

Sobel Canny i

Sobel Canny i 21 Edge Feature for Monochrome Image Retrieval 1100311 2010 3 1 3 3 2 2 7 200 Sobel Canny i Abstract Edge Feature for Monochrome Image Retrieval Naoto Suzue Content based image retrieval (CBIR) has been

More information

No. 3 Oct The person to the left of the stool carried the traffic-cone towards the trash-can. α α β α α β α α β α Track2 Track3 Track1 Track0 1

No. 3 Oct The person to the left of the stool carried the traffic-cone towards the trash-can. α α β α α β α α β α Track2 Track3 Track1 Track0 1 ACL2013 TACL 1 ACL2013 Grounded Language Learning from Video Described with Sentences (Yu and Siskind 2013) TACL Transactions of the Association for Computational Linguistics What Makes Writing Great?

More information

x, y x 3 y xy 3 x 2 y + xy 2 x 3 + y 3 = x 3 y xy 3 x 2 y + xy 2 x 3 + y 3 = 15 xy (x y) (x + y) xy (x y) (x y) ( x 2 + xy + y 2) = 15 (x y)

x, y x 3 y xy 3 x 2 y + xy 2 x 3 + y 3 = x 3 y xy 3 x 2 y + xy 2 x 3 + y 3 = 15 xy (x y) (x + y) xy (x y) (x y) ( x 2 + xy + y 2) = 15 (x y) x, y x 3 y xy 3 x 2 y + xy 2 x 3 + y 3 = 15 1 1977 x 3 y xy 3 x 2 y + xy 2 x 3 + y 3 = 15 xy (x y) (x + y) xy (x y) (x y) ( x 2 + xy + y 2) = 15 (x y) ( x 2 y + xy 2 x 2 2xy y 2) = 15 (x y) (x + y) (xy

More information

2. 30 Visual Words TF-IDF Lowe [4] Scale-Invarient Feature Transform (SIFT) Bay [1] Speeded Up Robust Features (SURF) SIFT 128 SURF 64 Visual Words Ni

2. 30 Visual Words TF-IDF Lowe [4] Scale-Invarient Feature Transform (SIFT) Bay [1] Speeded Up Robust Features (SURF) SIFT 128 SURF 64 Visual Words Ni DEIM Forum 2012 B5-3 606 8510 E-mail: {zhao,ohshima,tanaka}@dl.kuis.kyoto-u.ac.jp Web, 1. Web Web TinEye 1 Google 1 http://www.tineye.com/ 1 2. 3. 4. 5. 6. 2. 30 Visual Words TF-IDF Lowe [4] Scale-Invarient

More information

2_05.dvi

2_05.dvi Vol. 52 No. 2 901 909 (Feb. 2011) Gradient-Domain Image Editing is a useful technique to do various-type image editing, for example, Poisson Image Editing which can do seamless image composition. This

More information

Fig. 1 Left: Example of a target image and lines. Solid lines mean foreground. Dotted lines mean background. Right: Example of an output mask i

Fig. 1 Left: Example of a target image and lines. Solid lines mean foreground. Dotted lines mean background. Right: Example of an output mask i Vol. 50 No. 12 3233 3249 (Dec. 2009) 1, 1 2, 2 1, 2 3 3 Seeded Region Growing Seeded Region Growing Seeded Region Growing Seeded Region Growing Proposal and Evaluation of Fast Image Cutout Based on Improved

More information

情報処理学会研究報告 IPSJ SIG Technical Report Vol.2013-CVIM-186 No /3/15 EMD 1,a) SIFT. SIFT Bag-of-keypoints. SIFT SIFT.. Earth Mover s Distance

情報処理学会研究報告 IPSJ SIG Technical Report Vol.2013-CVIM-186 No /3/15 EMD 1,a) SIFT. SIFT Bag-of-keypoints. SIFT SIFT.. Earth Mover s Distance EMD 1,a) 1 1 1 SIFT. SIFT Bag-of-keypoints. SIFT SIFT.. Earth Mover s Distance (EMD), Bag-of-keypoints,. Bag-of-keypoints, SIFT, EMD, A method of similar image retrieval system using EMD and SIFT Hoshiga

More information

x T = (x 1,, x M ) x T x M K C 1,, C K 22 x w y 1: 2 2

x T = (x 1,, x M ) x T x M K C 1,, C K 22 x w y 1: 2 2 Takio Kurita Neurosceince Research Institute, National Institute of Advanced Indastrial Science and Technology takio-kurita@aistgojp (Support Vector Machine, SVM) 1 (Support Vector Machine, SVM) ( ) 2

More information

14 2 5

14 2 5 14 2 5 i ii Surface Reconstruction from Point Cloud of Human Body in Arbitrary Postures Isao MORO Abstract We propose a method for surface reconstruction from point cloud of human body in arbitrary postures.

More information

211 kotaro@math.titech.ac.jp 1 R *1 n n R n *2 R n = {(x 1,..., x n ) x 1,..., x n R}. R R 2 R 3 R n R n R n D D R n *3 ) (x 1,..., x n ) f(x 1,..., x n ) f D *4 n 2 n = 1 ( ) 1 f D R n f : D R 1.1. (x,

More information

ohpmain.dvi

ohpmain.dvi fujisawa@ism.ac.jp 1 Contents 1. 2. 3. 4. γ- 2 1. 3 10 5.6, 5.7, 5.4, 5.5, 5.8, 5.5, 5.3, 5.6, 5.4, 5.2. 5.5 5.6 +5.7 +5.4 +5.5 +5.8 +5.5 +5.3 +5.6 +5.4 +5.2 =5.5. 10 outlier 5 5.6, 5.7, 5.4, 5.5, 5.8,

More information

IPSJ SIG Technical Report Vol.2017-MUS-116 No /8/24 MachineDancing: 1,a) 1,b) 3 MachineDancing MachineDancing MachineDancing 1 MachineDan

IPSJ SIG Technical Report Vol.2017-MUS-116 No /8/24 MachineDancing: 1,a) 1,b) 3 MachineDancing MachineDancing MachineDancing 1 MachineDan MachineDancing: 1,a) 1,b) 3 MachineDancing 2 1. 3 MachineDancing MachineDancing 1 MachineDancing MachineDancing [1] 1 305 0058 1-1-1 a) s.fukayama@aist.go.jp b) m.goto@aist.go.jp 1 MachineDancing 3 CG

More information

xx/xx Vol. Jxx A No. xx 1 Fig. 1 PAL(Panoramic Annular Lens) PAL(Panoramic Annular Lens) PAL (2) PAL PAL 2 PAL 3 2 PAL 1 PAL 3 PAL PAL 2. 1 PAL

xx/xx Vol. Jxx A No. xx 1 Fig. 1 PAL(Panoramic Annular Lens) PAL(Panoramic Annular Lens) PAL (2) PAL PAL 2 PAL 3 2 PAL 1 PAL 3 PAL PAL 2. 1 PAL PAL On the Precision of 3D Measurement by Stereo PAL Images Hiroyuki HASE,HirofumiKAWAI,FrankEKPAR, Masaaki YONEDA,andJien KATO PAL 3 PAL Panoramic Annular Lens 1985 Greguss PAL 1 PAL PAL 2 3 2 PAL DP

More information

Computer Security Symposium October ,a) 1,b) Microsoft Kinect Kinect, Takafumi Mori 1,a) Hiroaki Kikuchi 1,b) [1] 1 Meiji U

Computer Security Symposium October ,a) 1,b) Microsoft Kinect Kinect, Takafumi Mori 1,a) Hiroaki Kikuchi 1,b) [1] 1 Meiji U Computer Security Symposium 017 3-5 October 017 1,a) 1,b) Microsoft Kinect Kinect, Takafumi Mori 1,a) Hiroaki Kikuchi 1,b) 1. 017 5 [1] 1 Meiji University Graduate School of Advanced Mathematical Science

More information

2011de.dvi

2011de.dvi 211 ( 4 2 1. 3 1.1............................... 3 1.2 1- -......................... 13 1.3 2-1 -................... 19 1.4 3- -......................... 29 2. 37 2.1................................ 37

More information

画像工学入門

画像工学入門 セグメンテーション 講義内容 閾値法,k-mean 法 領域拡張法 SNAK 法 P タイル法 モード法 P タイル法 画像内で対象物の占める面積 (P パーセント ) があらかじめわかっているとき, 濃度ヒストグラムを作成し, 濃度値の累積分布が全体の P パーセントとなる濃度値を見つけ, この値を閾値とする. モード法 画像の輝度ヒストグラムを調べ その分布のモード ( 頻値輝度 ) 間の谷をしきい値とする

More information

A Graduation Thesis of College of Engineering, Chubu University Pose Estimation by Regression Analysis with Depth Information Yoshiki Agata

A Graduation Thesis of College of Engineering, Chubu University Pose Estimation by Regression Analysis with Depth Information Yoshiki Agata 2011 3 A Graduation Thesis of College of Engineering, Chubu University Pose Estimation by Regression Analysis with Depth Information Yoshiki Agata CG [2] [3][4] 3 3 [1] HOG HOG TOF(Time Of Flight) iii

More information

2 ECCV2008,2010,2012 ECCV % % % ECCV % % % ECCV % % % Ligh

2 ECCV2008,2010,2012 ECCV % % % ECCV % % % ECCV % % % Ligh ECCV2012 1 2 1 3 2012 10 8 11 ECCV2012 1. ECCV2012 European Conference on Computer Vision (ECCV) 12 2012 10 8 11 4 General Chairs Roberto Cipolla (University of Cambridge, UK), Carlo Colombo (University

More information

Spin Image [3] 3D Shape Context [4] Spin Image 2 3D Shape Context Shape Index[5] Local Surface Patch[6] DAI [7], [8] [9], [10] Reference Frame SHO[11]

Spin Image [3] 3D Shape Context [4] Spin Image 2 3D Shape Context Shape Index[5] Local Surface Patch[6] DAI [7], [8] [9], [10] Reference Frame SHO[11] 3-D 1,a) 1 1,b) 3 3 3 1% Spin Image 51.6% 93.8% 9 PCL Point Cloud Library Correspondence Grouping 13.5% 10 3 Extraction of 3-D Feature Point for Effect in Object Recognition based on Local Shape Distinctiveness

More information

2014/3 Vol. J97 D No. 3 Recognition-based segmentation [7] 1 DP 1 Conditional random field; CRF [8] [10] CRF / OCR 2 2 2 2 OCR 2 2 2 2. 2 2 2 [11], [1

2014/3 Vol. J97 D No. 3 Recognition-based segmentation [7] 1 DP 1 Conditional random field; CRF [8] [10] CRF / OCR 2 2 2 2 OCR 2 2 2 2. 2 2 2 [11], [1 2, a) Scene Character Extraction by an Optimal Two-Dimensional Segmentation Hiroaki TAKEBE, a) and Seiichi UCHIDA / 2 2 2 2 2 2 1. FUJITSU LABORATORIES LTD., 4 1 1 Kamikodanaka, Nakahara-ku, Kawasaki-shi,

More information

一般画像認識のための単語概念の視覚性の分析

一般画像認識のための単語概念の視覚性の分析 Bag-of-keypoints による カテゴリー認識 第 14 回画像センシングシンポジウム (SSII2008) 2008 年 6 月 13 日 電気通信大学 柳井啓司 情報工学科 2 アウトライン 1. イントロダクション 2. Bag-of-keypoints アプローチ その具体的な方法の詳細 3. Bag-of-keypoints アプローチの拡張 位置情報, 色情報の利用 4. 確率的言語モデルの画像への適用

More information

(MIRU2010) NTT Graphic Processor Unit GPU graphi

(MIRU2010) NTT Graphic Processor Unit GPU graphi (MIRU2010) 2010 7 889 2192 1-1 905 2171 905 NTT 243 0124 3-1 E-mail: ac094608@edu.okinawa-ct.ac.jp, akisato@ieee.org Graphic Processor Unit GPU graphic processor unit CUDA Fully automatic extraction of

More information

1 filename=mathformula tex 1 ax 2 + bx + c = 0, x = b ± b 2 4ac, (1.1) 2a x 1 + x 2 = b a, x 1x 2 = c a, (1.2) ax 2 + 2b x + c = 0, x = b ± b 2

1 filename=mathformula tex 1 ax 2 + bx + c = 0, x = b ± b 2 4ac, (1.1) 2a x 1 + x 2 = b a, x 1x 2 = c a, (1.2) ax 2 + 2b x + c = 0, x = b ± b 2 filename=mathformula58.tex ax + bx + c =, x = b ± b 4ac, (.) a x + x = b a, x x = c a, (.) ax + b x + c =, x = b ± b ac. a (.3). sin(a ± B) = sin A cos B ± cos A sin B, (.) cos(a ± B) = cos A cos B sin

More information

IPSJ-CVIM

IPSJ-CVIM 1 1 2 1 Estimation of Shielding Object Distribution in Scattering Media by Analyzing Light Transport Shosei Moriguchi, 1 Yasuhiro Mukaigawa, 1 Yasuyuki Matsushita 2 and Yasushi Yagi 1 In this paper, we

More information

SICE東北支部研究集会資料(2017年)

SICE東北支部研究集会資料(2017年) 307 (2017.2.27) 307-8 Deep Convolutional Neural Network X Detecting Masses in Mammograms Based on Transfer Learning of A Deep Convolutional Neural Network Shintaro Suzuki, Xiaoyong Zhang, Noriyasu Homma,

More information

[6] DoN DoN DDoN(Donuts DoN) DoN 4(2) DoN DDoN 3.2 RDoN(Ring DoN) 4(1) DoN 4(3) DoN RDoN 2 DoN 2.2 DoN PCA DoN DoN 2 DoN PCA 0 DoN 3. DoN

[6] DoN DoN DDoN(Donuts DoN) DoN 4(2) DoN DDoN 3.2 RDoN(Ring DoN) 4(1) DoN 4(3) DoN RDoN 2 DoN 2.2 DoN PCA DoN DoN 2 DoN PCA 0 DoN 3. DoN 3 1,a) 1,b) 3D 3 3 Difference of Normals (DoN)[1] DoN, 1. 2010 Kinect[2] 3D 3 [3] 3 [4] 3 [5] 3 [6] [7] [1] [8] [9] [10] Difference of Normals (DoN) 48 8 [1] [6] DoN DoN 1 National Defense Academy a) em53035@nda.ac.jp

More information

2003 : ( ) :80226561 1 1 1.1............................ 1 1.2......................... 1 1.3........................ 1 1.4......................... 4 2 5 2.1......................... 5 2.2........................

More information

:EM,,. 4 EM. EM Finch, (AIC)., ( ), ( ), Web,,.,., [1].,. 2010,,,, 5 [2]., 16,000.,..,,. (,, )..,,. (socio-dynamics) [3, 4]. Weidlich Haag.

:EM,,. 4 EM. EM Finch, (AIC)., ( ), ( ), Web,,.,., [1].,. 2010,,,, 5 [2]., 16,000.,..,,. (,, )..,,. (socio-dynamics) [3, 4]. Weidlich Haag. :EM,,. 4 EM. EM Finch, (AIC)., ( ), ( ),. 1. 1990. Web,,.,., [1].,. 2010,,,, 5 [2]., 16,000.,..,,. (,, )..,,. (socio-dynamics) [3, 4]. Weidlich Haag. [5]. 606-8501,, TEL:075-753-5515, FAX:075-753-4919,

More information

,,, 2 ( ), $[2, 4]$, $[21, 25]$, $V$,, 31, 2, $V$, $V$ $V$, 2, (b) $-$,,, (1) : (2) : (3) : $r$ $R$ $r/r$, (4) : 3

,,, 2 ( ), $[2, 4]$, $[21, 25]$, $V$,, 31, 2, $V$, $V$ $V$, 2, (b) $-$,,, (1) : (2) : (3) : $r$ $R$ $r/r$, (4) : 3 1084 1999 124-134 124 3 1 (SUGIHARA Kokichi),,,,, 1, [5, 11, 12, 13], (2, 3 ), -,,,, 2 [5], 3,, 3, 2 2, -, 3,, 1,, 3 2,,, 3 $R$ ( ), $R$ $R$ $V$, $V$ $R$,,,, 3 2 125 1 3,,, 2 ( ), $[2, 4]$, $[21, 25]$,

More information

Haiku Generation Based on Motif Images Using Deep Learning Koki Yoneda 1 Soichiro Yokoyama 2 Tomohisa Yamashita 2 Hidenori Kawamura Scho

Haiku Generation Based on Motif Images Using Deep Learning Koki Yoneda 1 Soichiro Yokoyama 2 Tomohisa Yamashita 2 Hidenori Kawamura Scho Haiku Generation Based on Motif Images Using Deep Learning 1 2 2 2 Koki Yoneda 1 Soichiro Yokoyama 2 Tomohisa Yamashita 2 Hidenori Kawamura 2 1 1 School of Engineering Hokkaido University 2 2 Graduate

More information

Vol.58 No (Sep. 2017) 1 2,a) 3 1,b) , A EM A Latent Class Model to Analyze the Relationship Between Companies Appeal Poi

Vol.58 No (Sep. 2017) 1 2,a) 3 1,b) , A EM A Latent Class Model to Analyze the Relationship Between Companies Appeal Poi 1 2,a) 3 1,b) 2017 1 17, 2017 6 6 A EM A Latent Class Model to Analyze the Relationship Between Companies Appeal Points and Students Reasons for Application Teppei Sakamoto 1 Haruka Yamashita 2,a) Tairiku

More information

熊本県数学問題正解

熊本県数学問題正解 00 y O x Typed by L A TEX ε ( ) (00 ) 5 4 4 ( ) http://www.ocn.ne.jp/ oboetene/plan/. ( ) (009 ) ( ).. http://www.ocn.ne.jp/ oboetene/plan/eng.html 8 i i..................................... ( )0... (

More information

2008 : 80725872 1 2 2 3 2.1.......................................... 3 2.2....................................... 3 2.3......................................... 4 2.4 ()..................................

More information

1 Table 1: Identification by color of voxel Voxel Mode of expression Nothing Other 1 Orange 2 Blue 3 Yellow 4 SSL Humanoid SSL-Vision 3 3 [, 21] 8 325

1 Table 1: Identification by color of voxel Voxel Mode of expression Nothing Other 1 Orange 2 Blue 3 Yellow 4 SSL Humanoid SSL-Vision 3 3 [, 21] 8 325 社団法人人工知能学会 Japanese Society for Artificial Intelligence 人工知能学会研究会資料 JSAI Technical Report SIG-Challenge-B3 (5/5) RoboCup SSL Humanoid A Proposal and its Application of Color Voxel Server for RoboCup SSL

More information

() n C + n C + n C + + n C n n (3) n C + n C + n C 4 + n C + n C 3 + n C 5 + (5) (6 ) n C + nc + 3 nc n nc n (7 ) n C + nc + 3 nc n nc n (

() n C + n C + n C + + n C n n (3) n C + n C + n C 4 + n C + n C 3 + n C 5 + (5) (6 ) n C + nc + 3 nc n nc n (7 ) n C + nc + 3 nc n nc n ( 3 n nc k+ k + 3 () n C r n C n r nc r C r + C r ( r n ) () n C + n C + n C + + n C n n (3) n C + n C + n C 4 + n C + n C 3 + n C 5 + (4) n C n n C + n C + n C + + n C n (5) k k n C k n C k (6) n C + nc

More information

IPSJ SIG Technical Report Vol.2014-MBL-70 No.49 Vol.2014-UBI-41 No /3/15 2,a) 2,b) 2,c) 2,d),e) WiFi WiFi WiFi 1. SNS GPS Twitter Facebook Twit

IPSJ SIG Technical Report Vol.2014-MBL-70 No.49 Vol.2014-UBI-41 No /3/15 2,a) 2,b) 2,c) 2,d),e) WiFi WiFi WiFi 1. SNS GPS Twitter Facebook Twit 2,a) 2,b) 2,c) 2,d),e) WiFi WiFi WiFi 1. SNS GPS Twitter Facebook Twitter Ustream 1 Graduate School of Information Science and Technology, Osaka University, Japan 2 Cybermedia Center, Osaka University,

More information

h(n) x(n) s(n) S (ω) = H(ω)X(ω) (5 1) H(ω) H(ω) = F[h(n)] (5 2) F X(ω) x(n) X(ω) = F[x(n)] (5 3) S (ω) s(n) S (ω) = F[s(n)] (5

h(n) x(n) s(n) S (ω) = H(ω)X(ω) (5 1) H(ω) H(ω) = F[h(n)] (5 2) F X(ω) x(n) X(ω) = F[x(n)] (5 3) S (ω) s(n) S (ω) = F[s(n)] (5 1 -- 5 5 2011 2 1940 N. Wiener FFT 5-1 5-2 Norbert Wiener 1894 1912 MIT c 2011 1/(12) 1 -- 5 -- 5 5--1 2008 3 h(n) x(n) s(n) S (ω) = H(ω)X(ω) (5 1) H(ω) H(ω) = F[h(n)] (5 2) F X(ω) x(n) X(ω) = F[x(n)]

More information

1 I

1 I 1 I 3 1 1.1 R x, y R x + y R x y R x, y, z, a, b R (1.1) (x + y) + z = x + (y + z) (1.2) x + y = y + x (1.3) 0 R : 0 + x = x x R (1.4) x R, 1 ( x) R : x + ( x) = 0 (1.5) (x y) z = x (y z) (1.6) x y =

More information

I, II 1, A = A 4 : 6 = max{ A, } A A 10 10%

I, II 1, A = A 4 : 6 = max{ A, } A A 10 10% 1 2006.4.17. A 3-312 tel: 092-726-4774, e-mail: hara@math.kyushu-u.ac.jp, http://www.math.kyushu-u.ac.jp/ hara/lectures/lectures-j.html Office hours: B A I ɛ-δ ɛ-δ 1. 2. A 1. 1. 2. 3. 4. 5. 2. ɛ-δ 1. ɛ-n

More information

IPSJ SIG Technical Report Vol.2013-CVIM-187 No /5/30 CT CT CT CT CT,,, Research on Automatic Liver Region Detection from Multi-Slice

IPSJ SIG Technical Report Vol.2013-CVIM-187 No /5/30 CT CT CT CT CT,,, Research on Automatic Liver Region Detection from Multi-Slice CT 1 1 1 1 1 CT CT CT CT,,, Research on Automatic Liver Region Detection from Multi-Slice Abdominal CT Images Abstract: In the diagnosis of liver affection of cirrhosis and hepatocellular carcinoma,etc,

More information

微分積分 サンプルページ この本の定価 判型などは, 以下の URL からご覧いただけます. このサンプルページの内容は, 初版 1 刷発行時のものです.

微分積分 サンプルページ この本の定価 判型などは, 以下の URL からご覧いただけます.   このサンプルページの内容は, 初版 1 刷発行時のものです. 微分積分 サンプルページ この本の定価 判型などは, 以下の URL からご覧いただけます. ttp://www.morikita.co.jp/books/mid/00571 このサンプルページの内容は, 初版 1 刷発行時のものです. i ii 014 10 iii [note] 1 3 iv 4 5 3 6 4 x 0 sin x x 1 5 6 z = f(x, y) 1 y = f(x)

More information

,,.,.,,.,.,.,.,,.,..,,,, i

,,.,.,,.,.,.,.,,.,..,,,, i 22 A person recognition using color information 1110372 2011 2 13 ,,.,.,,.,.,.,.,,.,..,,,, i Abstract A person recognition using color information Tatsumo HOJI Recently, for the purpose of collection of

More information

(3.6 ) (4.6 ) 2. [3], [6], [12] [7] [2], [5], [11] [14] [9] [8] [10] (1) Voodoo 3 : 3 Voodoo[1] 3 ( 3D ) (2) : Voodoo 3D (3) : 3D (Welc

(3.6 ) (4.6 ) 2. [3], [6], [12] [7] [2], [5], [11] [14] [9] [8] [10] (1) Voodoo 3 : 3 Voodoo[1] 3 ( 3D ) (2) : Voodoo 3D (3) : 3D (Welc 1,a) 1,b) Obstacle Detection from Monocular On-Vehicle Camera in units of Delaunay Triangles Abstract: An algorithm to detect obstacles by using a monocular on-vehicle video camera is developed. Since

More information

Coding theorems for correlated sources with cooperative information

Coding theorems for correlated sources with cooperative information グラフコストの逐次更新を用いた映像顕著領域の自動抽出 2009 年 5 月 28 日 福地賢宮里洸司 (2) 木村昭悟 (1) 高木茂 (2) 大和淳司 (1) (1) 日本電信電話 ( 株 )NTT) コミュニケーション科学基礎研究所メディア情報研究部メディア認識研究グループ (2) 国立沖縄工業高等専門学校情報通信システム工学科 背景 ヒトはどのようにして もの を認識する能力を獲得するのか?

More information