A Multimodal Constellation Model for Generic Object Recognition

Yasunori KAMIYA, Tomokazu TAKAHASHI, Ichiro IDE, and Hiroshi MURASE

Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya-shi, 464-8601 Japan
Faculty of Economics and Information, Gifu Shotoku Gakuen University, 1-38 Nakauzura, Gifu-shi, 500-8288 Japan

Vol. J92-D, No. 8, pp. 1104-1114, 2009/8

Key terms: Bag of Features (BoF), EM algorithm

1. Introduction

Part-based approaches to generic object recognition [1] include Bag of Features (BoF) [2] and the constellation model of Fergus et al. [3]. BoF, which derives from the Bag of Words representation, has been combined with SVMs [4]-[6] and with probabilistic Latent Semantic Analysis (pLSA), Latent Dirichlet Allocation (LDA), and Hierarchical Dirichlet Processes (HDP) [7]-[9]. The proposed model has three characteristics: (a) it is multimodal [10];
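(b) it represents features as continuous values, whereas BoF quantizes them into codewords; and (c) it uses position and scale information, which BoF does not.

Section 2 describes the model, Section 3 the classification method, Section 4 the experiments, and Section 5 concludes.

1.1 Related Work

The constellation model of Fergus et al. builds on the models of Weber et al. [11], [12]. Fergus et al. also proposed a sparse object category model [13].

2. Fergus's Constellation Model and the Proposed Model

2.1 Fergus's Model

Fergus's model [3] is defined as

$$p(I \mid \Theta) = \sum_{h \in H} p(A, X, S, h \mid \Theta) = \sum_{h \in H} p(A \mid h, \theta_A)\, p(X \mid h, \theta_X)\, p(S \mid h, \theta_S)\, p(h \mid \theta_{\mathrm{other}}).$$

Here $I$ is the input image and $\Theta = \{\theta_A, \theta_X, \theta_S, \theta_{\mathrm{other}}\}$ are the model parameters. The image $I$ is described by the appearance $A$, position $X$, and scale $S$ of its features. $R$ is the number of regions, $h$ is a hypothesis that assigns features of $I$ to the regions, and $H$ is the set of all $h$. $p(A \mid h, \theta_A)$ models the appearance of the $R$ regions, and $p(X \mid h, \theta_X)$ models their $x, y$ coordinates ($2R$ dimensions).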
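$p(S \mid h, \theta_S)$ models the scales of the $R$ regions [3]. Fergus's model thus sums $p(A, X, S, h \mid \Theta)$ over all hypotheses $h \in H$.

2.2 Proposed Model

$$p_m(I \mid \Theta) = \sum_{k}^{K} \Big\{ \prod_{l}^{L} G(x_l \mid \theta_{k,\hat r_{k,l}}) \Big\} \pi_k = \sum_{k}^{K} \Big\{ \prod_{l}^{L} G(A_l \mid \theta^{(A)}_{k,\hat r_{k,l}})\, G(X_l \mid \theta^{(X)}_{k,\hat r_{k,l}})\, G(S_l \mid \theta^{(S)}_{k,\hat r_{k,l}}) \Big\} \pi_k,$$

$$\hat r_{k,l} = \arg\max_{r} G(x_l \mid \theta_{k,r}).$$

$K$ is the number of components (the model is multimodal for $K \geq 2$), $k$ is the component index, and $L$ is the number of features in image $I$. $G()$ is a Gaussian with mean $\mu$ and covariance $\Sigma$. The parameters are $\Theta = \{\theta_{k,r}, \pi_k\}$ with $\theta = \{\mu, \Sigma\}$, and the image is $I = \{x_l\}$ with $x = (A, X, S)$. $\theta_{k,r}$ is the parameter set of region $r$ of component $k$; $x_l$ is the $l$-th feature; $A$, $X$, $S$ are the appearance, position, and scale of $x$. $\pi_k$ is the weight of component $k$, with $0 \leq \pi_k \leq 1$ and $\sum_k^K \pi_k = 1$. $\hat r_{k,l}$ is the region of component $k$ to which feature $l$ is assigned, and $R$ is the number of regions.

2.3 Differences from Fergus's Model

The proposed model differs from Fergus's model in two respects.

1. Diagonal covariance. The Gaussian requires the quadratic form $(x - \mu)^t \Sigma^{-1} (x - \mu)$. With a full $D \times D$ matrix $\Sigma$, the computation involving $\Sigma$ costs $O(D^3)$. The proposed model sets the off-diagonal elements of $\Sigma$ to 0, so that with variances $\sigma_d^2$ the cost becomes $O(D)$:

$$(x - \mu)^t \Sigma^{-1} (x - \mu) = \sum_{d}^{D} \frac{1}{\sigma_d^2} (x_d - \mu_d)^2, \qquad |\Sigma| = \prod_{d}^{D} \sigma_d^2.$$

2. Replacing $\sum_{h \in H}$ with $\prod_l^L$ and $\arg\max_r$. Fergus's model evaluates $p(A, X, S, h \mid \Theta)$ for the hypotheses $h \in H$, which assign $L$ features to $R$ regions at a cost of $O(L^R)$, and relies on A* search. The proposed model instead uses $\prod_l^L$ with $\arg\max_r$, at a cost of $O(LR)$ [14].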
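The likelihood of Section 2.2, combined with the diagonal covariance of Section 2.3, can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function names `log_gauss_diag` and `log_likelihood`, the list-based data layout, the log-domain evaluation, and the assumption that every $\pi_k$ is strictly positive are all choices made here for the example.

```python
import math

def log_gauss_diag(x, mu, var):
    """Log density of a Gaussian with diagonal covariance.

    x, mu, var are equal-length sequences; var holds the variances sigma_d^2.
    The cost is O(D) because no matrix inverse or determinant is needed.
    """
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mu, var)
    )

def log_likelihood(features, components, weights):
    """Return log p_m(I | Theta) for one image (hypothetical data layout).

    features:   list of L feature vectors x_l
    components: list of K components, each a list of R (mu, var) pairs
    weights:    list of K mixture weights pi_k (assumed > 0, summing to 1)
    """
    per_component = []
    for regions, pi_k in zip(components, weights):
        total = math.log(pi_k)
        for x in features:
            # Each feature is scored by its best-matching region,
            # which corresponds to the arg max over r (cost O(LR) overall).
            total += max(log_gauss_diag(x, mu, var) for mu, var in regions)
        per_component.append(total)
    # Sum over the K components in the log domain (log-sum-exp).
    m = max(per_component)
    return m + math.log(sum(math.exp(v - m) for v in per_component))
```

Working in the log domain is an assumption of this sketch; it avoids numerical underflow when the product over $L$ features becomes very small.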
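Fig. 1 Model parameter estimation algorithm for the Multimodal Constellation Model.

Whereas Fergus's model sums over the hypotheses $h \in H$, the proposed model takes the product over the $L$ features, each assigned to a region by $\arg\max_r$.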
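2.4 Parameter Estimation

The model parameters are estimated with the EM algorithm [15]; Fig. 1 shows the procedure. Here $N$ is the number of training images, $n$ is the image index, $x_{n,l}$ is the $l$-th feature of image $n$, and $\hat r_{k,n,l}$ is the value of $\hat r_{k,l}$ for image $n$. The estimated parameters are $\mu$, $\Sigma$ (the variances $\sigma^2$), and $\pi$. In each EM iteration, $q_{k,n}$ is computed for image $n$ and component $k$, and the $\mu$, $\Sigma$ of region $r$ are updated from the features $l$ with $\hat r_{k,n,l} = r$.

3. Classification

$$\hat c = \arg\max_{c} p_m(I \mid \Theta_c)\, p(c)$$

where $\hat c$ is the resulting class, $c$ is a candidate class, $\Theta_c$ are the model parameters of class $c$, and $p(c)$ is the prior probability of class $c$.

4. Experiments

The proposed multimodal constellation model (Multi-CM) is compared with its unimodal version (Uni-CM, that is, $K = 1$) and with two BoF-based methods, LDA+BoF and SVM+BoF. The experiments also examine the influence of $K$ and $R$, compare the proposed model with Fergus's model, and validate characteristics (b) and (c) of Section 1.

4.1 Conditions

Two datasets are used: the Caltech Database [3] (hereafter Caltech) and the dataset of the PASCAL Visual Object Classes Challenge 2006 [16] (hereafter Pascal).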
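The decision rule of Section 3 can be sketched in a few lines. This is an illustrative example only: the function name `classify` and the dictionary-based interface are assumptions, and the per-class scores are assumed to be log-likelihoods $\log p_m(I \mid \Theta_c)$ computed elsewhere, so the product with the prior becomes a sum of logarithms.

```python
import math

def classify(log_likelihoods, priors):
    """Return the class c maximizing p_m(I | Theta_c) * p(c).

    log_likelihoods: dict mapping class name -> log p_m(I | Theta_c)
    priors:          dict mapping class name -> p(c), each assumed > 0
    """
    return max(
        log_likelihoods,
        key=lambda c: log_likelihoods[c] + math.log(priors[c]),
    )

# Hypothetical scores for two classes with equal priors.
scores = {"Faces": -10.0, "Motorbikes": -12.0}
print(classify(scores, {"Faces": 0.5, "Motorbikes": 0.5}))
```

With equal priors the rule reduces to picking the class with the largest likelihood; a strongly skewed prior can change the decision.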
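Table 1 Number of object areas in Caltech [3].

    Class        Object areas
    Airplanes    1,074
    Cars Rear    1,155
    Faces          450
    Motorbikes     826

Table 2 Number of object areas in Pascal [16].

    Class        Object areas
    Bicycle        649
    Bus            469
    Car          1,708
    Cat            858
    Cow            628
    Dog            845
    Horse          650
    Motorbike      549
    Person       2,309
    Sheep          843

Fig. 2 Target images in Caltech [3].

Fig. 3 Target images in Pascal [16].

Caltech has 4 classes (Table 1, Fig. 2) and Pascal has 10 classes (Table 2, Fig. 3). Pascal, which contains classes such as Cat, Dog, and Person, is the more difficult dataset. Unless varied in Sections 4.3 and 4.4, $K = 5$ and $R = 21$.

Features are detected with the Kadir-Brady saliency detector (KB detector) [17] and described by DCT (Discrete Cosine Transform) coefficients. A feature $x$ has 23 dimensions: 20 for $A$, 2 for $X$, and 1 for $S$.

Footnote 1: Caltech101, Caltech256.

4.2 Effectiveness of Multimodalization and Comparison to Related Works

Uni-CM and Multi-CM are compared with LDA+BoF and SVM+BoF.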
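Table 3 Effectiveness of multimodalization and comparison to related works, by average classification rate (%).

    Dataset    LDA+BoF    SVM+BoF    Uni-CM    Multi-CM
    Caltech    94.7       96.4       98.7      99.5
    Pascal     29.6       27.9       37.0      38.8

Fig. 4 Influence of K (number of components) on average classification rate.

For the BoF-based methods, the codewords are obtained by k-means. Table 3 shows that, on both Caltech and Pascal, Multi-CM achieves a higher rate than Uni-CM, and both achieve higher rates than LDA+BoF and SVM+BoF. Examples such as the Face class of Caltech and the Bicycle class of Pascal are taken up in Section 4.5.

4.3 Influence of K

$K$ is varied from 1 to 9 in steps of 2. $K = 1$ corresponds to Uni-CM and $K \geq 2$ to Multi-CM; $R$ is 21. Fig. 4 shows the results for Caltech and Pascal, with $K = 5$ noted for Caltech and $K = 7$ for Pascal.

4.4 Influence of R

$R$ is varied from 3 to 21 in steps of 3. For Multi-CM, $K$ is 5.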
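Fig. 5 Influence of R (number of regions) on average classification rate.

Fig. 6 Example of groupings for each component of the model (Caltech). Each row shows one component.

Fig. 5 shows the results for Uni-CM and Multi-CM, with $R = 9$ noted for Caltech and $R = 21$ for Pascal.

4.5 Groupings by Component

Each image is assigned to the component $k$ that gives the largest value of $\big\{ \prod_l^L G(x_l \mid \theta_{k,\hat r_{k,l}}) \big\} \pi_k$. Figs. 6 and 7 show examples of the resulting groupings; for Caltech, the examples are taken from Cars Rear and Motorbikes.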
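Fig. 7 Example of groupings for each component of the model (Pascal). Each row shows one component.

For Pascal, the groupings are discussed for the Car, Motorbike, Cow, and Cat classes, with reference to the DCT-based appearance.

4.6 Comparison with Fergus's Model

The proposed model is compared with Fergus's model under matched conditions: the number of features per image ($L$) is 20, the number of regions ($R$) is 3, and $K = 1$ for the proposed model. Table 4 shows the results for Caltech and Pascal.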
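Table 4 Comparison with Fergus's constellation model, by average classification rate (%). L = 20, R = 3 (K = 1, proposed model only).

    Dataset    Proposed model    Fergus's model
    Caltech    93.0              71.1
    Pascal     31.3              19.5

Table 5 Validation of the effectiveness of continuous-value representation and position-scale information, by average classification rate (%).

    Dataset    LDA+BoF    Multi-CM no-x,s    Multi-CM
    Caltech    94.7       96.5               99.5
    Pascal     29.6       33.5               38.8

4.7 Computational Cost

According to [3], learning Fergus's model with $R = 6$-$7$ and $L = 20$-$30$ on 400 images takes 24-36 hours. The proposed model ($K \geq 2$) is compared with respect to $R$ and $L$, including the setting of Section 4.6 ($R = 3$, $L = 20$, $K = 1$).

4.8 Validation of Continuous-Value Representation and Position-Scale Information

This section validates characteristics (b) and (c) of Section 1. For (b), LDA+BoF is compared with Multi-CM no-x,s, the proposed model without position and scale information. For (c), Multi-CM no-x,s is compared with Multi-CM. Table 5 shows that the rate increases from LDA+BoF to Multi-CM no-x,s, and again from Multi-CM no-x,s to Multi-CM, on both datasets.

5. Conclusion

This paper proposed the Multimodal Constellation Model, compared it with BoF-based methods and with Fergus's model, and examined the influence of $K$ and $R$.

References

[1] vol.48, no.SIG 16 (CVIM 19), pp.1-24, 2007.
[2] G. Csurka, C.R. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints, Proc. ECCV International Workshop on Statistical Learning in Computer Vision, pp.1-22, 2004.
[3] R. Fergus, P. Perona, and A. Zisserman, Object class recognition by unsupervised scale-invariant learning, Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, vol.2, pp.264-271, 2003.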
[4] K. Grauman and T. Darrell, The pyramid match kernel: Discriminative classification with sets of image features, Proc. IEEE Int. Conf. on Computer Vision, vol.2, pp.1458-1465, 2005.
[5] M. Varma and D. Ray, Learning the discriminative power-invariance trade-off, Proc. IEEE Int. Conf. on Computer Vision, 2007.
[6] J. Zhang, M. Marszalek, S. Lazebnik, and C. Schmid, Local features and kernels for classification of texture and object categories: A comprehensive study, Int. J. Comput. Vis., vol.73, no.2, pp.213-238, 2007.
[7] A. Bosch, A. Zisserman, and X. Munoz, Scene classification via pLSA, Proc. European Conf. on Computer Vision, vol.4, pp.517-530, 2006.
[8] L. Fei-Fei and P. Perona, A Bayesian hierarchical model for learning natural scene categories, Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, vol.2, pp.524-531, 2005.
[9] G. Wang, Y. Zhang, and L. Fei-Fei, Using dependent regions for object categorization in a generative framework, Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, vol.2, pp.1597-1604, 2006.
[10] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[11] M. Weber, M. Welling, and P. Perona, Unsupervised learning of models for recognition, Proc. European Conf. on Computer Vision, vol.1, pp.18-32, 2000.
[12] M. Weber, M. Welling, and P. Perona, Towards automatic discovery of object categories, Proc. European Conf. on Computer Vision, vol.2, pp.101-108, 2000.
[13] R. Fergus, P. Perona, and A. Zisserman, A sparse object category model for efficient learning and exhaustive recognition, Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, vol.1, pp.380-387, 2005.
[14] X. Ma and W.E.L. Grimson, Edge-based rich representation for vehicle classification, Proc. IEEE Int. Conf. on Computer Vision, vol.2, pp.1185-1192, 2005.
[15] A.P. Dempster, N.M. Laird, and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. Royal Statistical Society, Series B, vol.39, no.1, pp.1-38, 1977.
[16] M. Everingham, A. Zisserman, C.K.I. Williams, and L. Van Gool, The PASCAL Visual Object Classes Challenge 2006 (VOC2006) results, http://www.pascal-network.org/challenges/voc/voc2006/results.pdf.
[17] T. Kadir and M. Brady, Saliency, scale and image description, Int. J. Comput. Vis., vol.45, no.2, pp.83-105, 2001.