27 1 30
i ( ) (RF: Relevance Feedback) RF 1 Regularized Nearest Points(RNP) RF 2 RF RNP
ii RF RNP 2 RF (+1) (-1) 2 RNP 2 RF RF 33.6 2 SVM RF 2.7
Specific Person Image Retrieval by Relevance Feedback Using Outlier-Robust Set-to-Set Distance

Abstract

Kazuhisa TAKAGI

In this study, we focus on specific person image retrieval, a system that shows a user all the images of a target person from a gallery database. We propose a method that applies an outlier-robust set-to-set distance to track-based specific person image retrieval with relevance feedback. In track-based specific person image retrieval, we first register person images, each of which is the bounding box of a person's body in a frame, in a gallery database tracklet by tracklet. Here, a tracklet is the set of person images captured while one person passes through the field of view of a camera. A user inputs a tracklet as the query, and the system calculates the similarity between the query tracklet and each tracklet in the database using the tracklet's feature, which is the set of features of the person images in the tracklet. The system shows several candidate tracklets to the user in decreasing order of similarity to the query tracklet. We then apply Relevance Feedback (RF). In RF, the user tells the system whether each shown tracklet belongs to the same person as the query. The system optimizes the query feature or the similarity measure using training samples that consist of the presented data and the feedback on them, and then shows the user the next candidate tracklets drawn from the data not yet presented. When a gallery database is built, person images are usually extracted automatically. In this case, we often obtain erroneous person images such as misaligned bounding boxes, false detections, and person images in which the body is hidden by obstacles. We call these erroneous person images outlier images. Outlier images cause the similarity between tracklets of the same person to become smaller than the similarity between tracklets of different persons.
To solve this problem, we introduce an outlier-robust set-to-set distance into the calculation of similarity. Specifically, we introduce the Regularized Nearest Points (RNP) distance and propose ways to incorporate it into RF with query optimization and RF with a two-class classifier. The RNP distance defines the distance between two tracklets as the distance between their representative points. Each representative point is chosen from the Regularized Affine Hull of the tracklet's feature so that it is close both to the mean feature of the tracklet and to the other representative point. Because the representative points are chosen with the mean feature taken into account, the distance is robust to outliers while reflecting the distribution of the data. In RF with query optimization, the query tracklet's feature is moved closer to the mean tracklet feature of the same person and farther from that of other persons. We calculate the similarity between the query and each tracklet from the set-to-set distance between them: the smaller the distance, the larger the similarity. In this method, we introduce the RNP distance into the calculation of the distance between tracklet features. In RF with a two-class classifier, on the other hand, the similarity is calculated not from the distance between tracklets but from the separating hyperplane that classifies person images into the same-person class (+1) and the other-person class (-1) with respect to the target person. In particular, we define the similarity between the target person and each person image in a tracklet as the value of the evaluation function of the two-class classifier, and the similarity of a tracklet as the maximum of the similarities of its person images. This method can be regarded as defining the similarity by the distance between a tracklet and the most relevant hyperplane, which is obtained by translating the separating hyperplane so that it lies farther from the different-person side than any feature of the person images in the gallery database. The distance between a tracklet and the hyperplane is the length of the perpendicular from the representative point of the tracklet to the hyperplane. In our method, we introduce RNP to calculate the distance between the most relevant hyperplane and a tracklet.
In the experiments, we use a gallery database automatically extracted from surveillance videos in a shopping mall. For RF with query optimization and RF with a two-class classifier, the proposed method improved the performance by up to 33.6 points and 2.7 points, respectively. Our idea can also be applied to video retrieval, which is left for future work.
1 2010 1000 1 ( ) ( ) ( ) 2 [1,2] [3 6] ( ) ( ) 1
(i) N (ii) (iii) N (iv) (ii) (iii) / Metternich [1] ( ) Metternich 2 2 2
2 3 4 5 2 2.1 [7] Gray [8] RGB YCbCr HSV Schmid Gabor Adaboost (Ensemble of Localized Features: ELF) [9] 1 1 [1] Bak [2] (Mean Riemannian Covariance Grid: MRCG) 3
[3 6] 1 / 2.2 1 2 I i T i = {I i1,..., I ini } n i i f ij F i = {f i1,..., f ini } T Q F Q D D(F Q, F i ) 4
(S(F Q, F i ) = D(F Q, F i )) 1. T i S(F Q, F i ) T i N 2. N 3. 4. N 5. 2 5 N 2.2.1 Metternich [1] 3 Average Description(AD) Largest Detection(LD) 5
1: 6
Minimum Pointwise Distance(MPD) 2 [10] 2 2 D D 2 2 2.2.2 Metternich [1] 5 (a) (b) (c) (d) (e)2 (a) 2 1 (b) 0 2 7
(c) (d) 2 (e)2 2 2 2 1 (a) (d) (e)2 2 Metternich [1] 2 SVM SVM 2 K(x, y) = t ϕ(x)ϕ(y) ϕ w, b x (+1) (-1) 8
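With a kernel $K(x, y) = {}^t\phi(x)\,\phi(y)$, a two-class SVM learns $w$ and $b$ and assigns a feature $x$ to the same-person class (+1) or the other-person class (-1) by

$$\mathrm{sgn}\bigl({}^t w\,\phi(x) + b\bigr). \tag{1}$$

In SVM-based RF such as that of Zhang et al. [5], the similarity of a single image with feature $x$ is the value of the SVM evaluation function

$${}^t w\,\phi(x) + b. \tag{2}$$

Metternich et al. [1] define the similarity of a tracklet $T_i$ as the maximum of this value over the person images $I_{ij}$ in $T_i$:

$$S(F_Q, F_i) = \max_j \bigl({}^t w\,\phi(f_{ij}) + b\bigr). \tag{3}$$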
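The tracklet similarity of Eq. (3) can be illustrated with a minimal sketch. It assumes a linear kernel, so that $\phi(x) = x$, and takes the hyperplane parameters `w` and `b` as given rather than training an SVM; the function names and the toy numbers are hypothetical and not taken from the original experiments.

```python
from typing import Sequence

Vector = Sequence[float]


def decision_value(w: Vector, b: float, x: Vector) -> float:
    """Value of the evaluation function w.x + b (Eq. (2)), linear kernel assumed."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def tracklet_similarity(w: Vector, b: float, tracklet: Sequence[Vector]) -> float:
    """Tracklet similarity of Eq. (3): maximum evaluation value over its images."""
    return max(decision_value(w, b, f) for f in tracklet)


# Hypothetical hyperplane and two tiny tracklets (illustrative numbers only).
w, b = [1.0, -1.0], 0.5
tracklets = {
    "same": [[2.0, 0.0], [1.0, 1.0]],
    "other": [[0.0, 2.0], [0.0, 1.0]],
}

# Present tracklets in decreasing order of similarity, as in the RF loop.
ranked = sorted(
    tracklets,
    key=lambda name: tracklet_similarity(w, b, tracklets[name]),
    reverse=True,
)
```

Because only the single best-scoring image decides the tracklet's similarity, one outlier image with a high evaluation value is enough to raise the rank of the whole tracklet.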
LD MPD ( 2) AD 2 ( 2) AD 2 LD 2 MPD 2 SVM MPD AD LD AD LD 3 3.1 10
Figure 2: AD, LD, and MPD
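Two of the tracklet distances compared in Fig. 2 can be sketched as follows. The sketch assumes that AD (Average Description) compares the mean features of the two tracklets, that MPD (Minimum Pointwise Distance) takes the minimum over all pairs of image features as in Eq. (25), and that the pointwise distance $d$ is Euclidean. LD (Largest Detection) is omitted because it presumably needs the detection sizes, which this sketch does not model. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_feature(features: Sequence[Vector]) -> List[float]:
    """Component-wise mean of the image features of one tracklet."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def ad_distance(fq: Sequence[Vector], fi: Sequence[Vector]) -> float:
    """AD: distance between the mean features of the two tracklets."""
    return euclidean(mean_feature(fq), mean_feature(fi))


def mpd_distance(fq: Sequence[Vector], fi: Sequence[Vector]) -> float:
    """MPD: minimum distance over all pairs of image features."""
    return min(euclidean(q, f) for q in fq for f in fi)


# Toy tracklets; the second gallery image plays the role of an outlier.
query = [[0.0, 0.0], [0.0, 2.0]]
gallery = [[0.0, 1.0], [10.0, 1.0]]

ad = ad_distance(query, gallery)    # the mean is pulled toward the outlier
mpd = mpd_distance(query, gallery)  # decided by a single closest pair
```

In this toy case one outlier shifts the gallery mean and inflates AD, whereas MPD depends on a single pair of images and would be misled if that pair involved an outlier.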
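Yang et al. [11] proposed the Regularized Nearest Points (RNP) distance. RNP defines the distance between two sets through the Regularized Affine Hull (RAH) of each set. For the feature set $F_i = \{f_{i1}, \dots, f_{in_i}\}$, the RAH is defined by Eq. (4):

$$\mathrm{RAH} = \Bigl\{ f = \sum_k f_{ik}\,a_k \Bigm| f_{ik} \in F_i,\ \sum_k a_k = 1,\ \|a\|_2 \le \sigma \Bigr\}. \tag{4}$$

The parameter $\sigma$ controls the extent of the RAH. Writing the query features as the matrix $F_Q = (f_{Q1}, \dots, f_{Qn_Q})$ and the features of person $i$ as $F_i = (f_{i1}, \dots, f_{in_i})$, the nearest points between the RAHs of $F_Q$ and $F_i$ are given by the coefficient vectors $\alpha$ and $\beta_i$ that solve

$$\min_{\alpha,\beta_i} \|F_Q\alpha - F_i\beta_i\|_2^2, \tag{5}$$

$$\text{subject to}\quad \sum_k \alpha_k = 1,\quad \sum_k \beta_{ik} = 1, \tag{6}$$

$$\|\alpha\|_2 \le \sigma_1,\quad \|\beta_i\|_2 \le \sigma_2, \tag{7}$$

where $\|\cdot\|_2$ is the L2 norm and $\sigma_1$, $\sigma_2$ bound the two RAHs. Relaxing the constraints (6) to $\sum_k \alpha_k \approx 1$, $\sum_k \beta_{ik} \approx 1$, Eqs. (5), (6), and (7) can be rewritten in the Lagrangian form

$$\min_{\alpha,\beta_i} \Bigl\{ \|z - \hat F_Q\alpha - \hat F_i\beta_i\|_2^2 + \lambda_1\|\alpha\|_2^2 + \lambda_2\|\beta_i\|_2^2 \Bigr\}, \tag{8}$$

where $z = {}^t(0, \gamma_1, \gamma_2)$, $\hat F_Q = {}^t({}^tF_Q,\ \mathbf{0},\ \gamma_2\mathbf{1})$, $\hat F_i = {}^t(-{}^tF_i,\ \gamma_1\mathbf{1},\ \mathbf{0})$, and $\lambda_1, \lambda_2, \gamma_1, \gamma_2$ are parameters. Setting the derivatives of (8) with respect to $\alpha$ and $\beta_i$ to 0 gives

$$\alpha = ({}^t\hat F_Q \hat F_Q + \lambda_1 I)^{-1}\,{}^t\hat F_Q\,(z - \hat F_i\beta_i), \tag{9}$$

$$\beta_i = ({}^t\hat F_i \hat F_i + \lambda_2 I)^{-1}\,{}^t\hat F_i\,(z - \hat F_Q\alpha). \tag{10}$$

Starting from $\beta_i = \frac{1}{n_i}\mathbf{1}$, these two equations are applied alternately until $\alpha$ and $\beta_i$ converge. With the resulting $\alpha$ and $\beta_i$, the RNP distance is

$$D_{RNP}(F_Q, F_i) = \bigl(\|F_Q\|_* + \|F_i\|_*\bigr)\,\|F_Q\alpha - F_i\beta_i\|_2^2, \tag{11}$$

where $\|F_Q\|_*$ and $\|F_i\|_*$ are the nuclear norms (sums of singular values) of $F_Q$ and $F_i$, as in [11]. $F_Q\alpha$ and $F_i\beta_i$ are the representative points of the two tracklets within their RAHs (Fig. 3), and $\gamma_1, \gamma_2, \lambda_1, \lambda_2$ are the parameters of RNP.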
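The alternating solution of Eqs. (9) and (10) and the distance of Eq. (11) can be sketched numerically as follows. This is a minimal illustration, not the implementation used in the experiments. It assumes that features are stored as matrix columns, that the feature block of $\hat F_i$ carries a minus sign so that (8) reduces to the residual of (5), that $\lambda_1$ is the regularizer in the update of $\alpha$, and that the weighting in (11) is the nuclear norm as in [11]. The default parameter values and the fixed iteration count are arbitrary choices for the example.

```python
import numpy as np


def rnp_distance(FQ, Fi, lam1=1.0, lam2=1.0, gamma1=10.0, gamma2=10.0, n_iter=50):
    """Sketch of the RNP distance; FQ is (d, nQ), Fi is (d, ni), columns are features.

    Returns (distance, alpha, beta). Parameter defaults are illustrative only.
    """
    d, nQ = FQ.shape
    ni = Fi.shape[1]

    # Augmented target and matrices of Eq. (8); the last two rows softly
    # enforce sum(beta) = 1 (weight gamma1) and sum(alpha) = 1 (weight gamma2).
    z = np.concatenate([np.zeros(d), [gamma1, gamma2]])
    FQ_hat = np.vstack([FQ, np.zeros((1, nQ)), gamma2 * np.ones((1, nQ))])
    Fi_hat = np.vstack([-Fi, gamma1 * np.ones((1, ni)), np.zeros((1, ni))])

    # Projection matrices of Eqs. (9) and (10), computed once.
    PQ = np.linalg.solve(FQ_hat.T @ FQ_hat + lam1 * np.eye(nQ), FQ_hat.T)
    Pi = np.linalg.solve(Fi_hat.T @ Fi_hat + lam2 * np.eye(ni), Fi_hat.T)

    beta = np.full(ni, 1.0 / ni)  # uniform initial coefficients
    for _ in range(n_iter):
        alpha = PQ @ (z - Fi_hat @ beta)   # Eq. (9)
        beta = Pi @ (z - FQ_hat @ alpha)   # Eq. (10)

    # Eq. (11): squared distance between representative points,
    # weighted by the nuclear norms of the two feature matrices.
    residual = FQ @ alpha - Fi @ beta
    scale = np.linalg.norm(FQ, "nuc") + np.linalg.norm(Fi, "nuc")
    return scale * float(residual @ residual), alpha, beta


# Toy example with random 6-dimensional features (illustrative only).
rng = np.random.default_rng(0)
FQ = rng.normal(size=(6, 3))
d_same, alpha, beta = rnp_distance(FQ, FQ.copy())
d_far, _, _ = rnp_distance(FQ, FQ + 5.0)
```

Because the sum-to-one constraints are only enforced softly, small values of $\gamma_1, \gamma_2$ let the coefficients shrink toward zero; the weights therefore need to be large relative to the feature scale for the representative points to stay near the affine hulls.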
3: RNP (c) F Q i F i D D(F Q, F i ) D RNP (F Q, F i ) (e)2 2 SVM Metternich SVM RNP SVM RNP 2 14
1 1 SVM SVM 2 SVM RNP RNP (3) SVM RNP Metternich [1] (3) 4 (3) MPD Metternich MPD RNP 3.3 SVM RNP K(x, y) = t ϕ(x)ϕ(y) SVM 15
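The separating hyperplane learned by the SVM is

$${}^t w\,\phi(x) + b = 0. \tag{12}$$

Let $m\ (\ge 0)$ be a constant such that $K(f, f) \le m^2$ for every feature $f$, so that every feature lies within distance $m$ of the origin ($f_0 = 0$) in the feature space (Fig. 4). The signed distance $d$ from $x$ to the hyperplane is

$$d = \frac{{}^t w\,\phi(x) + b}{\|w\|_2}, \tag{13}$$

where $d > 0$ on the (+1) side and $d < 0$ on the (-1) side. The distance $d_0$ of the origin is therefore

$$d_0 = \frac{b}{\|w\|_2}. \tag{14}$$

Translating the hyperplane so that $d_0 = -m$, that is, $b = -m\|w\|_2$, gives the most relevant hyperplane

$${}^t w\,\phi(x) - m\|w\|_2 = 0. \tag{15}$$

With the kernel $K(x, y) = {}^t\phi(x)\,\phi(y)$, the representative point of $F_i$ obtained by RNP is $\psi(F_i)\beta_i$, where $\psi(F_i) = (\phi(f_{i1}), \dots, \phi(f_{in_i}))$. The perpendicular from $\psi(F_i)\beta_i$ to the hyperplane (15) is

$$\frac{{}^t w\,\psi(F_i)\beta_i - m\|w\|_2}{\|w\|_2^2}\,w. \tag{16}$$

Replacing $F_Q\alpha - F_i\beta_i$ in the RNP problem (5) by (16) gives

$$\min_{\beta_i} \frac{\bigl({}^t w\,\psi(F_i)\beta_i - m\|w\|_2\bigr)^2}{\|w\|_2^2}, \tag{17}$$

$$\text{subject to} \tag{18}$$

$$\sum_k \beta_{ik} = 1, \tag{19}$$

$$\|\beta_i\|_2 \le \sigma_2. \tag{20}$$

As in RNP, relaxing the constraint (19) leads to the Lagrangian form

$$\min_{\beta_i} \Bigl\{ \bigl({}^t w\,\psi(F_i)\beta_i - m\|w\|_2\bigr)^2 + \lambda\|w\|_2^2\,\|\beta_i\|_2^2 + \gamma^2\bigl(1 - {}^t\mathbf{1}\beta_i\bigr)^2 \Bigr\}, \tag{21}$$

where $\mathbf{1}$ is the vector whose elements are all 1. Setting the derivative with respect to $\beta_i$ to 0 gives

$$\beta_i = \Bigl( {}^t\psi(F_i)\,w\,{}^t w\,\psi(F_i) + \lambda\|w\|_2^2\,I + \gamma^2 M_{n_i} \Bigr)^{-1} \Bigl( \gamma^2\mathbf{1} + m\|w\|_2\,{}^t\bigl({}^t w\,\psi(F_i)\bigr) \Bigr), \tag{22}$$

where $I$ is the $n_i \times n_i$ identity matrix and $M_{n_i}$ is the $n_i \times n_i$ matrix whose elements are all 1.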
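The closed-form coefficients of Eq. (22) and the resulting tracklet score ${}^t w\,\psi(F_i)\beta_i - m\|w\|_2$ can be sketched as follows. The sketch assumes a linear kernel ($\phi(x) = x$, so $\psi(F_i) = F_i$), reads the objective of Eq. (21) as $({}^t w F_i\beta_i - m\|w\|_2)^2 + \lambda\|w\|_2^2\|\beta_i\|_2^2 + \gamma^2(1 - {}^t\mathbf{1}\beta_i)^2$, and takes `w` as given instead of training an SVM. The function name, parameter defaults, and toy data are hypothetical.

```python
import numpy as np


def hyperplane_rnp_similarity(w, Fi, m, lam=1.0, gamma=10.0):
    """Similarity of a tracklet to the translated hyperplane w.x - m*||w|| = 0.

    Fi is (d, ni) with one image feature per column. Returns (similarity, beta).
    Linear kernel assumed; lam and gamma are illustrative defaults.
    """
    ni = Fi.shape[1]
    w_norm = np.linalg.norm(w)
    a = Fi.T @ w  # a[k] = w . f_ik, evaluation value of each image without b

    # Normal equations of the regularized least-squares problem (Eq. (22)).
    A = np.outer(a, a) + lam * w_norm**2 * np.eye(ni) + gamma**2 * np.ones((ni, ni))
    rhs = gamma**2 * np.ones(ni) + m * w_norm * a
    beta = np.linalg.solve(A, rhs)

    # Signed offset of the representative point Fi @ beta from the hyperplane.
    similarity = float(a @ beta) - m * w_norm
    return similarity, beta


# Toy data: one tracklet on the (+1) side of w, one on the (-1) side.
w = np.array([1.0, 0.0])
near = np.array([[2.0, 2.2], [0.0, 0.1]])
far = np.array([[-2.0, -1.8], [0.0, 0.1]])
m = max(np.linalg.norm(np.hstack([near, far]), axis=0))  # bound on feature norms

sim_near, beta_near = hyperplane_rnp_similarity(w, near, m)
sim_far, beta_far = hyperplane_rnp_similarity(w, far, m)
```

Unlike the maximum in Eq. (3), the score here comes from a regularized combination of all images in the tracklet, so a single image cannot determine it alone.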
(21) β i I n i n i M ni 1 n i n i ψ(f i )β i F i (2) t wψ(f i )β i m w 2 (23) 4 4.1 4.2 4.2.1 16 89 16039 4337 1 1 69 OKAO Vision [12] 5,6,7 5 6 7 Wu [13] 160 f c 160 Schmid f s 80 Gabor f g 18
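Figure 5: A tracklet containing changes in the person's orientation

Figure 6: A tracklet containing a small number of outlier images caused by errors in the person region

Figure 7: A tracklet consisting only of person images for which cropping of the person region failed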
400 f = t ( t f c, t f s, t f g ) f c RGB HSV dense sampling 160 4.2.2 2.2.2 Metternich [1] SVM Metternich SVM MATLAB SVM 5% 1 MPD SVM 2.2.2 (c) Rocchio [3] Rocchio [3] Rocchio (24) F l Q = af l 1 Q + b F l r c F l nr (24) 20
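Eq. (24) can be read as the Rocchio update $F_Q^l = a F_Q^{l-1} + b\,\bar F_r^l - c\,\bar F_{nr}^l$. The following is a minimal sketch under two assumptions that the text does not confirm: the update is applied to every feature vector of the query set, and $\bar F_r^l$, $\bar F_{nr}^l$ are the mean features of the images judged relevant and non-relevant. The weights $(a, b, c) = (0.7, 0.8, 0.5)$ are those reported for the experiments; the function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def rocchio_update(
    query: Sequence[Vector],
    relevant: Sequence[Vector],
    non_relevant: Sequence[Vector],
    a: float = 0.7,
    b: float = 0.8,
    c: float = 0.5,
) -> List[List[float]]:
    """One feedback round: move each query feature toward the relevant mean
    and away from the non-relevant mean."""
    r_mean = mean_vector(relevant)
    nr_mean = mean_vector(non_relevant)
    return [
        [a * q + b * r - c * nr for q, r, nr in zip(feature, r_mean, nr_mean)]
        for feature in query
    ]


# Toy feedback round with 2-dimensional features (illustrative numbers only).
updated = rocchio_update(
    query=[[1.0, 1.0]],
    relevant=[[2.0, 2.0], [4.0, 4.0]],
    non_relevant=[[0.0, 2.0]],
)
```

With $a + b - c = 1$, a query whose features already coincide with both means is left unchanged, so the update shifts the query without rescaling it.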
FQ l l F r, l Fnr l l F l r, F l nr F r, F nr Rocchio a, b, c(a + b c = 1) (a, b, c) = (0.7, 0.8, 0.5) N=20 N 3.1 RNP Metternich [1] MPD F Q F i MPD D MP D (F Q, F i ) f Qj F Q, f i k F i (25) 4.2.3 D MP D (F Q, F i ) = min j,k d(f Qj, f ik ) (25) 16 12 12 4.2.4 Cumulative Matching Curve (CMC: ) CMC l n CMC 21
CMC(l) = 1 n n k=1 l (26) 4.3 CMC 8,9 8 SVM (Relevance Feedback:RF) 9 Rocchio [3] RF 2 1500 CMC 8: SVM RF 9: Rocchio RF 4.4 4.4.1 SVM 8 300 84.09% 86.80% 2.71 Metternich SVM 22
4.4.2 Rocchio 9 1,100 38.22% 71.86% 33.64 4.4.3 2 SVM Rocchio Rocchio [3] Rocchio RNP SVM 3.2 RNP SVM Rocchio SVM Rocchio SVM Rocchio SVM / 2 Rocchio SVM Rocchio 23
5 2 2 SVM 2.7 Rocchio 33.6 Metternich [1] SVM 24
[1] Michael J Metternich, Marcel Worring, Track based relevance feedback for tracing persons in surveillance videos, Computer Vision and Image Understanding, Vol. 117, No. 3, pp. 229–237, 2013.
[2] Slawomir Bak, Etienne Corvee, François Bremond, Monique Thonnat, Multiple-shot human re-identification by mean riemannian covariance grid, IEEE International Conference on Advanced Video and Signal-Based Surveillance (AVSS 2011), pp. 179–184, 2011.
[3] Joseph John Rocchio, Relevance feedback in information retrieval, The SMART Retrieval System: Experiments in Automatic Document Processing, pp. 313–323, 1971.
[4] Xiang Sean Zhou, Thomas S Huang, Relevance feedback in image retrieval: A comprehensive review, Multimedia Systems, Vol. 8, No. 6, pp. 536–544, 2003.
[5] Lei Zhang, Fuzong Lin, Bo Zhang, Support vector machine learning for image retrieval, IEEE International Conference on Image Processing (ICIP 2001), Vol. 2, pp. 721–724, 2001.
[6] Yoshiharu Ishikawa, Ravishankar Subramanya, Christos Faloutsos, Mindreader: Querying databases through multiple examples, Computer Science Department, p. 551, 1998.
[7] PRMU2011-21, pp. 117–124, 2011.
[8] Douglas Gray, Hai Tao, Viewpoint invariant pedestrian recognition with an ensemble of localized features, European Conference on Computer Vision (ECCV 2008), pp. 262–275, 2008.
[9] PRMU2011-25, Vol. 111, No. 49, pp. 139–146, 2011.
[10] Daniel P. Huttenlocher, Gregory A. Klanderman, William J Rucklidge, Comparing images using the Hausdorff distance, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 15, No. 9, pp. 850–863, 1993.
[11] Meng Yang, Pengfei Zhu, Luc Van Gool, Lei Zhang, Face recognition based on regularized nearest points between image sets, IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG 2013), pp. 1–7, 2013.
[12] OKAO Vision, http://www.omron.co.jp/ecb/products/mobile/
[13] Yang Wu, Michihiko Minoh, Masayuki Mukunoki, Collaboratively regularized nearest points for set based recognition, British Machine Vision Conference (BMVC 2013), pp. 134.1–134.10, 2013.