
Web Image Classification with Bag-of-Keypoints

Taichi Joutou and Keiji Yanai
Department of Computer Science, The University of Electro-Communications

Recently, the need for generic image recognition has been growing due to the explosive increase of digital images. We have therefore performed classification experiments on general images gathered from the Web, employing the bag-of-keypoints method proposed by G. Csurka et al. In the experiments, we obtained the 5.% classification rate for object classes.

World Wide Web Flickr 1) Web Web Corel Corel Image Gallery. G. Csurka bag-of-keypoints 2) Web.

Bag-of-keypoints applies the bag-of-words idea to images. We apply bag-of-keypoints to Web images and compare Nearest Neighbor and SVM classifiers; the classification rates obtained were 84% and 5%.

SIFT was proposed by David Lowe 3) and has been used for recognising panoramas 4) and, by Sivic, J. et al. 5), for object matching in videos. G. Csurka et al. 2) proposed bag-of-keypoints, which is built on SIFT features, and classified images of 7 classes with Naive Bayes and SVM.

3. SIFT features are extracted from Web images and converted into the bag-of-keypoints representation 2): the features are clustered by k-means into visual words, and the images are then classified by Nearest Neighbor or SVM.

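4.1.2 SIFT
SIFT (Scale Invariant Feature Transform) 3), proposed by David Lowe, detects keypoints with DoG (Difference of Gaussian) and describes each keypoint by a 128-dimensional vector (4 × 4 × 8). We extract the SIFT features with SIFT++ 6), a C++ implementation of SIFT, which takes pgm images as input.

4.2 Bag-of-keypoints
Bag-of-keypoints 2) represents an image as a collection (bag) of visual words taken from a code book. Local features are extracted from the images, a code book is built from them, and each image is then described with the code book.

4.2.1 Visual words
The visual words are obtained by clustering the SIFT features (Section 4.1.2); the resulting set of visual words is the code book.

4.2.1.1 k-means
The clustering is done by k-means. The number of clusters k is the code book size, and several values of k were tried.

4.2.2 Code book
Each local feature of an image is assigned to the nearest visual word of the code book, and the image is represented by the resulting histogram (bag-of-keypoints). A code book of size k yields a k-dimensional vector.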
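The construction of the code book and of the bag-of-keypoints histogram can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in the experiments: the function names (`kmeans`, `bag_of_keypoints`), the random initialisation, the fixed iteration count and the toy 2-dimensional descriptors are all assumptions made for the example. Real SIFT descriptors are 128-dimensional and would come from a tool such as SIFT++.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(vector, centers):
    """Index of the center (visual word) closest to the given descriptor."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(vector, centers[i]))


def kmeans(descriptors, k, iterations=20, seed=0):
    """Plain k-means; the k returned centers play the role of the code book."""
    rng = random.Random(seed)
    centers = [list(d) for d in rng.sample(descriptors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_index(d, centers)].append(d)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def bag_of_keypoints(image_descriptors, code_book):
    """Histogram counting how many descriptors fall on each visual word."""
    histogram = [0] * len(code_book)
    for d in image_descriptors:
        histogram[nearest_index(d, code_book)] += 1
    return histogram


# Toy data (hypothetical): two well-separated groups of 2-D "descriptors".
training_descriptors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
                        (10.0, 10.1), (10.2, 9.9), (9.9, 10.0)]
code_book = kmeans(training_descriptors, k=2)
histogram = bag_of_keypoints([(0.1, 0.1), (10.0, 10.0), (9.8, 10.1)], code_book)
print(histogram)
```

The histogram has one bin per visual word, so its length equals the code book size k, and its bins sum to the number of keypoints found in the image.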

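4.3 Classification
Images are classified either by Nearest Neighbor or by SVM, for which we use SVM multiclass (Multi-Class Support Vector Machine) 7).

4.3.1.1 Histogram intersection
Following M. J. Swain 8), the similarity $S_{IM}$ between an image histogram $I$ and a model histogram $M$, whose $j$-th bins are $I_j$ and $M_j$, is

$$S_{IM} = \sum_{j=1}^{n} \min(I_j, M_j)$$

4.3.1.2 Nearest Neighbor
An image is assigned the class of the training image that is most similar to it.

4.3.2 SVM (Support Vector Machine)
SVM 9) goes back to the Optimal Separating Hyperplane proposed by Vapnik et al. in the 1960s. As the implementation we use SVM multiclass (Multi-Class Support Vector Machine) 7), which is based on SVM light 10).

4.3.2.1 Kernels
With a mapping $\Phi$ into a feature space, the inner product $\Phi(x) \cdot \Phi(x')$ is computed through a kernel function $K(x, x')$. We use the linear kernel and the radial basis function kernel.

Linear:

$$y = \mathrm{sign}(w^{T} x - h)$$

Radial basis function (RBF):

$$K(x, x') = \exp\left(-\frac{\|x - x'\|^{2}}{\sigma^{2}}\right)$$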
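The Nearest Neighbor classification with histogram intersection described in Section 4.3 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the representation of the training set as `(histogram, label)` pairs and the toy histograms are hypothetical, and the code assumes that the class of the single most similar training image is returned.

```python
def histogram_intersection(image_hist, model_hist):
    """Similarity S_IM: sum over the bins of min(I_j, M_j)."""
    return sum(min(i, m) for i, m in zip(image_hist, model_hist))


def nearest_neighbor(query_hist, training_set):
    """Label of the training histogram with the largest intersection.

    training_set is a list of (histogram, label) pairs (assumed layout).
    """
    best_label, best_score = None, float("-inf")
    for hist, label in training_set:
        score = histogram_intersection(query_hist, hist)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy example with 3-bin histograms (hypothetical data).
training_set = [([4, 0, 1], "a"), ([0, 5, 0], "b")]
print(nearest_neighbor([3, 1, 1], training_set))
```

Because the intersection is a similarity rather than a distance, the nearest neighbor is the training image with the largest value, not the smallest.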

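5.1 The images used in the experiments were gathered from the Web (sample images are shown in Fig. 4).

5.2 Methods
The following 4 methods are compared:
1. color histogram + Nearest Neighbor
2. bag-of-keypoints + Nearest Neighbor
3. bag-of-keypoints + SVM (linear kernel)
4. bag-of-keypoints + SVM (radial basis function kernel)
For the methods that use bag-of-keypoints, several code book sizes are tried.

5.3 Evaluation
The methods are evaluated by 5-fold cross validation. For each class, recall is the number of correctly classified images divided by the number of images that belong to the class, and precision is the number of correctly classified images divided by the number of images assigned to the class. We also use the F-value (F-measure) and the confusion matrix.

5.3.1 F-value (F-measure)
The F-value is the harmonic mean of recall and precision:

$$F = \frac{2}{\frac{1}{\mathrm{recall}} + \frac{1}{\mathrm{precision}}}$$

5.3.2 Confusion matrix
The confusion matrix $M_{ij}$ is defined as

$$M_{ij} = \left|\{ I_k \in C_j : h(I_k) = i \}\right|$$

where $i, j \in \{1, \ldots, N_c\}$ ($N_c$ is the number of classes), $C_j$ is the set of images of class $j$, and $h(I_k)$ is the class assigned to image $I_k$.

5.4.1 Results of the 10-class classification experiment

Fig. 4 Sample images of the classes (from the top left: glove, ...)

Tables 3 and 4 show the recall and precision of methods 1 and 2; for method 2 the results are listed by code book size. Next, Tables 5 and 6 show the results of the classification experiments in which the classifier of method 2 was changed from Nearest Neighbor to SVM. Because two kernels were tried for SVM, the results are grouped by kernel first and then by code book size. Fig. 5 shows the F-value of each method for each class; for the methods that use bag-of-keypoints, only the results for the code book size that performed best are shown. Finally, Table 7 shows the confusion matrix for the method whose mean F-value over the classes was the best, namely bag-of-keypoints+SVM with the radial basis function kernel (method 4), with the code book size set to 8.

Fig. 5 F-values of the classes and their average

5.4.2 Results of the larger classification experiment
Since the 10-class classification showed no large difference between the kernels for the SVM-based methods, the larger classification experiment uses SVM with the linear kernel, whose training time is far shorter. Table 8 shows the results as the mean recall and mean precision over the classes. Table 9 shows the top 10 and bottom 10 classes ranked by F-value. Fig. 6 shows examples with particularly good and particularly poor results.

Table 3 Recall of color+NN and bag-of-keypoints+NN (with average)
color histogram .4.4.68.68.6.48.8.6.48.5.5
bag-of-keypoints (code book size (c)) 3 5 8 .74.9.86.78.96.86.74.8.7.8.9.9.98.96.98.54.74.66.6.56.9.86.9..94.34.34.38..34.6.8.7.54.9.7.58.64.6.9.84.9.88.9.9.94.98.88.9.8.74.78.77.69.8

Table 4 Precision of color+NN and bag-of-keypoints+NN (with average)
color histogram .46.5.83.54.36.48.43.6.5.58.53
bag-of-keypoints (code book size (c)) 3 5 8 .59.6.7.46.7.76.69.87.79.83.75.8.66.7.6.96.78.8.6.78.86.75.87.87.94.43.74.68.35.78.76..75.6.89.84.8.98...87.9.8.77.94.8.79.79.77.8.76.79.79.69.83

Table 8 Classification results of the larger experiment
mean recall .5, mean precision .489

Table 9 TOP 10 and WORST 10 classes by F-value
TOP: bicycle, necklace, Awa Odori, fireworks festival, spaghetti, hot spring, kendo, Nebuta, stone wall, suit .87.78.77.76.76.76.76.75.74.73
WORST: bridge, harbor, shrine, cat, sumo, asparagus, river, eel, lizard, azalea .3.5.5.8..4.7.8.8.9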
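The evaluation measures of Section 5.3 (recall, precision, F-value and confusion matrix) can be sketched as follows. This is an illustrative sketch, not the evaluation code used for the tables: the function names, the integer class labels starting at 0 and the toy label lists are assumptions made for the example. Rows of the matrix are the assigned class i and columns are the true class j, following the definition of M_ij.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """M[i][j] = number of images of true class j that were assigned class i."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[predicted][true] += 1
    return matrix


def recall_precision_f(matrix, c):
    """Recall, precision and F-value of class c from a confusion matrix."""
    correct = matrix[c][c]
    in_class = sum(row[c] for row in matrix)   # images that truly belong to c
    assigned = sum(matrix[c])                  # images that were assigned to c
    recall = correct / in_class if in_class else 0.0
    precision = correct / assigned if assigned else 0.0
    if recall == 0.0 or precision == 0.0:
        return recall, precision, 0.0
    # F-value as the harmonic mean of recall and precision.
    f_value = 2.0 / (1.0 / recall + 1.0 / precision)
    return recall, precision, f_value


# Toy labels for 3 classes (hypothetical data).
true_labels = [0, 0, 0, 1, 1, 2]
predicted_labels = [0, 0, 1, 1, 1, 2]
matrix = confusion_matrix(true_labels, predicted_labels, 3)
for c in range(3):
    print(c, recall_precision_f(matrix, c))
```

The diagonal of the matrix holds the correctly classified images, so a class whose column has large off-diagonal entries has low recall, and a class whose row has large off-diagonal entries has low precision.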

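Table 5 Recall of bag-of-keypoints+SVM, by kernel (with average)
.7.8.5.9.96..76.6.9.9.7
linear, code book size 3 5 8 .6.6.34.56.8.9.6.68.94.94.8.8.94.96...44.54.76.96.84.88.96..8.83.9.6.9.84.68.79.8.9.96.46.88..4.9.8.98.94.79
radial basis function, code book size 3 5 8 .6.64.34.56.8.86.66.7.94.94.84.86.94.96...5.56.78.96.84.8.9.98.84.84.9.64.9.8.69.8.8.8.74.58.8.98.6.9.6.9.9.73.9.98.48.84...9.8.98.96.8

Table 6 Precision of bag-of-keypoints+SVM, by kernel (with average)
.59.89..69.6..9.7.84.63.68
linear, code book size 3 5 8 .63.97.85.7.98.98.66.89.96.68.87.75.65.59.68.3.8.8.93.73.84.68.67.68.7.9.87.8.85.9.6.74.83.88...7.63..83.69.94.77.77.56.9.97.74.77.8.8.8.8.6.7
radial basis function, code book size 3 5 8 .55.94.74.68.98.98.67.88.96.7.88.7.64.6.7.4.87.88.93.73.8.75.73.7.7.89.85.8.87.87.67.84.84.85...7.65.35.79.78.94.8.79

Table 7 Confusion matrix of method 4 (code book size=8)
True class 7 5 43 47 4 43 7 5 5 8 8 6 3 4 49 5 3 45 3 4

Fig. 6 Left: examples with good results. Right: examples with poor results.

5.5 Discussion

5.5.1 10-class classification
First, based on the results of methods 1 and 2, we compare the color histogram and the SIFT features (bag-of-keypoints) as features. Tables 3 and 4 show that the SIFT features are far superior in both recall and precision. In particular, as Fig. 4 shows, the objects of one class have shape characteristics, such as corners and rounded parts, that the other classes lack. Conversely, in terms of color, buildings and sky appear in the background, so the class is hard to determine from color alone. This indicates that the SIFT features are more effective as features than the color histogram.

Next, based on the results of methods 2 and 3, we compare the Nearest Neighbor and SVM classifiers. From Tables 3 and 4, the mean recall and mean precision of method 2 are .76 and .77, while from Tables 5 and 6 those of method 3, the SVM with the linear kernel, are .76 and .73, which is slightly in favor of method 2.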

SVM.77.77 3 4 F- ( 5) SVM bag-of-keypoints 8

5.5.2 ( 8) 5% %.94.8..4 9 4

6. Web 84% 5% Nearest Neighbor visual words bag-of-keypoints O. Maron 11) Multiple Instance Learning (MIL) MIL () O. Maron Diverse Density MIL () Chen, Y. 12) MIL Chen, Y. 12)

References
1) Flickr: http://www.flickr.com/.
2) Csurka, G., Bray, C., Dance, C. and Fan, L.: Visual categorization with bags of keypoints, Workshop on Statistical Learning in Computer Vision, European Conference on Computer Vision, pp.1-22 (2004).
3) Lowe, D.: Distinctive Image Features from Scale-Invariant Keypoints, International Journal of Computer Vision, Vol.60, No.2, pp.91-110 (2004).
4) Brown, M. and Lowe, D.: Recognising panoramas, Proc. The International Conference on Computer Vision, pp.1218-1225 (2003).
5) Sivic, J. and Zisserman, A.: Video Google: A Text Retrieval Approach to Object Matching in Videos, Proceedings of the International Conference on Computer Vision, Vol.2, pp.1470-1477 (2003).
6) Vedaldi, A.: SIFT++, http://vision.ucla.edu/~vedaldi/code/siftpp/siftpp.html.
7) Joachims, T.: SVM multiclass, http://www.cs.cornell.edu/people/tj/svm_light/svm_multiclass.html.
8) Swain, M. J. and Ballard, D. H.: Color Indexing, International Journal of Computer Vision, Vol.7, No.1, pp.11-32 (1991).
9) (3).
10) Joachims, T.: SVM light, http://www.cs.cornell.edu/people/tj/svm_light/index.html.
11) Maron, O. and Ratan, A.: Multiple-instance learning for natural scene classification, The Fifteenth International Conference on Machine Learning, pp.341-349 (1998).
12) Chen, Y., Bi, J. and Wang, J.: MILES: Multiple-Instance Learning via Embedded Instance Selection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.28, No.12, pp.1931-1947 (2006).