DEIM Forum 2019 A7-1

Flexible Distance-based Hashing (FDH)

731-3194
E-mail: mc66023@e.hiroshima-cu.ac.jp, {wakaba,s naga,inagi,yoko}@hiroshima-cu.ac.jp, morikei18@gmail.com

1. Introduction

Nearest-neighbor search over large data sets is a fundamental operation in databases and related fields [2], [4], [5], [6]. For high-dimensional data (dimensionality d >= 20), tree-based index structures such as the KD-tree, R-tree, B+-tree, and Cover tree lose their effectiveness, and approximate techniques such as Locality-Sensitive Hashing (LSH) and Flexible Distance-based Hashing (FDH) [8] are used instead. FDH bounds its search range by a Hamming-distance threshold that must be fixed in advance and cannot adapt to each query. In this paper we propose search methods that control the FDH search range with a distance-ratio parameter δ instead. The remainder of this paper is organized as follows. Section 2 reviews Flexible Distance-based Hashing (FDH). Section 3 presents the proposed δ-based search methods. Section 4 reports experimental results, and Section 5 concludes the paper.
2. Preliminaries

2.1 Problem Definition

Let D = {p_1, ..., p_n} be a set of n points in a d-dimensional space R^d. Each point p = (v_1, ..., v_d) has d attribute values, the i-th of which lies in a range [L_i, U_i]. Given a query point q in R^d, nearest-neighbor search returns the point p in D that minimizes the distance dist(p, q).

2.2 Flexible Distance-based Hashing (FDH)

Flexible Distance-based Hashing (FDH) [8] assumes only a distance function dist(p, q), so it applies to general metric data. From D it selects a set of anchors A_n = {a_1, a_2, ..., a_|A|} (a_i ≠ a_j for i ≠ j) and assigns each anchor a_i a radius r_i, which splits D into

  D_near(r_i) = {p ∈ D | dist(a_i, p) <= r_i}
  D_far(r_i)  = {p ∈ D | dist(a_i, p) > r_i}.

Each point p ∈ D is then mapped to a bitmap BM(p) = b_1 b_2 ... b_|A| of length |A|:

  BM(p)[i] = b_i = 0 if dist(a_i, p) <= r_i, and 1 otherwise.   (1)

The bitmaps partition R^d into at most 2^|A| regions, and BM(p) = BM(q) means that p and q fall on the same side of every anchor sphere. Figure 1 shows an example with d = 2 and |A| = 3: of the 16 points p_1, p_2, ..., p_16, the anchors are p_14 = a_1, p_15 = a_2, and p_16 = a_3, and the region with bitmap 010 contains p_5, p_7, p_8, and p_11. Given a query q, FDH first searches the region whose bitmap equals BM(q) [8].

Figure 1: Example of FDH (d = 2, |A| = 3)

2.3 Hamming Distance between Regions

The nearest neighbor of q does not necessarily lie in the region of BM(q), so FDH also searches neighboring regions. The neighborhood is measured by the Hamming distance Hamming(p, q): the number of bit positions in which BM(p) and BM(q) differ. For example, if BM(p) = 011 and BM(q) = 110, then Hamming(p, q) = 2. Likewise, for a region R with bitmap BM(R), Hamming(p, R) denotes the Hamming distance between BM(p) and BM(R).
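The bitmap construction of Eq. (1) and the Hamming distance of Section 2.3 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are ours, and Euclidean distance stands in for whatever metric the index actually uses.

```python
import math

def dist(p, q):
    """Euclidean distance between two points given as coordinate tuples.
    FDH itself only requires some metric dist(p, q)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def bitmap(p, anchors, radii):
    """Eq. (1): bit i is 0 if p lies inside the sphere of anchor a_i, else 1."""
    return tuple(0 if dist(a, p) <= r else 1 for a, r in zip(anchors, radii))

def hamming(bm1, bm2):
    """Number of bit positions in which two bitmaps differ."""
    return sum(b1 != b2 for b1, b2 in zip(bm1, bm2))
```

Points sharing a bitmap share a region, so an index is simply a dictionary from bitmap to the list of points in that region.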
Given a Hamming-distance threshold H (H <= |A|), FDH collects the regions R_1, R_2, ..., R_k such that Hamming(q, R_i) <= H (1 <= i <= k) and searches the points they contain. Larger H widens the search range; when H = |A|, the whole data set is searched.

3. Proposed Method

3.1 Problem with the Threshold H

Consider again the 16 points of Figure 1 (Section 2.3). The region of the query q is 110. With H = 1, FDH searches the regions 010, 100, and 111 in addition to 110, yet the true nearest neighbor p_2 lies in none of them; its region sits at Hamming distance 2 from 110, so H must be raised to 2, and every increase of H sharply inflates the number of regions searched. Because H is fixed for the index rather than per query, FDH cannot adapt its search range to how close q lies to the region boundaries. We therefore propose to control the search range with a parameter δ. Section 3.2 describes search using δ, and Section 3.3 describes a multi-anchor extension.

3.2 Controlling the Search Range with δ

3.2.1 Flexible Bitmap

In Figure 1 with H = 1, FDH fails to reach p_2 even though q lies very close to the spheres of a_1 and a_2: its bitmap is 110, but a slight displacement across those boundaries would change it to 000, so the region 000 is an equally plausible home for its nearest neighbor. For a parameter δ (0 < δ <= 1.0), define bits b'_1, b'_2, ..., b'_|A| from BM(q) = b_1, b_2, ..., b_|A| as follows. For each anchor a_i (1 <= i <= |A|):

  if dist(a_i, q) <= r_i and dist(a_i, q) >= (1 - δ) r_i then b'_i = 1
  if dist(a_i, q) > r_i and dist(a_i, q) <= (1 + δ) r_i then b'_i = 0
  otherwise b'_i = b_i

That is, bit i is flipped exactly when the distance from a_i to q falls within a margin of δ r_i of the radius r_i, i.e., when q lies close to the sphere of a_i. The flexible bitmap FN(q, δ) is defined by this flipped bit string: BM(FN(q, δ)) = b'_1, b'_2, ..., b'_|A|. In the example of Figure 1 with δ = 0.1 and BM(q) = 110, the flipped bits give b'_1 b'_2 b'_3 = 000, so FN(q, δ) = 000.

Let NR(q, H) denote the set of regions FDH searches with threshold H:

  NR(q, H) = {R | R ∈ R_all, Hamming(q, R) <= H},

where R_all is the set of all regions. In Figure 1 with H = 1, NR(q, H) = {110, 010, 100, 111}; with H = |A|, NR(q, H) = R_all. Using δ, we instead search

  FNR(q, δ) = {R | R ∈ R_all, Hamming(q, R) + Hamming(R, F) <= Hamming(q, F)},

where F = FN(q, δ). FNR(q, δ) consists exactly of the regions that lie between q and F in Hamming space: every R that agrees with BM(q) and F on each bit on which they agree. In Figure 1 with δ = 0.1 and F = 000, FNR(q, δ) = {000, 010, 100, 110}. In general, |FNR(q, δ)| = 2^Hamming(q, F).
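The flexible bitmap FN(q, δ) and the region set FNR(q, δ) defined above can be sketched as follows. This is our own illustrative code, not the paper's implementation; it enumerates all 2^|A| bitmaps, which is acceptable for the small anchor-set sizes used here (|A| = 10 in the experiments), and uses Euclidean distance as a stand-in metric.

```python
import math
from itertools import product

def euclid(p, q):
    """Euclidean distance; FDH only requires some metric dist(p, q)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def flexible_bitmap(q, anchors, radii, delta):
    """FN(q, delta): flip bit i when q lies within a delta * r_i margin
    of the sphere of anchor a_i (Section 3.2.1)."""
    bits = []
    for a, r in zip(anchors, radii):
        d = euclid(a, q)
        if (1 - delta) * r <= d <= r:       # just inside the boundary -> flip to 1
            bits.append(1)
        elif r < d <= (1 + delta) * r:      # just outside the boundary -> flip to 0
            bits.append(0)
        else:                               # far from the boundary -> ordinary bit
            bits.append(0 if d <= r else 1)
    return tuple(bits)

def fnr(bm_q, f):
    """FNR(q, delta): the regions 'between' BM(q) and F = FN(q, delta),
    i.e. all R with Hamming(q, R) + Hamming(R, F) <= Hamming(q, F)."""
    ham = lambda x, y: sum(a != b for a, b in zip(x, y))
    bound = ham(bm_q, f)
    return [r for r in product((0, 1), repeat=len(bm_q))
            if ham(bm_q, r) + ham(r, f) <= bound]
```

For BM(q) = 110 and F = 000 this yields the four regions {000, 010, 100, 110} of the running example, matching |FNR(q, δ)| = 2^Hamming(q, F) = 4.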
Figure 2: Comparison of NR(q, H) and FNR(q, δ)

Figure 2 illustrates the difference between NR(q, H) and FNR(q, δ). As δ approaches 1.0, more bits are flipped and FNR(q, δ) covers more regions, so δ controls the search range in place of H.

3.2.2 Search Methods Using δ

We consider two ways to use the flexible bitmap. δ-FDH replaces the FDH search set NR(q, H) with NR(F, H), where F = FN(q, δ): the Hamming neighborhood of radius H is centered on the flexible bitmap rather than on BM(q). δ-FDH therefore takes both δ and H as parameters. Extended FDH (E-FDH) instead searches the union NR(q, H) ∪ FNR(q, δ). In Figure 1 with H = 1 and δ = 0.1 this union is {000, 010, 100, 110, 111}, which already contains the region of the nearest neighbor p_2, whereas plain FDH reaches it only with H = 2. Both methods, however, still require δ to be fixed in advance.

3.2.3 Adaptive FDH (A-FDH)

Choosing a good δ per query is as hard as choosing a good H. Adaptive FDH (A-FDH) therefore starts from δ = 0 and repeatedly enlarges it by an increment Δδ until widening the search range no longer improves the answer:

Inputs: query q, data set D = {p_1, p_2, ..., p_n}, anchor set A_n = {a_1, a_2, ..., a_|A|}, radii R_a = {r_1, r_2, ..., r_|A|}, threshold H, increment Δδ.
Step 1: Compute BM(q).
Step 2: Set δ ← 0, SR ← NR(q, H), min_d ← ∞.
Step 3: Set old_d ← min_d.
Step 4: For each point p in the regions of SR:
  Step 4-1: If dist(q, p) < min_d, set p_min ← p and min_d ← dist(q, p).
  Step 4-2: Remove each processed region from SR; when SR becomes empty, leave Step 4.
Step 5: If old_d = min_d, output p_min and stop.
Step 6: Set δ ← δ + Δδ.
Step 7: Set SR ← FNR(q, δ) minus the regions already searched, and return to Step 3.

A-FDH takes H and Δδ as parameters but adapts the effective search range to each query.

3.3 Multi-Anchor FDH (MA-FDH)

3.3.1 Overview

The accuracy of FDH depends heavily on the single anchor set used to build the index. Multi-Anchor FDH (MA-FDH) therefore maintains multiple anchor sets MA = {A_1, A_2, ..., A_s}, where each A_i contains |A| anchors, and builds one FDH index per set. A query is answered by running the FDH search with threshold H on every index and returning the best candidate found.
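The A-FDH procedure of Section 3.2.3 (Steps 1 to 7) can be sketched as below. This is a self-contained illustration under our own assumptions, not the authors' code: Euclidean distance stands in for the metric, the bucket index is built inline for brevity (normally it is built once, offline), and a `delta_max` safety stop is added so the loop always terminates.

```python
import math
from itertools import product

def euclid(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def bitmap(p, anchors, radii):
    # Eq. (1): 0 inside the sphere of a_i, 1 outside
    return tuple(0 if euclid(a, p) <= r else 1 for a, r in zip(anchors, radii))

def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))

def flexible_bitmap(q, anchors, radii, delta):
    # FN(q, delta): flip bits within the delta * r_i boundary margin
    bits = []
    for a, r in zip(anchors, radii):
        d = euclid(a, q)
        if (1 - delta) * r <= d <= r:
            bits.append(1)
        elif r < d <= (1 + delta) * r:
            bits.append(0)
        else:
            bits.append(0 if d <= r else 1)
    return tuple(bits)

def a_fdh(q, data, anchors, radii, h, d_delta, delta_max=1.0):
    """A-FDH sketch: widen the searched region set via delta (Steps 1-7)
    until the best candidate stops improving."""
    buckets = {}                                             # bitmap -> points
    for p in data:
        buckets.setdefault(bitmap(p, anchors, radii), []).append(p)
    bm_q = bitmap(q, anchors, radii)                         # Step 1
    regions = list(product((0, 1), repeat=len(anchors)))
    delta = 0.0                                              # Step 2
    sr = {r for r in regions if hamming(bm_q, r) <= h}       # NR(q, H)
    searched, p_min, min_d = set(), None, math.inf
    while True:
        old_d = min_d                                        # Step 3
        for region in sr:                                    # Step 4
            for p in buckets.get(region, []):
                d = euclid(q, p)
                if d < min_d:                                # Step 4-1
                    p_min, min_d = p, d
        searched |= sr                                       # Step 4-2
        if p_min is not None and old_d == min_d:             # Step 5: no improvement
            return p_min
        delta += d_delta                                     # Step 6
        if delta > delta_max:                                # safety stop (assumption)
            return p_min
        f = flexible_bitmap(q, anchors, radii, delta)        # Step 7: SR <- FNR \ searched
        bound = hamming(bm_q, f)
        sr = {r for r in regions
              if hamming(bm_q, r) + hamming(r, f) <= bound} - searched
```

Enumerating all 2^|A| bitmaps is fine for the anchor-set size used in the experiments (|A| = 10); a production index would enumerate only non-empty buckets.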
Because MA-FDH runs s FDH searches, its index size and search cost are up to s times those of a single FDH index for the same H; in exchange, a point missed by one anchor set can be caught by another. Since the same point may fall into the search regions of several indexes, MA-FDH also flags each point once its distance to the query has been computed, skipping duplicate distance computations.

3.3.2 Selecting the Anchor Sets

The choice of anchor sets strongly affects accuracy. MA-FDH therefore generates S candidate anchor sets (S = 100 in our experiments), evaluates the FDH index built from each candidate on a training workload of Q = 10,000 queries, and keeps the s best-performing sets (s = 10) for the final index.

4. Experiments

4.1 Evaluation Measures

For a query q, let p_algo be the point returned by an algorithm and p_opt the true nearest neighbor. We measure the absolute error

  AE(q, p_algo, p_opt) = dist(p_algo, q) - dist(p_opt, q)   (2)

which is 0 if and only if the answer is exact, and the relative error

  RE(q, p_algo, p_opt) = (AE(q, p_algo, p_opt) / dist(p_opt, q)) × 100 (%).   (3)

When M out of N queries (M <= N) are answered exactly, the accuracy rate is AR = (M/N) × 100 (%). The experiments used 10,000 queries over synthetic data sets with 32, 64, and 128 attributes (dimensions), each attribute value drawn from the range [-999.99, 999.99].

4.2 Results

4.2.1 Comparison of the δ-based Methods

We compared plain FDH with the three proposed methods: δ-FDH, which searches NR(F, H) for F = FN(q, δ); Extended FDH (E-FDH), which searches NR(q, H) ∪ FNR(q, δ); and Adaptive FDH (A-FDH). The anchor set size was |A| = 10, the threshold H ranged up to 5, and the data set contained N = 100,000 points with 128 attributes. Figures 3, 4, and 6 and Table 1 summarize the results. Figure 3 compares FDH with δ-FDH for δ = 0.01, 0.02, and 0.03: as δ grows, the accuracy of δ-FDH approaches that of FDH with H = 2 and H = 3. Figure 4 shows δ-FDH with δ = 0.1 against FDH. Figure 6 compares FDH with E-FDH for δ = 0.01, 0.05, and 0.1.
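The evaluation measures of Section 4.1, Eqs. (2) and (3) and the accuracy rate AR, are straightforward to compute; a minimal sketch with our own function names, taking the two query-to-answer distances directly:

```python
def absolute_error(d_algo, d_opt):
    """Eq. (2): AE = dist(p_algo, q) - dist(p_opt, q); 0 iff the answer is exact."""
    return d_algo - d_opt

def relative_error(d_algo, d_opt):
    """Eq. (3): AE as a percentage of the true nearest-neighbor distance."""
    return (d_algo - d_opt) / d_opt * 100.0

def accuracy_rate(m_exact, n_queries):
    """AR = (M / N) * 100 (%): fraction of N queries answered exactly."""
    return m_exact / n_queries * 100.0
```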
Figure 3: Comparison of FDH and δ-FDH
Figure 4: Comparison of FDH and δ-FDH (δ = 0.1)
Figure 6: Comparison of δ-FDH and E-FDH

Table 1: Comparison of E-FDH and A-FDH

  Method             δ / Δδ   H   AR [%]   avg. RE [%]   time [µs]
  E-FDH (fixed δ)    0.01     1   6.07     27.19          58.1
                              2   3.55     22.94         132.9
                              3   1.94     21.35         323.4
  A-FDH (adaptive)   0.01     1   5.88     27.19          70.6
                              2   3.54     22.94         141.2
                              3   1.94     21.35         332.6

With δ = 0.05 and H = 1, δ-FDH improved on plain FDH with H = 1. From Table 1, E-FDH and A-FDH behave almost identically at each H: A-FDH reaches the same average relative error with only a small overhead in processing time, while it does not require δ to be fixed in advance.

4.2.2 Comparison of FDH and MA-FDH

We compared FDH with MA-FDH on 128-attribute data, varying the number of anchor sets over 5, 10, and 20. Figures 7 and 8 show the accuracy rate and the average relative error over 100,000 queries; the horizontal axis is the Hamming distance H (up to 5) that defines the FDH search region. MA-FDH achieved a higher accuracy rate and a lower average relative error than single-anchor-set FDH, but the result with 10 anchor sets differed little from that with 5.

Figure 7: Comparison of accuracy rate (128 attributes)
Figure 8: Comparison of average relative error (128 attributes)

This is presumably why the result with 10 anchor sets turned out little different from the case of 5. For 64-attribute data with 5 and 10 anchor sets, Figures 9 and 10 compare the accuracy rate and the average relative error of MA-FDH when the anchor sets are selected at random and when they are selected optimally. The horizontal axis is the processing time (total search time over 100,000 queries); the vertical axis is the accuracy rate in Figure 9 and the average relative error over 100,000 queries in Figure 10; the numbers in the figures denote the Hamming distance H that defines the FDH search region. Optimally selected anchor sets were found to yield a higher accuracy rate and a lower average relative error than randomly selected ones.

Figure 9: Comparison of accuracy rate (64 attributes)
Figure 10: Comparison of average relative error (64 attributes)

For 32, 64, and 128 attributes (dimensions), Figures 11, 12, and 13 compare the processing time of MA-FDH when flags are used to skip duplicate distance computations between the query and data points and when every distance is computed from scratch. The vertical axis is the processing time (total search time over 100,000 queries); the horizontal axis is the Hamming distance H that defines the FDH search region.

Figure 11: Processing time with and without flags (32 attributes)
Figure 12: Processing time with and without flags (64 attributes)
Figure 13: Processing time with and without flags (128 attributes)

With the same number of anchor sets, there is almost no difference in processing time at Hamming distance 0, because the number of points searched is nearly the same with and without flags. As the Hamming distance grows, however, the gap widens to nearly a factor of 7, especially with 10 anchor sets. Distance computations between the query and data points thus dominate the overall processing time, confirming the effectiveness of introducing the flags.
5. Conclusion

In this paper we proposed search methods that control the search range of Flexible Distance-based Hashing (FDH) with a distance-ratio parameter δ: δ-FDH, Extended FDH (E-FDH), and Adaptive FDH (A-FDH), together with Multi-Anchor FDH (MA-FDH), which combines multiple anchor sets. Experiments showed that the δ-based methods improve on plain FDH and that A-FDH matches E-FDH without requiring δ to be fixed in advance. Future work includes extending the proposed methods to k-nearest-neighbor search.

Acknowledgments

This work was supported in part by a Grant-in-Aid for Scientific Research (C), No. 17K001881.

References

[1] Vassilis Athitsos, Michalis Potamias, Panagiotis Papapetrou, George Kollios, Nearest neighbor retrieval using distance-based hashing, Proc. IEEE International Conference on Data Engineering, pp.327-336, 2008.
[2] Gérard Biau, Luc Devroye, Lectures on the Nearest Neighbor Method, Springer, 2015.
[3] Junfeng He, Regunathan Radhakrishnan, Shih-Fu Chang, Claus Bauer, Compact hashing with joint optimization of search accuracy and time, Proc. 2011 IEEE Conference on Computer Vision and Pattern Recognition, pp.753-760, 2011.
[4] Alexander Hinneburg, Charu C. Aggarwal, Daniel A. Keim, What is the nearest neighbor in high dimensional spaces?, Proc. 26th VLDB Conference, pp.506-515, 2000.
[5] Yoonho Hwang, Bohyung Han, Hee-Kap Ahn, A fast nearest neighbor search algorithm by nonlinear embedding, Proc. 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp.3053-3060, 2012.
[6] Marius Muja, David G. Lowe, Scalable nearest neighbor algorithms for high dimensional data, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol.36, No.11, pp.2227-2240, 2014.
[7] Peter N. Yianilos, Data structures and algorithms for nearest neighbor search in general metric spaces, Proc. Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp.311-321, 1993.
[8] Man Lung Yiu, Ira Assent, Christian S. Jensen, Panos Kalnis, Outsourced similarity search on metric data assets, IEEE Trans. Knowledge and Data Engineering, Vol.24, No.2, pp.338-352, 2012.