rank GPU ERATO 2011 11 1 1 / 26
GPU rank/select wavelet tree balanced parenthesis GPU rank 2 / 26
GPU rank/select wavelet tree balanced parenthesis GPU rank 2 / 26
GPU rank/select wavelet tree balanced parenthesis GPU rank 2 / 26
1 2 3 CUDA 4 5 6 3 / 26
1 2 3 CUDA 4 5 6 4 / 26
x {0, 1} n b {0, 1} rank b (x, i) := x[1i] b rank 1 (x 1, 4) = 2 rank 0 (x 2, 7) = 3 1 2 3 4 5 6 7 8 9 10 x 1 = 0 1 1 0 0 1 1 1 0 0 x 2 = 1 0 1 1 1 0 0 0 1 0 5 / 26
x {0, 1} n b {0, 1} rank b (x, i) := x[1i] b rank 1 (x 1, 4) = 2 rank 0 (x 2, 7) = 3 1 2 3 4 5 6 7 8 9 10 x 1 = 0 1 1 0 0 1 1 1 0 0 x 2 = 1 0 1 1 1 0 0 0 1 0 5 / 26
x {0, 1} n b {0, 1} rank b (x, i) := x[1i] b x {0, 1} n rank b (x, i) O(1) o(n) GPU 6 / 26
Jacobson 89 rank 0 (B, i) = i rank 1 (B, i) rank 1 (B, i) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 1 B L L = log 2 n LT 2 B S S = log n/2 ST rank 1 (B, i) = LT[i/L] + ST[i/S] + rank 7 / 26
Jacobson 89 rank 0 (B, i) = i rank 1 (B, i) rank 1 (B, i) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 LT 0 4 7 1 B L L = log 2 n LT 2 B S S = log n/2 ST rank 1 (B, i) = LT[i/L] + ST[i/S] + rank 7 / 26
Jacobson 89 rank 0 (B, i) = i rank 1 (B, i) rank 1 (B, i) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 LT 0 4 7 1 B L L = log 2 n LT 2 B S S = log n/2 ST rank 1 (B, i) = LT[i/L] + ST[i/S] + rank 7 / 26
Jacobson 89 rank 0 (B, i) = i rank 1 (B, i) rank 1 (B, i) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 LT 0 4 7 1 B L L = log 2 n LT 2 B S S = log n/2 ST rank 1 (B, i) = LT[i/L] + ST[i/S] + rank 7 / 26
Jacobson 89 rank 0 (B, i) = i rank 1 (B, i) rank 1 (B, i) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 LT 0 4 7 ST 0 1 2 0 2 3 0 1 3 1 B L L = log 2 n LT 2 B S S = log n/2 ST rank 1 (B, i) = LT[i/L] + ST[i/S] + rank 7 / 26
Jacobson 89 rank 0 (B, i) = i rank 1 (B, i) rank 1 (B, i) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 LT 0 4 7 ST 0 1 2 0 2 3 0 1 3 1 B L L = log 2 n LT 2 B S S = log n/2 ST rank 1 (B, i) = LT[i/L] + ST[i/S] + rank 7 / 26
Jacobson 89 rank 0 (B, i) = i rank 1 (B, i) rank 1 (B, i) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 LT 0 4 7 ST 0 1 2 0 2 3 0 1 3 1 B L L = log 2 n LT 2 B S S = log n/2 ST rank 1 (B, i) = LT[i/L] + ST[i/S] + rank 7 / 26
Jacobson 89 rank 0 (B, i) = i rank 1 (B, i) rank 1 (B, i) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 LT 0 4 7 ST 0 1 2 0 2 3 0 1 3 1 B L L = log 2 n LT 2 B S S = log n/2 ST rank 1 (B, i) = LT[i/L] + ST[i/S] + rank 7 / 26
Jacobson 89 rank 0 (B, i) = i rank 1 (B, i) rank 1 (B, i) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 LT 0 4 7 ST 0 1 2 0 2 3 0 1 3 1 B L L = log 2 n LT 2 B S S = log n/2 ST rank 1 (B, i) = LT[i/L] + ST[i/S] + rank 7 / 26
1 2 3 CUDA 4 5 6 8 / 26
CUDA GPU NVIDIA GPU C / C++ Single Instruction Multiple Thread SIMT 32 9 / 26
10 / 26
4GB 1 49kB 11 / 26
CUDA = 1024 1024 64 128 256 = 65535 65535 65535 200 300 12 / 26
1 2 3 CUDA 4 5 6 13 / 26
1 Population count (Popcount) 32bit / 64bit 1 GPU 2 Prefix sum (x ( 1, x 2,, x k,, x n ) x 1, x 1 + x 2,, k i=1 x k,, ) n i=1 x i O(log n) 14 / 26
Prefix Sum 1 1 i 2 i 1 5 3 9 7 6 2 3 1 1 5 2 3 5 8 17 24 30 32 35 36 15 / 26
Prefix Sum 1 1 i 2 i 1 5 3 9 7 6 2 3 1 1 5 8 12 16 13 8 5 4 2 3 5 8 17 24 30 32 35 36 15 / 26
Prefix Sum 1 1 i 2 i 1 5 3 9 7 6 2 3 1 1 5 8 12 16 13 8 5 4 2 3 5 8 17 24 30 32 35 36 15 / 26
Prefix Sum 1 1 i 2 i 1 5 3 9 7 6 2 3 1 1 5 8 12 16 13 8 5 4 2 5 8 17 24 25 24 18 12 3 5 8 17 24 30 32 35 36 15 / 26
Prefix Sum 1 1 i 2 i 1 5 3 9 7 6 2 3 1 1 5 8 12 16 13 8 5 4 2 5 8 17 24 25 24 18 12 3 5 8 17 24 30 32 35 36 15 / 26
Prefix Sum 1 1 i 2 i 1 5 3 9 7 6 2 3 1 1 5 8 12 16 13 8 5 4 2 5 8 17 24 25 24 18 12 3 5 8 17 24 30 32 35 36 5 8 17 24 30 32 35 36 15 / 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 LT 0 4 7 ST 0 1 2 0 2 3 0 1 3 1 B L L = log 2 n LT 2 B S S = log n/2 ST 1 1 1 1 16 / 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 LT 0 4 7 ST 0 1 2 0 2 3 0 1 3 1 B L L = log 2 n LT 2 B S S = log n/2 ST 1 1 1 1 16 / 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 1 Popcount 2 Prefix Sum 3 1 ST 4 1 LT 5 LT Prefix Sum 17 / 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 1 1 2 2 1 0 1 2 0 1 Popcount 2 Prefix Sum 3 1 ST 4 1 LT 5 LT Prefix Sum 17 / 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 1 1 2 2 1 0 1 2 0 (Prefix sum) 1 2 4 2 3 3 1 3 3 1 Popcount 2 Prefix Sum 3 1 ST 4 1 LT 5 LT Prefix Sum 17 / 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 1 1 2 2 1 0 1 2 0 (Prefix sum) 1 2 4 2 3 3 1 3 3 ST 0 1 2 0 2 3 0 1 3 1 Popcount 2 Prefix Sum 3 1 ST 4 1 LT 5 LT Prefix Sum 17 / 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 1 1 2 2 1 0 1 2 0 (Prefix sum) 1 2 4 2 3 3 1 3 3 ST 0 1 2 0 2 3 0 1 3 LT 0 4 3 1 Popcount 2 Prefix Sum 3 1 ST 4 1 LT 5 LT Prefix Sum 17 / 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 1 1 2 2 1 0 1 2 0 (Prefix sum) 1 2 4 2 3 3 1 3 3 ST 0 1 2 0 2 3 0 1 3 LT 0 4 3 (Prefix sum) 0 4 7 1 Popcount 2 Prefix Sum 3 1 ST 4 1 LT 5 LT Prefix Sum 17 / 26
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 B 0 1 1 0 1 1 1 1 0 1 0 0 0 1 1 1 0 0 ST 0 1 2 0 2 3 0 1 3 LT 0 4 7 1 Popcount 2 Prefix Sum 3 1 ST 4 1 LT 5 LT Prefix Sum 17 / 26
LT n = 2 32 16MB GPU CPU Prefix sum 18 / 26
LT n = 2 32 16MB GPU CPU Prefix sum 18 / 26
Prefix Sum 1 2 Prefix Sum 3 4 1 5 Prefix Sum 6 Prefix Sum 19 / 26
1 2 3 CUDA 4 5 6 20 / 26
CPU AMD Phenom X4 9850 (25GHz) GPU Tesla C2070 115 GHz 4 GB 49 kb Sux: Implementing Succinct Data Structures Broadword Implementation of Rank / Select Queries S Vigna WEA 2008: 7th International Workshop on Experimental Algorithms (pp 154-168) 21 / 26
(1) 2 log n 64 256 4,194,304 100Mbit 1Gbit 3Gbit Sux(CPU) 0932058 s 9372586 s 28005750 s GPU 0089909 s 0397739 s 1027892 s CPU/GPU 1037 2356 2725 = 04 s n = 3G 22 / 26
(1) 2 log n 64 256 4,194,304 100Mbit 1Gbit 3Gbit Sux(CPU) 0932058 s 9372586 s 28005750 s GPU 0089909 s 0397739 s 1027892 s CPU/GPU 1037 2356 2725 = 04 s n = 3G 22 / 26
(1) 2 log n 64 256 4,194,304 100Mbit 1Gbit 3Gbit Sux(CPU) 0932058 s 9372586 s 28005750 s GPU 0089909 s 0397739 s 1027892 s CPU/GPU 1037 2356 2725 = 04 s n = 3G 22 / 26
(1) 2 log n 64 256 4,194,304 100Mbit 1Gbit 3Gbit Sux(CPU) 0932058 s 9372586 s 28005750 s GPU 0089909 s 0397739 s 1027892 s CPU/GPU 1037 2356 2725 = 04 s n = 3G 22 / 26
(1) 2 log n 64 256 4,194,304 100Mbit 1Gbit 3Gbit Sux(CPU) 0932058 s 9372586 s 28005750 s GPU 0089909 s 0397739 s 1027892 s CPU/GPU 1037 2356 2725 = 04 s n = 3G 22 / 26
(2) n = 3G 2 log n 64 m 2771 23 / 26
(2) n = 3G 2 log n 64 m 2771 23 / 26
(3) n = 3G k 2 log n 64k m 2825 24 / 26
(3) n = 3G k 2 log n 64k m 2825 24 / 26
1 2 3 CUDA 4 5 6 25 / 26
GPU 28 GPU 26 / 26
GPU 28 GPU 26 / 26