Super Computing in Accelerator simulations - Electron Gun simulation using GPGPU - K. Ohmi, KEK-Accel Accelerator Physics seminar 2009.11.19
Super computers in KEK
HITACHI SR11000: POWER5, 16 CPUs/node, 24 GB memory, 16 nodes, 134 GFlops/node, total 2.15 TFlops
IBM Blue Gene: PowerPC440, 2 CPUs/node, 512 MB memory, 10,240 nodes, 57.3 TFlops
1000 1000x3 3
Particle In Cell (PIC) ( ) Particle In Cell Particle-Particle
Particle In Cell, 2 20-30
Particle In Cell (PIC) (2-3 ) cell α
FFT G(r r )ρ(r )dr (CR) FFT CR 7
1/γ 2 1/γ (d 2 x/dt 2 ) 2
PIC Beam-beam PIC 107-10 8. cell 100x100-200x200 limit
PIC cell ( ) ( ) cell
SR11000 KEKB 100 10 4 10 6 1 SuperKEKB 10 4 x10 4 =10 8 10 CPU JPARC-MR 10 3 10 5
GRAPE NxN 1/N1/2
e + e - z+ z- s=0 s s=(z+,i-z-,j)/2 z +,i+z-,j
s 1 N 2 1 N 2 xtime step z
(SuperKEKB) 10 6 x10 6 10 4 100 10 4 x10 4 100x100 1 10 4 10 4 x10 4 10 8 time step
(J-PARC) 10 5 2 ( beam-beam ) 10 8 time step
Blue Gene 128x128 128 kB 1.5 ms x 10^8 = 40 h, Blue Gene IBM (2006.11) 128 KB MPI_Allreduce rhoxy calc_potential_psn phi 128 KB 128x128 MPI_Allreduce 10^8 10^4 32 32CPU 13.37 sec 32 64CPU 17.80 sec 10^8 10^4 MPI_Allreduce tree N N 32 32=2^5 512 512=2^9 9/5
Blue Gene JPARC Space charge simulation (50 )
HITACHI SR11000 KEK super computer System A GPU(Tesla1060)
RF
Electron Gun for KEK cERL: Q = 80 pC (max), σr = 0.5 mm, σt = 10-20 ps, Ez = 7 MV/m, V = 500 kV
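PIC solver in KEK-System A
3D Poisson solver. Boundary condition in free space, $\phi(\infty)=0$.
Green function: $G(\mathbf{r}) = \dfrac{1}{|\mathbf{r}|}$
Potential: $\phi(\mathbf{r}) = \dfrac{1}{4\pi\varepsilon_0}\int G(\mathbf{r}-\mathbf{r}')\,\rho(\mathbf{r}')\,d\mathbf{r}'$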
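Implementation
Make Green function table, with $G(\mathbf{r}) = 1/|\mathbf{r}|$:
$$G_{i,j,k} = \frac{1}{\Delta x\,\Delta y\,\Delta z}\int_{x_i-\Delta x/2}^{x_i+\Delta x/2}\int_{y_j-\Delta y/2}^{y_j+\Delta y/2}\int_{z_k-\Delta z/2}^{z_k+\Delta z/2}\frac{dx\,dy\,dz}{\sqrt{x^2+y^2+z^2}}$$
The indefinite integral is
$$\iiint\frac{dx\,dy\,dz}{\sqrt{x^2+y^2+z^2}} = -\frac{x^2}{2}\tan^{-1}\frac{yz}{x\sqrt{x^2+y^2+z^2}} - \frac{y^2}{2}\tan^{-1}\frac{zx}{y\sqrt{x^2+y^2+z^2}} - \frac{z^2}{2}\tan^{-1}\frac{xy}{z\sqrt{x^2+y^2+z^2}}$$
$$\qquad + yz\ln\left(x+\sqrt{x^2+y^2+z^2}\right) + zx\ln\left(y+\sqrt{x^2+y^2+z^2}\right) + xy\ln\left(z+\sqrt{x^2+y^2+z^2}\right)$$
Calculate the $\rho$ array from the macro-particle distribution: $\rho_{i,j,k}$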
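The Green function table entry can be checked numerically with a short sketch. It evaluates the closed-form antiderivative at the eight corners of a cell with alternating signs. The function names `antiderivative` and `green_cell` are illustrative and not taken from the original code. The sketch assumes cell centres at integer multiples of the cell size, so that no corner lies on a coordinate plane where the arctangent and logarithm terms would need special handling.

```python
import math

def antiderivative(x, y, z):
    """Indefinite triple integral of 1/sqrt(x^2 + y^2 + z^2) over x, y, z."""
    r = math.sqrt(x * x + y * y + z * z)
    return (-0.5 * x * x * math.atan(y * z / (x * r))
            - 0.5 * y * y * math.atan(z * x / (y * r))
            - 0.5 * z * z * math.atan(x * y / (z * r))
            + y * z * math.log(x + r)
            + z * x * math.log(y + r)
            + x * y * math.log(z + r))

def green_cell(xc, yc, zc, dx, dy, dz):
    """Cell average of 1/r for a cell centred at (xc, yc, zc)."""
    total = 0.0
    for sx in (-1, 1):
        for sy in (-1, 1):
            for sz in (-1, 1):
                # +1 for each upper limit, -1 for each lower limit
                sign = sx * sy * sz
                total += sign * antiderivative(xc + 0.5 * sx * dx,
                                               yc + 0.5 * sy * dy,
                                               zc + 0.5 * sz * dz)
    return total / (dx * dy * dz)

if __name__ == "__main__":
    # Far from the origin the cell average approaches the point value 1/r.
    print(green_cell(1.0, 0.0, 0.0, 0.1, 0.1, 0.1))
    # The cell containing the origin stays finite although 1/r diverges there.
    print(green_cell(0.0, 0.0, 0.0, 0.1, 0.1, 0.1))
```

Averaging over the cell removes the singularity of 1/r at the origin, which is why the table is built from the integral rather than from point values.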
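Integration, convolution
$$\phi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\int G(\mathbf{r}-\mathbf{r}')\,\rho(\mathbf{r}')\,d\mathbf{r}'$$
Direct summation
$$\phi_{i,j,k} = \frac{1}{4\pi\varepsilon_0}\sum_{i',j',k'} G_{i-i',\,j-j',\,k-k'}\;\rho_{i',j',k'}$$
Range of the suffix: $i = 1,\dots,N_x$; $i-i' = 1-N_x,\dots,N_x-1$.
Since $G_{-i,j,k} = G_{i,j,k}$, the G table size can be $N_x N_y N_z$.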
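Solver using FFT
$$G(\mathbf{k}) = \int G(\mathbf{r})\,\exp(i\mathbf{k}\cdot\mathbf{r})\,d\mathbf{r}, \qquad \rho(\mathbf{k}) = \int \rho(\mathbf{r})\,\exp(i\mathbf{k}\cdot\mathbf{r})\,d\mathbf{r}$$
Convolution
$$\phi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0}\,\frac{1}{(2\pi)^3}\int G(\mathbf{k})\,\rho(\mathbf{k})\,\exp(-i\mathbf{k}\cdot\mathbf{r})\,d\mathbf{k}$$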
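Discrete space
$$G_k = \sum_{i=1}^{N_{xyz}} G(\mathbf{r}_i)\exp(i\mathbf{k}\cdot\mathbf{r}_i), \qquad G(\mathbf{r}_i) = \frac{1}{N_{xyz}}\sum_{k=1}^{N_{xyz}} G_k\exp(-i\mathbf{k}\cdot\mathbf{r}_i)$$
$$\rho_k = \sum_{i=1}^{N_{xyz}} \rho(\mathbf{r}_i)\,\Delta r\,\exp(i\mathbf{k}\cdot\mathbf{r}_i), \qquad \rho(\mathbf{r}_i)\,\Delta r = \frac{1}{N_{xyz}}\sum_{k=1}^{N_{xyz}} \rho_k\exp(-i\mathbf{k}\cdot\mathbf{r}_i)$$
Convolution
$$4\pi\varepsilon_0\,\phi(\mathbf{r}_i) = \sum_j G(\mathbf{r}_i-\mathbf{r}_j)\,\rho(\mathbf{r}_j)\,\Delta r = \frac{1}{N_{xyz}}\sum_{k=1}^{N_{xyz}} G_k\,\rho_k\exp(-i\mathbf{k}\cdot\mathbf{r}_i)$$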
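The equivalence between the direct summation and the transform-space product can be illustrated in one dimension. The sketch below is not the solver used on System A. It replaces the FFT by a naive discrete Fourier transform to stay within the standard library, and it assumes a doubled, zero-padded grid so that the periodic convolution reproduces the free-space sum. The softened kernel in the usage part and all function names are illustrative.

```python
import cmath
import math

def dft(values, sign):
    """Naive discrete Fourier transform with kernel exp(sign * i * k * r)."""
    n = len(values)
    return [sum(values[idx] * cmath.exp(sign * 2j * math.pi * k * idx / n)
                for idx in range(n))
            for k in range(n)]

def potential_direct(green, rho):
    """phi_i = sum_j G(i - j) rho_j, evaluated term by term."""
    n = len(rho)
    return [sum(green(i - j) * rho[j] for j in range(n)) for i in range(n)]

def potential_dft(green, rho):
    """Same sum via G_k * rho_k on a doubled, zero-padded grid."""
    n = len(rho)
    m = 2 * n
    g = [0.0] * m
    for d in range(n):
        g[d] = green(d)           # offsets 0 .. n-1
    for d in range(1, n):
        g[m - d] = green(-d)      # negative offsets wrap to the upper half
    rho_pad = list(rho) + [0.0] * n
    g_k = dft(g, +1)
    rho_k = dft(rho_pad, +1)
    phi = dft([a * b for a, b in zip(g_k, rho_k)], -1)
    return [phi[i].real / m for i in range(n)]

if __name__ == "__main__":
    kernel = lambda d: 1.0 / math.sqrt(d * d + 0.25)   # softened 1/r, illustrative
    rho = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0]
    a = potential_direct(kernel, rho)
    b = potential_dft(kernel, rho)
    print(max(abs(x - y) for x, y in zip(a, b)))
```

With a true FFT the cost per step drops from the order of N^2 to the order of N log N, at the price of the doubled grid in each dimension.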
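Shifted Green function
Mirror charge
Green function for the mirror charge: $G_m(\mathbf{r}) = \dfrac{1}{|\mathbf{r}-\mathbf{r}_0|}$
$$G_{i,j,k} = \frac{1}{\Delta x\,\Delta y\,\Delta z}\int_{x_i-\Delta x/2}^{x_i+\Delta x/2}\int_{y_j-\Delta y/2}^{y_j+\Delta y/2}\int_{z_k-\Delta z/2}^{z_k+\Delta z/2}\frac{dx\,dy\,dz}{\sqrt{x^2+y^2+(z-z_0)^2}}$$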
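As a point-charge illustration of the mirror charge idea, the sketch below adds the potential of an opposite-sign mirror charge to the free-space 1/r potential. The opposite sign and the flat mirror plane at `z_plane` are assumptions of this sketch (the usual treatment of a conducting surface), and the function name is hypothetical. The factor $1/4\pi\varepsilon_0$ is omitted.

```python
import math

def potential_with_mirror(point, charge_pos, z_plane):
    """1/r potential of a unit charge plus its opposite-sign mirror charge."""
    x, y, z = point
    xq, yq, zq = charge_pos
    z_mirror = 2.0 * z_plane - zq   # reflection of the charge in the plane
    direct = 1.0 / math.sqrt((x - xq) ** 2 + (y - yq) ** 2 + (z - zq) ** 2)
    mirror = 1.0 / math.sqrt((x - xq) ** 2 + (y - yq) ** 2 + (z - z_mirror) ** 2)
    return direct - mirror

if __name__ == "__main__":
    charge = (0.0, 0.0, 0.01)
    # On the mirror plane the two contributions cancel.
    print(potential_with_mirror((0.003, -0.002, 0.0), charge, 0.0))
    # Away from the plane the direct term dominates.
    print(potential_with_mirror((0.0, 0.0, 0.02), charge, 0.0))
```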
Potential of Gaussian Charge distribution
[Figure: potential, plotted as 1/r (m^-1), versus x (m) at y = z = 0.0005 for a Gaussian charge distribution with σr = 1 mm. Green: charge distribution in free space. Red: charge distribution with mirror at x = 0.035. Curve labels: "free space", "mirror", "mirror at -0.07".]
GPGPU
GPGPU: General Purpose computing on Graphics Processing Units. Platforms: CUDA (NVIDIA), ATI Stream (ATI), OpenCL.
My machine: Core i7 PC with NVIDIA Tesla 1060 (500k yen).
NVIDIA Tesla: 240 PU/GPU, 4 GB memory.
Tesla performance: 0.933 TFlops in single precision and 78 GFlops in double precision.
KEK supercomputer SR11000: 0.13 TFlops/node.
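3D particle-particle interaction
Based on a demo code: Fast N-Body Simulation with CUDA (L. Nyland, M. Harris, J. Prins, NVIDIA SDK)
$$\mathbf{F}_i = \frac{e^2}{4\pi\varepsilon_0}\sum_{j\neq i}\frac{\mathbf{r}_{ij}}{\left(|\mathbf{r}_{ij}|^2+\varepsilon^2\right)^{3/2}}, \qquad \mathbf{r}_{ij} = \mathbf{r}_i-\mathbf{r}_j$$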
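The softened pairwise sum can be written as a plain CPU reference loop. This is a minimal sketch for checking results, not the CUDA kernel of the demo code. The function name `softened_sum` is illustrative, and the prefactor $e^2/4\pi\varepsilon_0$ is left out.

```python
def softened_sum(positions, eps):
    """For each particle i return sum over j != i of r_ij / (|r_ij|^2 + eps^2)^(3/2)."""
    result = []
    for i, r_i in enumerate(positions):
        acc = [0.0, 0.0, 0.0]
        for j, r_j in enumerate(positions):
            if i == j:
                continue
            d = [a - b for a, b in zip(r_i, r_j)]          # r_ij = r_i - r_j
            w = (d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + eps * eps) ** -1.5
            for axis in range(3):
                acc[axis] += w * d[axis]
        result.append(acc)
    return result

if __name__ == "__main__":
    # Two like charges one unit apart: equal and opposite, pointing away from each other.
    print(softened_sum([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], eps=0.0))
```

The double loop makes the N^2 cost per time step explicit; the softening length eps keeps the sum finite when two macro particles come very close.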
CPU GPU GPU GPU CPU
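Reference frame
$$H = \sqrt{P^2c^2+m_0^2c^4} - e\int_0^z E(z')\,dz'$$
$$\dot{P} = eE(z), \qquad \dot{Z} = \frac{Pc^2}{\sqrt{P^2c^2+m_0^2c^4}}, \qquad P = \frac{m_0V}{\sqrt{1-V^2/c^2}}$$
$$P_n = P_{n-1} + eE(z_{n-1})\,\Delta t, \qquad V_n = \frac{P_nc^2}{\sqrt{P_n^2c^2+m_0^2c^4}}, \qquad Z_n = Z_{n-1} + \frac{P_nc^2\,\Delta t}{\sqrt{P_n^2c^2+m_0^2c^4}}$$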
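The update of the reference momentum and position can be tried out with a few lines. The sketch below follows the three update formulas step by step, working with P c in eV so that the change of P c per step is E c Δt for a field E in V/m. The uniform 7 MV/m field in the usage part, the step size, and the function name `push` are assumptions for illustration only.

```python
import math

C = 299_792_458.0     # speed of light (m/s)
MC2 = 510_998.95      # electron rest energy m0 c^2 (eV)

def push(pc, z, field, dt, steps):
    """Advance P c (eV) and Z (m) with the kick-then-drift update."""
    for _ in range(steps):
        pc += field(z) * C * dt                        # P_n = P_{n-1} + e E(z_{n-1}) dt
        v = C * pc / math.sqrt(pc * pc + MC2 * MC2)    # V_n = P_n c^2 / sqrt(P_n^2 c^2 + m0^2 c^4)
        z += v * dt                                    # Z_n = Z_{n-1} + V_n dt
    return pc, z

if __name__ == "__main__":
    pc, z = push(0.0, 0.0, lambda z: 7.0e6, 1.0e-13, 10_000)
    kinetic = math.sqrt(pc * pc + MC2 * MC2) - MC2
    # In a uniform field the kinetic energy should be close to e E z.
    print(kinetic, 7.0e6 * z)
```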
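Lorentz transformation, space charge
$$\mathbf{r} = \cdots\, e^{H_0}\,e^{eEz}\,L(V_2)\,e^{\phi}\,L^{-1}(V_2)\;e^{H_0}\,e^{eEz}\,L(V_1)\,e^{\phi}\,L^{-1}(V_1)$$
Between two space charge kicks: $L^{-1}(V_2)\,e^{eEz}\,e^{H_0}\,L(V_1)$
Reference frame: z, Δt, Lorentz transformation
$$e^{H_0}\,e^{eEz}\,e^{\phi}\,\mathbf{r}_0$$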
Equation of motion in the reference frame
$$\Delta\mathbf{v}_{i,n} = n_e r_e c^2\,\Delta t\,\sqrt{1-V_n^2/c^2}\;\sum_{j\neq i}^{N_e/n_e}\frac{\mathbf{r}_{ij}}{\left(|\mathbf{r}_{ij}|^2+\varepsilon^2\right)^{3/2}}$$
$n_e$: charge in a macro particle.
Particle motion is assumed to be non-relativistic in the reference frame.
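Expression of L(V)
$$L^{-1}(V_1)\;e^{\,e\int E\,dr}\;e^{H_0}\;L(V_2)$$
Velocity, into the reference frame ($V_1$):
$$v_{x,0} = \frac{v_x\sqrt{1-V_1^2/c^2}}{1-V_1v_z/c^2}, \qquad v_{y,0} = \frac{v_y\sqrt{1-V_1^2/c^2}}{1-V_1v_z/c^2}, \qquad v_{z,0} = \frac{v_z-V_1}{1-V_1v_z/c^2}$$
Velocity, back from the reference frame ($V_2$):
$$v_x = \frac{v_{x,0}\sqrt{1-V_2^2/c^2}}{1+V_2v_{z,0}/c^2}, \qquad v_y = \frac{v_{y,0}\sqrt{1-V_2^2/c^2}}{1+V_2v_{z,0}/c^2}, \qquad v_z = \frac{v_{z,0}+V_2}{1+V_2v_{z,0}/c^2}$$
Position and time:
$$z_0 = \frac{z-V_1t}{\sqrt{1-V_1^2/c^2}}, \qquad t_0 = \frac{t-V_1z/c^2}{\sqrt{1-V_1^2/c^2}}, \qquad \Delta t_0 = \Delta t\,\sqrt{1-V_1^2/c^2}$$
$$z = \frac{z_0+V_2t_0}{\sqrt{1-V_2^2/c^2}}, \qquad t = \frac{t_0+V_2z_0/c^2}{\sqrt{1-V_2^2/c^2}}$$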
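The velocity part of L(V) and its inverse can be checked with a round trip. This is a minimal sketch of the standard relativistic velocity addition along z; the function names `to_reference_frame` and `to_lab_frame` are illustrative and not from the original code.

```python
import math

C = 299_792_458.0   # speed of light (m/s)

def to_reference_frame(v, V1):
    """Velocity seen from a frame moving with V1 along z."""
    vx, vy, vz = v
    root = math.sqrt(1.0 - V1 * V1 / C ** 2)
    denom = 1.0 - V1 * vz / C ** 2
    return (vx * root / denom, vy * root / denom, (vz - V1) / denom)

def to_lab_frame(v0, V2):
    """Inverse transformation, from the frame moving with V2 back to the lab."""
    vx0, vy0, vz0 = v0
    root = math.sqrt(1.0 - V2 * V2 / C ** 2)
    denom = 1.0 + V2 * vz0 / C ** 2
    return (vx0 * root / denom, vy0 * root / denom, (vz0 + V2) / denom)

if __name__ == "__main__":
    v = (0.1 * C, -0.05 * C, 0.85 * C)
    v0 = to_reference_frame(v, 0.8 * C)
    print(v0)
    print(to_lab_frame(v0, 0.8 * C))   # should reproduce v when V2 = V1
```

In the simulation V_2 differs from V_1 because the reference momentum has been advanced by the accelerating field between the two transformations.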
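$$H = \underbrace{\sqrt{p^2c^2+m_0^2c^4}}_{H_0} - e\int_0^{\mathbf{r}}\mathbf{E}(\mathbf{r}')\cdot d\mathbf{r}'$$
$$\mathbf{p}_n = \mathbf{p}_{n-1} + e\mathbf{E}(\mathbf{r}_{n-1})\,\Delta t, \qquad \mathbf{r}_n = \mathbf{r}_{n-1} + \frac{\mathbf{p}_nc^2\,\Delta t}{\sqrt{p_n^2c^2+m_0^2c^4}}$$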
NVIDIA Tesla: 30,000 particles (about 400 GFlops), 0.042 sec/step. 100,000 particles, 0.67 sec/step (scales as N^2).
Hitachi SR11000 (KEK-SystemA), 3D-PIC: 100,000 particles, 0.15 sec/step.
Blue Gene (KEK-SystemB)