EGunGPU


Super Computing in Accelerator Simulations: Electron Gun Simulation Using GPGPU
K. Ohmi, KEK-Accel
Accelerator Physics Seminar, 2009.11.19

Supercomputers at KEK
HITACHI SR11000: POWER5 x16, 24 GB, 134 GFlops per node; 16 nodes, total 2.15 TFlops
IBM Blue Gene: PowerPC440 x2, 512 MB; 10,240 nodes; 57.3 TFlops

1000 1000x3 3

Particle In Cell (PIC): Particle In Cell and Particle-Particle

Particle In Cell, 2 20-30

Particle In Cell (PIC) (2-3 ) cell α

FFT: $\int G(\mathbf r-\mathbf r')\rho(\mathbf r')\,d\mathbf r'$ (CR), FFT, CR, 7

$1/\gamma^2$, $1/\gamma$, $(d^2x/dt^2)^2$

PIC: beam-beam PIC, $10^7$-$10^8$; cells 100x100-200x200 (limit)

PIC cell ( ) ( ) cell

SR11000: KEKB 100, $10^4$, $10^6$, 1; SuperKEKB $10^4\times10^4=10^8$, 10 CPU; JPARC-MR $10^3$, $10^5$

GRAPE: $N\times N$, $1/N^{1/2}$

$e^+$, $e^-$: $z_+$, $z_-$; $s=0$; $s=(z_{+,i}-z_{-,j})/2$; $z_{+,i}+z_{-,j}$

$s$; 1, $N^2$; 1, $N^2\times$ time step; $z$

(SuperKEKB) $10^6\times10^6$, $10^4$, 100, $10^4\times10^4$, 100x100, 1, $10^4$, $10^4\times10^4$, $10^8$ time steps

(J-PARC) $10^5$, 2 (beam-beam), $10^8$ time steps

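Blue Gene: 128x128 grid (128 kB); 1.5 ms x $10^8$ steps ≈ 40 h. Blue Gene (IBM, November 2006): the 128 KB array rhoxy is combined with MPI_Allreduce, then calc_potential_psn computes phi. 128 KB (128x128) MPI_Allreduce, $10^8$, $10^4$: 32, 32 CPU: 13.37 sec; 32, 64 CPU: 17.80 sec. $10^8$, $10^4$: MPI_Allreduce is tree-based, so its cost grows as $\log_2 N$ for $N$ processes: $32=2^5$, $512=2^9$, ratio 9/5.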
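The slide's cost estimates can be checked with a few lines of arithmetic. The sketch makes two assumptions about the implementation, neither of which is a measurement:

- the 128x128 charge-density array is stored in double precision (8 bytes per value);
- a tree-based MPI_Allreduce needs $\lceil\log_2 N\rceil$ communication stages for $N$ processes.

The helper `tree_stages` is hypothetical.

```python
import math

# Charge-density grid exchanged by MPI_Allreduce every step (128x128 from the slide).
NX, NY = 128, 128
BYTES_PER_VALUE = 8  # assumption: double precision
buffer_bytes = NX * NY * BYTES_PER_VALUE  # 131072 bytes = 128 KiB

def tree_stages(n_procs: int) -> int:
    """Hypothetical helper: stages of a binary-tree reduction over n_procs processes."""
    return math.ceil(math.log2(n_procs))

# Going from 32 to 512 processes multiplies the number of stages by 9/5.
stage_ratio = tree_stages(512) / tree_stages(32)

# Wall time at 1.5 ms per step for 1e8 steps, in hours (about 41.7 h).
per_step_s = 1.5e-3
total_hours = per_step_s * 1e8 / 3600.0

print(buffer_bytes, tree_stages(32), tree_stages(512), stage_ratio, round(total_hours, 1))
```

The 9/5 ratio counts tree stages only. Real MPI_Allreduce timings also depend on message size and network topology.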


Blue Gene JPARC Space charge simulation (50 )

HITACHI SR11000 (KEK supercomputer System A) and GPU (Tesla 1060)

RF

Electron gun for KEK cERL: Q = 80 pC (max), σr = 0.5 mm, σt = 10-20 ps, Ez = 7 MV/m, V = 500 kV

PIC solver in KEK-System A
3D Poisson solver with the free-space boundary condition $\phi(\infty)=0$, using the Green function $G(\mathbf r)=1/r$:
$\phi(\mathbf r)=\frac{1}{4\pi\varepsilon_0}\int G(\mathbf r-\mathbf r')\rho(\mathbf r')\,d\mathbf r'$

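Implementation
Make the Green function table (cell average of $G(\mathbf r)=1/r$):
$G_{i,j,k}=\frac{1}{\Delta x\,\Delta y\,\Delta z}\int_{x_i-\Delta x/2}^{x_i+\Delta x/2}\int_{y_j-\Delta y/2}^{y_j+\Delta y/2}\int_{z_k-\Delta z/2}^{z_k+\Delta z/2}\frac{dx\,dy\,dz}{\sqrt{x^2+y^2+z^2}}$
with the antiderivative
$\iiint\frac{dx\,dy\,dz}{\sqrt{x^2+y^2+z^2}}=-\frac{x^2}{2}\tan^{-1}\frac{yz}{x\sqrt{x^2+y^2+z^2}}-\frac{y^2}{2}\tan^{-1}\frac{zx}{y\sqrt{x^2+y^2+z^2}}-\frac{z^2}{2}\tan^{-1}\frac{xy}{z\sqrt{x^2+y^2+z^2}}+yz\ln\left(x+\sqrt{x^2+y^2+z^2}\right)+zx\ln\left(y+\sqrt{x^2+y^2+z^2}\right)+xy\ln\left(z+\sqrt{x^2+y^2+z^2}\right)$
Calculate the array $\rho_{i,j,k}$ from the macro particle distribution.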
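The antiderivative can be checked numerically. The sketch below computes the cell-averaged Green function as an eight-corner alternating sum of the antiderivative, then compares it with a brute-force midpoint rule. It assumes no cell corner lies on a coordinate plane, because `F` divides by $x$, $y$ and $z$. The cell containing the origin would need its finite limiting values handled separately. The function names are hypothetical.

```python
import math

def F(x: float, y: float, z: float) -> float:
    """Antiderivative with d^3F/dx dy dz = 1/sqrt(x^2+y^2+z^2); needs x, y, z != 0."""
    r = math.sqrt(x * x + y * y + z * z)
    return (-0.5 * x * x * math.atan(y * z / (x * r))
            - 0.5 * y * y * math.atan(z * x / (y * r))
            - 0.5 * z * z * math.atan(x * y / (z * r))
            + y * z * math.log(x + r)
            + z * x * math.log(y + r)
            + x * y * math.log(z + r))

def integrated_green(xc, yc, zc, dx, dy, dz):
    """Cell average of 1/r over the box centred at (xc, yc, zc).
    Upper limits get sign +1 and lower limits -1 in each direction."""
    total = 0.0
    for sx in (-1, 1):
        for sy in (-1, 1):
            for sz in (-1, 1):
                total += sx * sy * sz * F(xc + sx * dx / 2, yc + sy * dy / 2, zc + sz * dz / 2)
    return total / (dx * dy * dz)

def midpoint_green(xc, yc, zc, dx, dy, dz, n=20):
    """Brute-force midpoint-rule average of 1/r over the same cell, for comparison."""
    s = 0.0
    for a in range(n):
        x = xc - dx / 2 + (a + 0.5) * dx / n
        for b in range(n):
            y = yc - dy / 2 + (b + 0.5) * dy / n
            for c in range(n):
                z = zc - dz / 2 + (c + 0.5) * dz / n
                s += 1.0 / math.sqrt(x * x + y * y + z * z)
    return s / n ** 3
```

Far from the origin, the cell average approaches $1/r$ at the cell centre. Close to the origin, the integrated table stays accurate, whereas a table that simply samples $1/r$ at the cell centre becomes inaccurate there.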

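Integration, convolution:
$\phi(\mathbf r)=\frac{1}{4\pi\varepsilon_0}\int G(\mathbf r-\mathbf r')\rho(\mathbf r')\,d\mathbf r'=\frac{1}{4\pi\varepsilon_0}\sum_{i',j',k'}G_{i-i',j-j',k-k'}\,\rho_{i',j',k'}$ (direct summation)
Range of the suffix: $i=1,\dots,N_x$, so $i-i'=1-N_x,\dots,N_x-1$. Since $G_{-i,j,k}=G_{i,j,k}$ (and likewise in $j$ and $k$), the G table size can be $N_xN_yN_z$.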
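Here is a minimal three-dimensional sketch of the direct summation. It assumes 0-based indices and a Green table `G[a][b][c]` stored only for non-negative offsets. Taking `abs()` of each index offset implements the symmetry argument above. This relies on reflection symmetry in every axis, which holds for the free-space $1/r$ table but not for a Green function shifted in $z$. The cost is $O((N_xN_yN_z)^2)$, which motivates the FFT solver.

```python
def direct_convolution_3d(G, rho):
    """phi[i][j][k] = sum over (i', j', k') of G[|i-i'|][|j-j'|][|k-k'|] * rho[i'][j'][k'].
    G holds only non-negative offsets (an Nx x Ny x Nz table)."""
    nx, ny, nz = len(rho), len(rho[0]), len(rho[0][0])
    phi = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                s = 0.0
                for ip in range(nx):
                    for jp in range(ny):
                        for kp in range(nz):
                            s += G[abs(i - ip)][abs(j - jp)][abs(k - kp)] * rho[ip][jp][kp]
                phi[i][j][k] = s
    return phi
```

With a unit charge in one cell, `phi` reproduces the Green table shifted to that cell, which is a quick sanity check.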

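Solver using FFT
$\tilde G(\mathbf k)=\int G(\mathbf r)\,e^{i\mathbf k\cdot\mathbf r}\,d\mathbf r$, $\tilde\rho(\mathbf k)=\int\rho(\mathbf r)\,e^{i\mathbf k\cdot\mathbf r}\,d\mathbf r$
Convolution: $\phi(\mathbf r)=\frac{1}{4\pi\varepsilon_0}\frac{1}{(2\pi)^3}\int\tilde G(\mathbf k)\tilde\rho(\mathbf k)\,e^{-i\mathbf k\cdot\mathbf r}\,d\mathbf k$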

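Discrete space
$G_k=\sum_{i=1}^{N_{xyz}}G(\mathbf r_i)\,e^{i\mathbf k\cdot\mathbf r_i}$, $G(\mathbf r_i)=\frac{1}{N_{xyz}}\sum_{k=1}^{N_{xyz}}G_k\,e^{-i\mathbf k\cdot\mathbf r_i}$
$\rho_k=\sum_{i=1}^{N_{xyz}}\rho(\mathbf r_i)\Delta r\,e^{i\mathbf k\cdot\mathbf r_i}$, $\rho(\mathbf r_i)\Delta r=\frac{1}{N_{xyz}}\sum_{k=1}^{N_{xyz}}\rho_k\,e^{-i\mathbf k\cdot\mathbf r_i}$
Convolution: $4\pi\varepsilon_0\,\phi(\mathbf r_i)=\sum_j G(\mathbf r_i-\mathbf r_j)\rho(\mathbf r_j)\Delta r=\frac{1}{N_{xyz}}\sum_{k=1}^{N_{xyz}}G_k\rho_k\,e^{-i\mathbf k\cdot\mathbf r_i}$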
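The discrete convolution above is circular, that is, periodic in the grid index. The free-space sum of the direct-summation slide instead needs offsets from $1-N_x$ to $N_x-1$. A common way to reconcile the two is to double the grid and zero-pad $\rho$ (Hockney's method). The slides do not say which padding scheme was used, so treat this as an assumed, one-dimensional sketch.

Two further notes on the code:

- It uses a hand-written radix-2 FFT so that it runs with the standard library only.
- Its forward transform uses $e^{-ikr}$, the opposite sign to the slide. The convolution result does not depend on that choice.

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. The inverse is not scaled by 1/n."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    sign = 1 if inverse else -1
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft_convolve(G_half, rho):
    """phi_i = sum_{i'} G_{|i-i'|} rho_{i'} computed with a zero-padded 2N-point FFT."""
    n = len(rho)
    m = 2 * n
    if m & (m - 1):
        raise ValueError("len(rho) must be a power of two")
    # Periodic Green table on 2N points: offsets 0..N-1 at the front, -(N-1)..-1 wrapped to the back.
    g = [0.0] * m
    for d in range(n):
        g[d] = G_half[d]
        if d > 0:
            g[m - d] = G_half[d]
    r = list(rho) + [0.0] * n  # zero padding keeps periodic images from overlapping
    prod = [a * b for a, b in zip(fft(g), fft(r))]
    phi = fft(prod, inverse=True)
    return [phi[i].real / m for i in range(n)]
```

The result matches the direct summation to rounding error. The cost drops from $O(N^2)$ to $O(N\log N)$.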

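Shifted Green function for the mirror charge
$G_m(\mathbf r)=\frac{1}{|\mathbf r-\mathbf r_0|}$
$G_{i,j,k}=\frac{1}{\Delta x\,\Delta y\,\Delta z}\int_{x_i-\Delta x/2}^{x_i+\Delta x/2}\int_{y_j-\Delta y/2}^{y_j+\Delta y/2}\int_{z_k-\Delta z/2}^{z_k+\Delta z/2}\frac{dx\,dy\,dz}{\sqrt{x^2+y^2+(z-z_0)^2}}$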
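The mirror charge can be illustrated with point charges. The sketch assumes a grounded conducting plane at $z=0$ (for example, the cathode) and an image charge of opposite sign at the mirrored position. With the shifted Green function $G_m$, the image term is just $G$ evaluated about a different centre. The sign convention and plane position are assumptions. The potential is given in units of $1/(4\pi\varepsilon_0)$.

```python
import math

def green(r, r0=(0.0, 0.0, 0.0)):
    """Shifted free-space Green function G_m(r) = 1/|r - r0|."""
    dx, dy, dz = r[0] - r0[0], r[1] - r0[1], r[2] - r0[2]
    return 1.0 / math.sqrt(dx * dx + dy * dy + dz * dz)

def potential_with_mirror(r, rq, q=1.0):
    """Potential of charge q at rq above an assumed grounded plane z = 0,
    using an image charge -q at (xq, yq, -zq)."""
    mirror = (rq[0], rq[1], -rq[2])
    return q * green(r, rq) - q * green(r, mirror)
```

On the plane the two terms cancel, so the potential is zero there, as the boundary condition requires.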

[Figure: Potential of a Gaussian charge distribution with σr = 1 mm, plotted as 1/r (m^-1) versus x (m) at y = z = 0.0005. Green: charge distribution in free space. Red: charge distribution with a mirror at x = 0.035 (mirror at -0.07). The free-space curve is shown for comparison.]

GPGPU
GPGPU: General-Purpose computing on Graphics Processing Units. Frameworks: CUDA (NVIDIA), ATI Stream (ATI), OpenCL.
My machine: Core i7 PC with an NVIDIA Tesla 1060 (500k yen).
NVIDIA Tesla: 240 PU/GPU, 4 GB memory; 0.933 TFlops in single precision and 78 GFlops in double precision.
KEK supercomputer SR11000: 0.13 TFlops/node.

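3D particle-particle interaction, based on a demo code: Fast N-Body Simulation with CUDA (L. Nyland, M. Harris, J. Prins, NVIDIA SDK)
$\mathbf F_i=\frac{e^2}{4\pi\varepsilon_0}\sum_{j\neq i}\frac{\mathbf r_{ij}}{\left(|\mathbf r_{ij}|^2+\varepsilon^2\right)^{3/2}}$, $\mathbf r_{ij}=\mathbf r_i-\mathbf r_j$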
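Below is a serial sketch of the softened sum in $\mathbf F_i$, without the prefactor $e^2/(4\pi\varepsilon_0)$. It is an $O(N^2)$ all-pairs loop, the same arithmetic a GPU kernel spreads over threads. It is not the NVIDIA SDK code. Because $\mathbf r_{ii}=0$, the $j=i$ term would contribute zero anyway whenever $\varepsilon>0$; the explicit skip only keeps $\varepsilon=0$ safe.

```python
def softened_coulomb_sum(positions, eps):
    """For each particle i: sum over j != i of r_ij / (|r_ij|^2 + eps^2)^{3/2}, r_ij = r_i - r_j.
    Multiply by e^2 / (4 pi eps0) to obtain F_i."""
    n = len(positions)
    out = []
    for i in range(n):
        xi, yi, zi = positions[i]
        ax = ay = az = 0.0
        for j in range(n):
            if j == i:
                continue
            dx = xi - positions[j][0]
            dy = yi - positions[j][1]
            dz = zi - positions[j][2]
            inv = (dx * dx + dy * dy + dz * dz + eps * eps) ** -1.5
            ax += dx * inv
            ay += dy * inv
            az += dz * inv
        out.append((ax, ay, az))
    return out
```

Each pair contributes equal and opposite terms, so the summed force over all particles vanishes. This makes a convenient correctness check.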

CPU GPU GPU GPU CPU

Reference frame
$H=\sqrt{P^2c^2+m_0^2c^4}-e\int_0^z E(z')\,dz'$, $\dot P=eE(z)$, $\dot Z=\frac{Pc^2}{\sqrt{P^2c^2+m_0^2c^4}}$, $P=\frac{m_0V}{\sqrt{1-V^2/c^2}}$
$P_n=P_{n-1}+eE(z_{n-1})\,\Delta t$, $V_n=\frac{P_nc^2}{\sqrt{P_n^2c^2+m_0^2c^4}}$, $Z_n=Z_{n-1}+\frac{P_nc^2\,\Delta t}{\sqrt{P_n^2c^2+m_0^2c^4}}$
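Here is a sketch of the update rules above for one reference particle. It assumes an electron accelerated by a uniform field of 7 MV/m, the $E_z$ quoted for the cERL gun, and takes $eE$ as positive in the accelerating direction. It applies the kick first and then the drift with the updated momentum, as written on the slide. In a uniform field, the kinetic energy gained should equal $eE$ times the distance travelled, up to an $O(\Delta t)$ error.

```python
import math

C = 2.99792458e8            # speed of light [m/s]
M0 = 9.1093837e-31          # electron rest mass [kg]
E_CHARGE = 1.602176634e-19  # elementary charge [C]

def track_reference(E_of_z, dt, n_steps, z0=0.0, p0=0.0):
    """P_n = P_{n-1} + e E(Z_{n-1}) dt, then Z_n = Z_{n-1} + P_n c^2 dt / sqrt(P_n^2 c^2 + m0^2 c^4).
    Returns the final (Z, P, V)."""
    z, p, v = z0, p0, 0.0
    mc2 = M0 * C * C
    for _ in range(n_steps):
        p += E_CHARGE * E_of_z(z) * dt                  # kick with the field at the old position
        v = p * C * C / math.sqrt((p * C) ** 2 + mc2 ** 2)
        z += v * dt                                     # drift with the updated velocity
    return z, p, v

z, p, v = track_reference(lambda z: 7.0e6, 1.0e-13, 3000)
kinetic = math.sqrt((p * C) ** 2 + (M0 * C * C) ** 2) - M0 * C * C
```

The velocity stays below $c$ automatically because it is computed from the momentum, never integrated directly.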

Lorentz transformation, space charge
$\mathbf r=\cdots e^{H_0}e^{eEz}L(V_2)e^{\phi}L^{-1}(V_2)\,e^{H_0}e^{eEz}L(V_1)e^{\phi}L^{-1}(V_1)\cdots$
$L^{-1}(V_2)\,e^{eEz}e^{H_0}L(V_1)$: reference frame $z$, $\Delta t$, Lorentz transformation; $e^{H_0}e^{eEz}e^{\phi}\,\mathbf r_0$

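Equation of motion in the reference frame
$\Delta\mathbf v_{i,n}=n_e r_e c^2\,\Delta t\sqrt{1-V_n^2/c^2}\sum_{j\neq i}^{N_e/n_e}\frac{\mathbf r_{ij}}{\left(|\mathbf r_{ij}|^2+\varepsilon^2\right)^{3/2}}$
$n_e$: charge in a macro particle (in units of $e$); $r_e$: classical electron radius. Particle motion is assumed to be non-relativistic in the reference frame.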
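To illustrate the prefactor in this equation, the sketch below does three things:

- converts the cERL bunch charge (Q = 80 pC) into the number of electrons $N_e$;
- assumes a hypothetical split into 100,000 macro particles, the particle count used in the timing comparison;
- evaluates $n_e r_e c^2\Delta t\sqrt{1-V_n^2/c^2}$.

Reading the square-root factor as the lab time step converted to reference-frame time is an interpretation of the slide. The function names are hypothetical.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge [C]
R_E = 2.8179403262e-15       # classical electron radius [m]
C = 2.99792458e8             # speed of light [m/s]

def macro_particle_charge(Q, n_macro):
    """Return (N_e, n_e): electrons per bunch and electrons per macro particle."""
    N_e = Q / E_CHARGE
    return N_e, N_e / n_macro

def kick_prefactor(n_e, dt_lab, V):
    """n_e r_e c^2 dt sqrt(1 - V^2/c^2); the square root shortens the lab step to frame time."""
    return n_e * R_E * C * C * dt_lab * math.sqrt(1.0 - (V / C) ** 2)

N_e, n_e = macro_particle_charge(80e-12, 100_000)  # about 5.0e8 electrons, about 4993 per macro particle
```

Since $e^2/(4\pi\varepsilon_0 m_0)=r_ec^2$, the prefactor times the geometric sum gives a velocity change directly, with no separate mass or permittivity constants.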

Expression of L(V)
$L^{-1}(V_1)\,e^{e\int E\,dr}\,e^{H_0}\,L(V_2)$
$v_{x,0}=\frac{v_x\sqrt{1-V_1^2/c^2}}{1-V_1v_z/c^2}$, $v_{y,0}=\frac{v_y\sqrt{1-V_1^2/c^2}}{1-V_1v_z/c^2}$, $v_{z,0}=\frac{v_z-V_1}{1-V_1v_z/c^2}$
$v_x=\frac{v_{x,0}\sqrt{1-V_2^2/c^2}}{1+V_2v_{z,0}/c^2}$, $v_y=\frac{v_{y,0}\sqrt{1-V_2^2/c^2}}{1+V_2v_{z,0}/c^2}$, $v_z=\frac{v_{z,0}+V_2}{1+V_2v_{z,0}/c^2}$
$z_0=\frac{z-V_1t}{\sqrt{1-V_1^2/c^2}}$, $t_0=\frac{t-V_1z/c^2}{\sqrt{1-V_1^2/c^2}}$, $\Delta t_0=\Delta t\sqrt{1-V_1^2/c^2}$
$z=\frac{z_0+V_2t_0}{\sqrt{1-V_2^2/c^2}}$, $t=\frac{t_0+V_2z_0/c^2}{\sqrt{1-V_2^2/c^2}}$
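The velocity transformations above can be written as a pair of functions. The sketch assumes the frame velocity is along $z$. It checks that transforming into the frame and back with the same $V$ returns the original velocity. The slide's $L$ versus $L^{-1}$ labelling is ambiguous in the transcription, so the function names are neutral and hypothetical.

```python
import math

C = 2.99792458e8  # speed of light [m/s]

def to_frame(v, V):
    """Lab velocity (vx, vy, vz) seen from a frame moving with speed V along z."""
    vx, vy, vz = v
    g_inv = math.sqrt(1.0 - (V / C) ** 2)
    d = 1.0 - V * vz / C ** 2
    return (vx * g_inv / d, vy * g_inv / d, (vz - V) / d)

def from_frame(v0, V):
    """Frame velocity (vx0, vy0, vz0) transformed back to the lab for frame speed V."""
    vx0, vy0, vz0 = v0
    g_inv = math.sqrt(1.0 - (V / C) ** 2)
    d = 1.0 + V * vz0 / C ** 2
    return (vx0 * g_inv / d, vy0 * g_inv / d, (vz0 + V) / d)
```

A particle co-moving with the frame is at rest in it. Any subluminal lab velocity stays subluminal after the transformation, which keeps the non-relativistic push in the frame well defined.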

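$H=\underbrace{\sqrt{p^2c^2+m_0^2c^4}}_{H_0}-e\int_0^{\mathbf r}\mathbf E(\mathbf r')\cdot d\mathbf r'$
$\mathbf p_n=\mathbf p_{n-1}+e\mathbf E(\mathbf r_{n-1})\,\Delta t$, $\mathbf r_n=\mathbf r_{n-1}+\frac{\mathbf p_nc^2\,\Delta t}{\sqrt{p_n^2c^2+m_0^2c^4}}$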

NVIDIA Tesla: 30,000 particles, 0.042 sec/step (about 400 GFlops); 100,000 particles, 0.67 sec/step (cost scales as $N^2$).
Hitachi SR11000 (KEK System A), 3D PIC: 100,000 particles, 0.15 sec/step.
Blue Gene (KEK System B)
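The quoted timings can be cross-checked. The sketch assumes 20 floating-point operations per pairwise interaction (a count commonly used for the CUDA N-body demo; treat it as an assumption) and $N^2$ interactions per step. Under these assumptions:

- 30,000 particles in 0.042 s corresponds to roughly 430 GFlops, consistent with the quoted 400 GFlops;
- pure $N^2$ scaling predicts about 0.47 s for 100,000 particles, against the measured 0.67 s.

```python
FLOPS_PER_PAIR = 20  # assumption: flop count per pairwise interaction

def effective_gflops(n_particles, seconds_per_step, flops_per_pair=FLOPS_PER_PAIR):
    """Sustained GFlops if every step evaluates n^2 pairwise interactions."""
    return n_particles ** 2 / seconds_per_step * flops_per_pair / 1e9

t_30k, t_100k, t_pic_100k = 0.042, 0.67, 0.15   # sec/step from the slide

predicted_100k = t_30k * (100_000 / 30_000) ** 2  # N^2 extrapolation from 30,000 particles
gpu_vs_pic = t_100k / t_pic_100k                   # wall-time ratio only: different algorithms

print(round(effective_gflops(30_000, t_30k)), round(effective_gflops(100_000, t_100k)),
      round(predicted_100k, 3), round(gpu_vs_pic, 2))
```

The last ratio compares a particle-particle solver on the GPU with a PIC solver on the SR11000, so it measures wall time per step, not hardware efficiency.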