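Introduction to OpenMP (1)

E-mail: {nanri,amano}@cc.kyushu-u.ac.jp

1 Introduction

The Center introduced a high-performance UNIX server, the FUJITSU GP7000F model 900, in January 2000 (Heisei 12). It then introduced a parallel computer, the COMPAQ GS320, in January 2001 (Heisei 13). Unlike the FUJITSU VPP5000/64, both are shared-memory parallel computers, in which all processors read and write one common memory (Figure 1(a)).

[Figure 1: (a) shared-memory parallel computer, (b) distributed-memory parallel computer]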
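Until now, parallel programs for the Center's machines have been written in one of two ways:
1. with a parallel language such as VPP Fortran[6] or HPF[3];
2. with a message-passing library such as MPI[5] or PVM[2].

With MPI or PVM, each process works on its own memory, and data has to be exchanged explicitly between processes (Figure 1(b)). On the new shared-memory machines there is another option: OpenMP, a set of extensions to Fortran (and to C and C++).

Like MPI, OpenMP is a standard, so OpenMP programs are not tied to one vendor's machine. Earlier machine-specific parallel languages lacked this property. For example, programs written in C*, the parallel extension of C for the Thinking Machines CM-5, could run only on the CM-5.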
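This article explains how to write OpenMP programs and how to run them on the GP7000F and the GS320, comparing OpenMP with VPP Fortran and MPI where this helps. Further information on OpenMP can be found in:
- Chandra, R., Menon, R., Dagum, L., Kohr, D., Maydan, D. and McDonald, J.: Parallel Programming in OpenMP, Morgan Kaufmann Publishers, 2000 [1];
- the OpenMP web site, http://www.openmp.org/, which provides the specifications and an OpenMP FAQ (Frequently Asked Questions);
- the Japanese translation of the OpenMP specifications by RWCP, http://pdplab.trc.rwcp.or.jp/pdperf/omni/spec.ja/home.html

OpenMP has three notable features:
(1) Directive-based parallelization: a program is parallelized by inserting OpenMP directives into an ordinary sequential program.
(2) Loop-level parallelism: a loop (a DO loop in Fortran, a for loop in C/C++) can be parallelized simply by placing a directive in front of it.
(3) Incremental parallelization: one or two directives can be added at a time, so a program can be parallelized step by step.

The rest of this article is organized as follows:
- Section 2 gives an overview of OpenMP.
- Section 3 describes how to prepare a sequential program before parallelizing it.
- Section 4 explains the basics of OpenMP programming.
- Section 5 shows how to compile and run OpenMP programs on the Center's machines.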
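OpenMP can be used from Fortran, C and C++. The examples in this article are written in Fortran, each followed by the equivalent C version.

2 Overview of OpenMP

2.1 Background
OpenMP is a standard interface for parallel programming on shared-memory computers. Its specification is defined and published by the OpenMP Architecture Review Board (ARB), an organization of computer and compiler vendors. In June 2001, SPEC released SPEC OMP, a benchmark suite for OpenMP.

2.2 Structure of an OpenMP program
An OpenMP program is an ordinary Fortran or C (C++) program with directives added. In Fortran the directives are special comment lines; in C (C++) they are pragmas (#pragma lines). A compiler that does not support OpenMP ignores them.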
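Figure 2 shows the structure of an OpenMP program in Fortran.

  program sequential
    ...
  end program sequential

  program parallel
    ...
  !$omp parallel
    ...
  !$omp end parallel
    ...
  end program parallel

Figure 2: A sequential Fortran program and its OpenMP version

An OpenMP program starts as a single thread. When it reaches a parallel region (the part between !$omp parallel and !$omp end parallel), several threads execute the region together. The number of threads is given by the environment variable OMP_NUM_THREADS. If it is 2, two threads run the region; if it is 4, four threads run it.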
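With OMP_NUM_THREADS set to 4, all four threads execute the same code in the region. To give threads different work, each thread obtains its own number with the routine OMP_GET_THREAD_NUM() and branches on it with an if statement. OpenMP thus consists of three parts:
1. directives;
2. runtime library routines;
3. environment variables.

2.3 Advantages of OpenMP
OpenMP has the following advantages:
1. Parallel programs are easier to write than with MPI or PVM.
2. A compiler without OpenMP support simply ignores the directives, so the same source also serves as the sequential program.
3. A program can be parallelized incrementally, a few directives at a time.
4. OpenMP is a standard defined by the vendors (the OpenMP ARB), so OpenMP programs are portable.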
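The thread-number branching described in Section 2.2 can be sketched in C as follows. This is a minimal sketch rather than code from this article. The names thread_num and branch_on_thread are hypothetical. The #ifdef _OPENMP guards (see Table 3) let the same file compile without OpenMP, in which case it behaves as a single thread.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Number of the calling thread; 0 when built without OpenMP. */
static int thread_num(void)
{
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

/* Every thread runs the same region; an if statement on the thread number
   gives thread 0 an extra job. *threads_run receives how many threads ran
   the region (normally OMP_NUM_THREADS); *master_ran is set by thread 0. */
void branch_on_thread(int *threads_run, int *master_ran)
{
    int count = 0;   /* declared outside the region: shared by all threads */
    int master = 0;

    assert(threads_run != NULL && master_ran != NULL);
#pragma omp parallel
    {
        if (thread_num() == 0)
            master = 1;          /* executed by thread 0 only */
#pragma omp atomic
        count++;                 /* executed once by every thread */
    }
    *threads_run = count;
    *master_ran = master;
}
```

Calling branch_on_thread with OMP_NUM_THREADS=4 would normally report 4 threads. Built without OpenMP, it reports 1.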
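On the other hand, OpenMP can be used only on shared-memory machines, whereas MPI and PVM programs also run on distributed-memory machines.

3 Before parallelizing
Before adding OpenMP directives, make the sequential program correct and as fast as possible. Bugs and performance problems are much harder to track down once the program runs in parallel.

3.1 Debugging the sequential program
First, confirm that the sequential program gives correct results. Compilers provide options that detect errors such as out-of-bounds array references at run time, at the cost of slower execution. Table 1 lists such options for the GS320 and the UNIX GP7000F.

3.2 Optimizing the sequential program
Next, make the sequential program as fast as possible; parallelizing a slow program only gives a slow parallel program. Table 1 also lists optimization options for both machines. Try them and compare the execution times.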
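Table 1: Compiler options for checking and optimizing sequential programs

GS320 (kyu-ss.cc.kyushu-u.ac.jp)
  Checking
    -check bounds            check array subscripts against the declared bounds at run time
    -check format            check that formatted I/O items match their format descriptors
    -check overflow          detect integer overflow
    -check underflow         report floating-point underflow
  Optimization
    -tune ev6 -arch ev6      generate code for the EV6 family (the GS320 CPU is an EV68)
    -unroll N                unroll loops N times (e.g. N = 6, 8 or 16)
    -fast                    a predefined set of optimization options
    -O5                      the highest optimization level

UNIX GP7000F model 900 (kyu-cc.cc.kyushu-u.ac.jp)
  Checking
    -Haesux                  run-time error checks for debugging (Fortran)
  Optimization
    -Kfast                   a predefined set of optimization options
    -Keval                   allow optimizations that change the order of evaluation
    -Kfast_GP=2,prefetch=4   optimize for the SPARC64-GP, with data prefetching
    -O4
  C/C++ (Solaris)
    -Kfast_GP=2,prefetch  -Kmfunc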
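4 Basics of OpenMP programming

4.1 OpenMP syntax
This section describes how OpenMP directives and runtime library routines are written, and how parallel regions are specified.

4.1.1 Directives
An OpenMP directive is a line that tells the compiler how to parallelize the program. As Table 2 shows, directives begin with !$omp in Fortran (fixed-form Fortran also accepts c$omp and *$omp) and with #pragma omp in C/C++. In the examples of Table 2, parallel do (Fortran) and parallel for (C/C++) are directive names, and shared(a) is a clause that qualifies the directive.

A compiler without OpenMP support treats a Fortran directive as a comment and ignores a C/C++ #pragma omp line as an unknown pragma. The same source file can therefore be compiled either as an OpenMP program or as an ordinary sequential program. When some statements should be compiled only in the OpenMP version, use conditional compilation (Table 3).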
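Table 2: OpenMP directive syntax

Fortran (free form)
  A line beginning with !$omp (optionally preceded by blanks) is an OpenMP directive.
  Example:  !$omp parallel do shared(a)
Fortran (fixed form)
  A line with !$omp, c$omp or *$omp starting in column 1 is an OpenMP directive. Column 6 must be blank or 0; any other character there marks a continuation line.
  Example:  c$omp parallel do shared(a)
C/C++
  A line beginning with #pragma omp is an OpenMP directive.
  Example:  #pragma omp parallel for shared(a)

Table 3: Conditional compilation for OpenMP

Fortran (free form)
  A line beginning with !$ is compiled only by an OpenMP compiler; other compilers treat it as a comment.
  Example:  !$ call parallel_init(a)
Fortran (fixed form)
  A line beginning with !$, c$ or *$ in columns 1-2 is compiled only by an OpenMP compiler.
  Example:  !$    call parallel_init(a)
C/C++
  An OpenMP compiler defines the macro _OPENMP.
  Example:
    #ifdef _OPENMP
      parallel_init(a);
    #else
      serial_init(a);
    #endif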
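The _OPENMP macro from Table 3 is what lets one source file build both ways. The sketch below applies it to a runtime library routine. The helper names max_threads and openmp_enabled are hypothetical; omp_get_max_threads() is a standard OpenMP routine. It is called only when _OPENMP is defined, so the serial build does not even need omp.h.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Upper bound on the number of threads a parallel region will use.
   With OpenMP, ask the runtime; without it, the program is serial. */
int max_threads(void)
{
#ifdef _OPENMP
    int n = omp_get_max_threads();
    assert(n >= 1);
    return n;
#else
    return 1;
#endif
}

/* 1 if this file was compiled with OpenMP enabled, 0 otherwise. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}
```

The same pattern works for the parallel_init/serial_init example in Table 3, where those two names are placeholders for the reader's own routines.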
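4.1.2 Runtime library routines
OpenMP provides library routines, for example to obtain the number of threads or the calling thread's own number. In C and C++, a program that calls them must include the header file:
  #include <omp.h>
In Fortran, the type of each routine used must be declared. For example, to use omp_get_num_threads(), declare
  integer omp_get_num_threads

4.1.3 Parallel regions
Figure 3 shows a simple OpenMP program. In Fortran, the code enclosed by the two directives !$omp parallel and !$omp end parallel is a parallel region, which is executed by all threads. C/C++ has no end directive: #pragma omp parallel applies to the single statement that follows it. A region of several statements is therefore written as a block enclosed in { }.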
Fortran:
  program hello
    implicit none
    integer omp_get_thread_num
    print *, "Start"
  !$omp parallel
    print *, "Hello. My thread number is ", omp_get_thread_num()
  !$omp end parallel
    print *, "End"
  end program hello

C:
  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
      printf("Start\n");
  #pragma omp parallel
      {
          printf("Hello. My thread number is %d\n", omp_get_thread_num());
      }
      printf("End\n");
      return 0;
  }

Figure 3: A parallel region
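Run with four threads, the program in Figure 3 works as follows:
1. The program starts as a single thread, thread 0, which prints the first message.
2. At the parallel directive, four threads are set up; they are numbered from 0.
3. Each of the four threads executes the print statement in the parallel region and prints its own number, obtained with omp_get_thread_num().
4. At the end of the parallel region, only thread 0 continues, and it prints the last message.

[Figure 4: Execution of Figure 3 with four threads. Thread 0 prints "Start"; each of the four threads executes print *, "Hello. My thread number is ", omp_get_thread_num(); thread 0 then prints "End".]

As Figure 4 shows, a statement inside a parallel region is executed once by every thread. The order in which the threads' print (printf) output appears is not fixed and may change from run to run. The number of threads is normally set with the environment variable OMP_NUM_THREADS. It can also be set from inside the program with the routine omp_set_num_threads().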
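Setting the thread count from inside the program, as just described, can be sketched as follows. run_with_threads is a hypothetical name. The sketch assumes that the runtime may give fewer threads than requested (for example when dynamic adjustment is enabled), so it only promises a count between 1 and the requested number.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Request `requested` threads and return how many actually executed the
   parallel region. Without OpenMP the region runs once and 1 is returned. */
int run_with_threads(int requested)
{
    int count = 0;   /* shared: declared outside the region */

    assert(requested >= 1);
#ifdef _OPENMP
    omp_set_num_threads(requested);   /* overrides OMP_NUM_THREADS for later regions */
#endif
#pragma omp parallel
    {
#pragma omp atomic
        count++;                      /* one increment per thread */
    }
    return count;
}
```

The atomic directive keeps the simultaneous increments from overwriting each other, a problem discussed in Section 4.4.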
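A call to omp_set_num_threads() takes precedence over OMP_NUM_THREADS. The routine omp_get_thread_num() returns the calling thread's number, from 0 up to the number of threads minus 1.

In summary, an OpenMP program is executed as follows:
1. The program starts with a single thread.
2. When it reaches a parallel region, additional threads are started.
3. All threads execute the region according to the OpenMP directives.
4. At the end of the region, the threads wait until all of them have finished.
5. Only the original thread continues with the rest of the program.
6. Steps 2 to 5 are repeated for every parallel region.

4.2 Parallelizing loops
The most common use of OpenMP is to parallelize loops with the parallel do directive (parallel for in C/C++). Figure 5 is a sequential program that multiplies each element of the vector x by the scalar a, adds the scalar y, and stores the result in the vector z. Its loop executes i = 1 to 100 one iteration at a time.

Figure 6 adds an OpenMP directive in front of the loop, so the iterations are divided among the threads. With four threads, as Figure 7 shows, the work is split as follows:
- thread 0 handles i = 1 to 25;
- thread 1 handles i = 26 to 50;
- thread 2 handles i = 51 to 75;
- thread 3 handles i = 76 to 100.
Each thread does one quarter of the work; the exact assignment of iterations to threads is decided by the OpenMP implementation.

The parallel do (parallel for) directive must be placed immediately before the do (for) loop it applies to. As in Figure 3, the code outside the loop is run by thread 0 alone.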
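The split shown in Figure 7 can be reproduced arithmetically. The following is a minimal sketch, assuming contiguous, nearly equal blocks with any remainder going to the lowest-numbered threads. OpenMP itself decides the real assignment, so block_range is a hypothetical helper for illustration, not an OpenMP routine.

```c
#include <assert.h>

/* 1-based inclusive range [*first, *last] of the n loop iterations given
   to thread t (0 <= t < p) when they are cut into p contiguous blocks.
   An empty range is returned as last == first - 1. */
void block_range(int n, int p, int t, int *first, int *last)
{
    assert(n >= 0 && p >= 1 && t >= 0 && t < p);
    int base = n / p;                         /* every thread gets at least this many */
    int extra = n % p;                        /* the first `extra` threads get one more */
    int size = base + (t < extra ? 1 : 0);
    int start = t * base + (t < extra ? t : extra);   /* 0-based offset */
    *first = start + 1;
    *last = start + size;
}
```

For n = 100 and p = 4 this gives 1-25, 26-50, 51-75 and 76-100, matching Figure 7.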
Fortran:
  program ex1
    implicit none
    integer i
    double precision z(100), a, x(100), y
    do i = 1, 100
      z(i) = 0.0
      x(i) = 2.0
    end do
    a = 4.0
    y = 1.0
    call daxpy(z, a, x, y)
  end program ex1

  subroutine daxpy(z, a, x, y)
    integer i
    double precision z(100), a, x(100), y
    do i = 1, 100
      z(i) = a * x(i) + y
    end do
    return
  end

C:
  #include <stdio.h>

  void daxpy(double z[], double a, double x[], double y);

  int main(void)
  {
      int i;
      double z[100], a, x[100], y;

      for (i = 0; i < 100; i++) {
          z[i] = 0.0;
          x[i] = 2.0;
      }
      a = 4.0;
      y = 1.0;
      daxpy(z, a, x, y);
      return 0;
  }

  void daxpy(double z[], double a, double x[], double y)
  {
      int i;
      for (i = 0; i < 100; i++)
          z[i] = a * x[i] + y;
  }

Figure 5: Sequential program
Fortran:
  program ex1
    implicit none
    integer i
    double precision z(100), a, x(100), y
    do i = 1, 100
      z(i) = 0.0
      x(i) = 2.0
    end do
    a = 4.0
    y = 1.0
    call daxpy(z, a, x, y)
  end program ex1

  subroutine daxpy(z, a, x, y)
    integer i
    double precision z(100), a, x(100), y
  !$omp parallel do
    do i = 1, 100
      z(i) = a * x(i) + y
    end do
    return
  end

C:
  #include <stdio.h>
  #include <omp.h>

  void daxpy(double z[], double a, double x[], double y);

  int main(void)
  {
      int i;
      double z[100], a, x[100], y;

      for (i = 0; i < 100; i++) {
          z[i] = 0.0;
          x[i] = 2.0;
      }
      a = 4.0;
      y = 1.0;
      daxpy(z, a, x, y);
      return 0;
  }

  void daxpy(double z[], double a, double x[], double y)
  {
      int i;
  #pragma omp parallel for
      for (i = 0; i < 100; i++)
          z[i] = a * x[i] + y;
  }

Figure 6: Parallelized with OpenMP
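[Figure 7: Execution of the loop in Figure 6 with four threads: thread 0 computes z(i) = a * x(i) + y for i = 1 to 25, thread 1 for i = 26 to 50, thread 2 for i = 51 to 75 and thread 3 for i = 76 to 100.]

As Figure 6 shows, parallelizing a loop only requires placing parallel do (parallel for) in front of it. Not every loop, however, can be parallelized this way. Consider:

  do i = 2, 100
    z(i) = z(i) + z(i - 1)
  end do

Iteration i uses z(i-1), which is computed by iteration i-1, so iteration i must not start until iteration i-1 has finished. If this loop is given parallel do (parallel for), different threads run iterations at the same time. A thread may then read z(i-1) before it has been updated, and the results will be wrong. Before adding the directive, therefore, check that the loop can be parallelized. This will be discussed in Part 2 of this series.

4.3 Shared and private variables
Next, consider how the threads use the variables in a parallel loop. The loop in Figure 6 uses x, a, y and z. All threads access the same array x: when thread 0 reads x(1), it reads the same memory location that threads 1 and 2 would see if they read x(1). The same holds for a and y, which every thread reads. The array z is likewise common to all threads. Thread 0, for example, computes a*x(1)+y and stores the result in z(1).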
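The value stored in z(1) is visible to every thread: if it is, say, 3.0, any other thread that reads z(1) afterwards also gets 3.0. Variables of which a single copy is used by all threads are called shared variables.

The loop index i is different. If i were shared like z, all threads would use the same i. Thread 0 starts with i = 1 and thread 1 starts with i = 26, so thread 1 would overwrite thread 0's i, and thread 0 would skip or repeat iterations. Each thread therefore needs its own i: thread 0's copy runs from 1 to 25 and thread 1's from 26 to 50. Variables of which each thread has its own copy are called private variables.

[Figure 8: The subroutine daxpy of Figure 6 run by four threads: z, a, x and y exist once and are shared by all threads, while each thread has its own copy of i.]

As Figure 8 shows, x, z, a and y are shared variables and i is a private variable. OpenMP lets you declare explicitly which variables are shared and which are private. Shared variables are listed in a shared clause:
  !$omp parallel shared(a)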
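Private variables are listed in a private clause:
  !$omp parallel private(a)
The shared and private clauses can be used together, and each may list several variables separated by commas:
  !$omp parallel shared(a, b, c) private(d, e)
In Figure 6 no clauses were given to parallel do (parallel for), so OpenMP's default rules applied. Writing the clauses out explicitly turns Figure 6 into Figure 9.

Fortran:
  subroutine daxpy(z, a, x, y)
    integer i
    double precision z(100), a, x(100), y
  !$omp parallel do shared(z, a, x, y) private(i)
    do i = 1, 100
      z(i) = a * x(i) + y
    end do
    return
  end

C:
  void daxpy(double z[], double a, double x[], double y)
  {
      int i;
  #pragma omp parallel for shared(z, a, x, y) private(i)
      for (i = 0; i < 100; i++)
          z[i] = a * x[i] + y;
  }

Figure 9: Figure 6 with shared and private variables specified explicitly

In OpenMP, variables are shared by default.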
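The private clause matters most for scratch variables that each iteration writes and then reads. The sketch below is a hypothetical variant of Figure 9 that computes through a temporary tmp. The first version lists tmp in private(); the second declares tmp inside the loop body, which also makes it private to each thread (this relies on C99 block-scoped declarations). The function names are illustrative only.

```c
#include <assert.h>

/* z[i] = a * x[i] + y via a temporary. tmp is written, then read, in
   every iteration; if it were shared, another thread could overwrite it
   between the two statements, so it is listed in private(). */
void axpy_tmp(int n, double z[], double a, const double x[], double y)
{
    int i;
    double tmp;

    assert(n >= 0);
#pragma omp parallel for shared(n, z, a, x, y) private(i, tmp)
    for (i = 0; i < n; i++) {
        tmp = a * x[i];
        z[i] = tmp + y;
    }
}

/* Same computation: variables declared inside the loop body (and the
   loop index declared in the for statement) are private automatically. */
void axpy_block(int n, double z[], double a, const double x[], double y)
{
    assert(n >= 0);
#pragma omp parallel for
    for (int i = 0; i < n; i++) {
        double tmp = a * x[i];
        z[i] = tmp + y;
    }
}
```

With the values of Figure 5 (x(i) = 2.0, a = 4.0, y = 1.0), both versions store 9.0 in every element of z.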
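The exception is the index of the loop that immediately follows parallel do (parallel for): it is made private automatically. Only that one index is treated this way. In a doubly nested loop, the inner loop index must be declared private explicitly (in C/C++ it would otherwise be shared). Figure 10 parallelizes a matrix-vector product in this way.

Fortran:
  subroutine matvec(a, x, y)
    integer i, j
    double precision a(100, 100), x(100), y(100)
  ! a(j, i) holds row i, column j (the same memory layout as a[i][j] in C)
  !$omp parallel do private(j)
    do i = 1, 100
      y(i) = 0.0d0
      do j = 1, 100
        y(i) = y(i) + a(j, i) * x(j)
      end do
    end do
    return
  end

C:
  void matvec(double a[][100], double x[], double y[])
  {
      int i, j;
  #pragma omp parallel for private(j)
      for (i = 0; i < 100; i++) {
          y[i] = 0.0;
          for (j = 0; j < 100; j++)
              y[i] += a[i][j] * x[j];
      }
  }

Figure 10: Matrix-vector product parallelized with OpenMP (inner index j declared private)

4.4 Reductions
Next, consider the program in Figure 11, which computes the sum of the elements of an array. If parallel do (parallel for) is simply added as in Figure 6, the 100 iterations are divided among four threads, but the result is not correct, as explained below.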
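Fortran:
  function total(x)
    integer i
    double precision t, total, x(100)
    t = 0.0
    do i = 1, 100
      t = t + x(i)
    end do
    total = t
  end

C:
  double total(double x[])
  {
      int i;
      double t;

      t = 0.0;
      for (i = 0; i < 100; i++)
          t += x[i];
      return t;
  }

Figure 11: Sum of array elements (sequential)

With four threads, each thread handles 25 iterations. In Figure 11, x and t are shared and i is private. Suppose every element of x is 1.0; the correct total is then 100.0. The statement t = t + x(i) is actually carried out in four steps:
1: read t;
2: read x(i);
3: add the two values;
4: write the result to t.
Figure 12(a) shows these steps in sequential execution and Figure 12(b) in parallel execution.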
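[Figure 12: (a) Sequential execution: for i = 1, 2, 3, ... the four steps (1: read t, 2: read x(i), 3: add, 4: write t) are performed one after another. (b) Parallel execution with four threads: threads 0, 1, 2 and 3 perform the same four steps at the same time, starting from i = 1, 26, 51 and 76 respectively.]

In parallel execution the steps of different threads overlap, and updates to t can be lost. For example:
- t is 12.0, and threads 0 and 1 both perform step 1 at about the same time, so both read 12.0.
- Each adds its own x(i), which is 1.0, and gets 13.0.
- Both write 13.0 to t.
Two elements have been added, but t has increased by only 1.0. Whenever this happens the final total is less than 100.0, and the result may differ from run to run.

Care is therefore needed whenever several threads update the same shared variable. In the loop of Figure 11, every iteration reads t and writes t, so each iteration depends on the value left by the previous one.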
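The lost update above can be replayed deterministically. The following sketch uses no real threads. It writes out by hand the two orders of the four steps for two threads, A and B, so the outcome is always the same. The function names are hypothetical.

```c
#include <assert.h>

/* Thread A completes steps 1-4 before thread B starts (as in sequential code). */
double serial_order(double t, double xa, double xb)
{
    double ra = t;      /* A, step 1: read t */
    ra = ra + xa;       /* A, steps 2-3: read x and add */
    t = ra;             /* A, step 4: write t */
    double rb = t;      /* B, step 1: sees A's result */
    rb = rb + xb;       /* B, steps 2-3 */
    t = rb;             /* B, step 4 */
    return t;
}

/* Both threads perform step 1 before either performs step 4. */
double interleaved_order(double t, double xa, double xb)
{
    double ra = t;      /* A, step 1 */
    double rb = t;      /* B, step 1: reads the same old value */
    ra = ra + xa;       /* A, steps 2-3 */
    rb = rb + xb;       /* B, steps 2-3 */
    t = ra;             /* A, step 4 */
    t = rb;             /* B, step 4 overwrites A's result */
    return t;
}
```

Starting from t = 12.0 with both elements equal to 1.0, serial_order returns 14.0 and interleaved_order returns 13.0, the situation described in the text.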
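The loop of Figure 11 therefore cannot be parallelized simply by adding parallel do (parallel for). However, the order in which the elements are added does not change the sum (apart from rounding). This kind of computation is called a reduction, and it can be parallelized with the reduction clause of parallel do (parallel for). Figure 13 shows Figure 11 with a reduction clause added. In reduction(+:t), + is the operator (addition) and t is the variable that accumulates the result. Several variables can be listed, separated by commas, as in reduction(+:t,u,v).

The reduction clause works in three steps:
1. Each thread gets its own private copy of t, initialized to 0.
2. Each thread adds its share of the x(i) to its copy. No other thread touches that copy, so no updates are lost.
3. At the end of the loop, the private copies of all threads are added into the original t.

5 Using OpenMP at the Center
This section explains how to compile and run OpenMP programs on the Center's parallel computer (GS320) and UNIX server (GP7000F).

5.1 Machines that support OpenMP
Two of the Center's machines support OpenMP.

5.1.1 COMPAQ GS320
The COMPAQ GS320 uses Alpha 21264 (731 MHz) CPUs. It has 64 GByte of memory in total, of which a single job can use up to 16 GByte.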
Fortran:
  function total(x)
    integer i
    double precision t, total, x(100)
    t = 0.0
  !$omp parallel do reduction(+:t)
    do i = 1, 100
      t = t + x(i)
    end do
    total = t
    return
  end

C:
  double total(double x[])
  {
      int i;
      double t;

      t = 0.0;
  #pragma omp parallel for reduction(+:t)
      for (i = 0; i < 100; i++)
          t += x[i];
      return t;
  }

Figure 13: Sum of array elements with a reduction clause
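To make the mechanism of Figure 13 concrete, the sketch below writes the reduction out by hand next to the clause itself. total_manual illustrates the idea and is not how any particular compiler implements reduction. It gives each thread a private partial sum starting at 0 and then adds the partial sums into the shared t one thread at a time, using a critical section. Both function names are hypothetical.

```c
#include <assert.h>

/* Hand-written equivalent of reduction(+:t). */
double total_manual(int n, const double x[])
{
    double t = 0.0;                    /* shared result */
    assert(n >= 0);
#pragma omp parallel
    {
        double partial = 0.0;          /* private: declared inside the region */
        int i;                         /* private loop index */
#pragma omp for
        for (i = 0; i < n; i++)
            partial += x[i];
#pragma omp critical
        t += partial;                  /* one thread at a time */
    }
    return t;
}

/* The same computation with the reduction clause, as in Figure 13. */
double total_reduction(int n, const double x[])
{
    double t = 0.0;
    int i;
    assert(n >= 0);
#pragma omp parallel for reduction(+:t)
    for (i = 0; i < n; i++)
        t += x[i];
    return t;
}
```

With every x(i) equal to 1.0, both functions return exactly 100.0. For general floating-point data, the additions happen in a different order than in the sequential loop, so the last digits may differ slightly.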
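[Figure: COMPAQ GS320 and UNIX GP7000F model 900]

On the GS320, OpenMP can be used from Fortran and C; OpenMP is not available for C++. Besides OpenMP, MPI, PVM and HPF can also be used on the GS320, as on the VPP5000 and the UNIX GP7000F model 900. The operating system of the GS320 is Compaq Tru64 UNIX (formerly Digital UNIX), a UNIX for Alpha CPUs.

To use the GS320, first log in to the UNIX server kyu-cc.cc.kyushu-u.ac.jp and register with the touroku command:

  kyu-cc% touroku kyu-ss
  Password:
  kyu-cc...
  kyu-cc%

After registration, log in to the GS320 at kyu-ss.cc.kyushu-u.ac.jp.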
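5.1.2 UNIX GP7000F model 900
The UNIX GP7000F model 900 has 64 SPARC64-GP (300 MHz) CPUs. Its operating system is Solaris 7, a UNIX for SPARC CPUs. It has 64 GByte of memory in total, of which a single job can use up to 32 GByte. On the GP7000F model 900, OpenMP can be used from Fortran, C and C++. MPI is also available, as on the VPP5000 and the GS320. Host name: kyu-cc.cc.kyushu-u.ac.jp

5.2 Compiling OpenMP programs
To compile an OpenMP program, OpenMP must be enabled with a compiler option. Without it, the directives are ignored and an ordinary sequential program is produced.

5.2.1 GS320
On the GS320, OpenMP is enabled with the -omp option of both the Fortran and the C compiler. The Fortran compiler treats files ending in .f90 as free-form Fortran and files ending in .f or .for as fixed-form Fortran. The -o option gives the name of the executable, here example:

  kyu-ss% f90 -omp example.f90 -o example
  kyu-ss% cc -omp example.c -o example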
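On the GS320, the option -check omp_bindings (-check_omp for C) adds run-time checks of the OpenMP binding rules, which helps when debugging OpenMP programs.

5.2.2 GP7000F
On the GP7000F, OpenMP is enabled with the -KOMP option of the Fortran, C and C++ compilers. The Fortran compiler treats files ending in .f90 or .f95 as free-form Fortran and files ending in .f or .for as fixed-form Fortran. The -o option gives the name of the executable, here example:

  Fortran:  kyu-cc% frt -KOMP example.f90 -o example
  C:        kyu-cc% fcc -KOMP example.c -o example
  C++:      kyu-cc% FCC -KOMP example.cc -o example

The following additional options are available for OpenMP programs: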
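  -Kspinwait
  -Knospinwait
  -Kthreadstacksize=N

-Kspinwait: a thread that is waiting, for example for the other threads at the end of a parallel region, keeps running on its CPU (spin-waiting). It can resume immediately, but it occupies the CPU while it waits. Valid only together with -KOMP.

-Knospinwait: a waiting thread releases its CPU instead of spinning. The CPU can then be used by other processes, but resuming takes longer. Valid only together with -KOMP. If neither -Kspinwait nor -Knospinwait is given, -Kspinwait is assumed.

-Kthreadstacksize=N: sets the stack size of each thread to N Kbytes (1 <= N <= 2147483647). Increase it if threads run out of stack space. Valid only together with -KOMP. The same setting can also be made at run time with the environment variable THREAD_STACK_SIZE.

5.3 Environment variables
Several environment variables control how an OpenMP program runs. The most important is OMP_NUM_THREADS, which sets the number of threads. If the program calls omp_set_num_threads(), that call takes precedence. On the UNIX GP7000F, THREAD_STACK_SIZE sets the stack size of each thread in Kbytes (1 to 2147483647).

5.4 Measuring execution time
To see how much a program has been sped up, measure its execution time. On UNIX this can be done with the timex command. The example below is run on the UNIX GP7000F with OMP_NUM_THREADS set to 2. It uses timex to run the sequential version (ex1-seri) and the OpenMP version (ex1-para) of a program.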
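  kyu-cc% timex ./ex1-seri
  real 15.93
  user 10.94
  sys 4.40
  kyu-cc% timex ./ex1-para
  real 10.54
  user 10.99
  sys 4.46

In this output, real is the elapsed (wall-clock) time in seconds, user is the CPU time spent in the program itself, and sys is the CPU time spent in the operating system on its behalf. The sequential program runs on one CPU and takes 15.93 seconds. The OpenMP program, with OMP_NUM_THREADS set to 2, runs on two CPUs and takes 10.54 seconds, a speedup of 15.93/10.54 = 1.51. The user and sys times do not decrease, because they are summed over all threads.

On the UNIX GP7000F (host name: kyu-cc.cc.kyushu-u.ac.jp), parallel runs should be made through the batch queues sc8 or sc32, submitting jobs with the qsub command.

References
[1] Chandra, R., Menon, R., Dagum, L., Kohr, D., Maydan, D. and McDonald, J.: Parallel Programming in OpenMP, Morgan Kaufmann Publishers, 2000.
[2] Geist, A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, R. and Sunderam, V.: PVM: Parallel Virtual Machine: A Users' Guide and Tutorial for Networked Parallel Computing, The MIT Press, 1994.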
    The Center's PVM web page: http://www.cc.kyushu-u.ac.jp/scp/system/library/mpl/pvm.html
[3] High Performance Fortran Forum (Japanese translation, NEC): High Performance Fortran 2.0, 1999.
    The Center's HPF web page: http://www.cc.kyushu-u.ac.jp/scp/system/library/fortran/hpf.html
[4] OpenMP Architecture Review Board: OpenMP Fortran Application Program Interface, Version 1.0, http://www.openmp.org/specs/mp-documents/fspec10.pdf, October 1997.
    The Fortran 2.0 and C/C++ specifications are available at http://www.openmp.org/specs/. Japanese translations (RWCP) are at http://pdplab.trc.rwcp.or.jp/pdperf/omni/spec.ja/home.html
[5] Pacheco, P. S.: Parallel Programming with MPI, Morgan Kaufmann Publishers, 1997 (Japanese translation, July 2001).
    The Center's MPI web page: http://www.cc.kyushu-u.ac.jp/scp/system/library/mpl/mpi.html
[6] Fujitsu: UXP/V VPP Fortran V20 manual, 1999.
    The Center's VPP Fortran web page: http://www.cc.kyushu-u.ac.jp/scp/system/library/fortran/vpp_fortran.html