
OpenMP (1)

E-mail: {nanri,amano}@cc.kyushu-u.ac.jp

1, 12 1 UNIX (FUJITSU GP7000F model 900), 13 1 (COMPAQ GS320) FUJITSU VPP5000/64 1 (a) (b)

Figure 1: ( 1(a))

( ) 1. VPP Fortran[6] HPF[3] VPP Fortran 2. MPI[5] PVM[2] 3. 1 MPI PVM ( 1(b)) OpenMP Fortran (C, C++ ) MPI OpenMP MPI 1 Thinking Machines CM-5 C C* CM-5 2

VPP Fortran MPI OpenMP GP7000F, GS320 OpenMP Chandra, R., Menon, R., Dagum, L., Kohr, D., Maydan, D. and McDonald, J.: Parallel Programming in OpenMP, Morgan Kaufmann Publishers, 2000. OpenMP 2001 9 OpenMP http://www.openmp.org/ OpenMP web OpenMP OpenMP FAQ (Frequently Asked Questions, ) http://pdplab.trc.rwcp.or.jp/pdperf/omni/spec.ja/home.html (RWCP) ( ) 3 OpenMP OpenMP (1) ( ) OpenMP OpenMP OpenMP (2) ( ) (Fortran DO C(C++) for ) OpenMP OpenMP (3) ( ) OpenMP 1 2 OpenMP OpenMP OpenMP 3,. 4,, OpenMP. 5, OpenMP,. 3

,. OpenMP, Fortran, Fortran C., C., Fortran, OpenMP. 2 OpenMP 2.1 OpenMP OpenMP OpenMP OpenMP Architecture Review Board (ARB) OpenMP OpenMP ARB OpenMP 2001 6 SPEC OpenMP OpenMP 2.2 OpenMP OpenMP Fortran C(C++) pragma pragma OpenMP 4

OpenMP 2

    program sequential
    ...
    ...
    end program sequential

    program parallel
    ...
    !$omp parallel
    !$omp end parallel
    ...
    end program parallel

Figure 2: OpenMP

Fortran OpenMP OpenMP ( ) OMP_NUM_THREADS 2 2 4

OMP_NUM_THREADS 4 OpenMP OMP_GET_THREAD_NUM() if OpenMP 3 1. 2. 3. 2.3 OpenMP 1 OpenMP 1. MPI PVM 2. OpenMP 3. OpenMP 4. OpenMP (OpenMP ARB) 6

MPI PVM OpenMP 3,, OpenMP.,.,.. 3.1,,., 2 3.,., ( ).,,.,,,.,.,. GS320 UNIX GP7000F,, 1. 3.2,,.,,.,,,.,, 7

Table 1:

GS320 (kyu-ss.cc.kyushu-u.ac.jp)

    -check bounds -check format -check overflow -check underflow
    -tune ev6 -arch ev6     GS320 CPU EV68.
    -unroll N               (N: 6, 8, 16)
    -fast                   ( 15 )
    -O5                     20 30

UNIX GP7000F model 900 (kyu-cc.cc.kyushu-u.ac.jp)

    -Haesux
    -Kfast -Keval -Kfast_GP=2,prefetch=4 -O4     Solaris, C/C++
    -Kfast_GP=2,prefetch -Kmfunc

,,,. 4 4.1 OpenMP OpenMP,,,., OpenMP OpenMP,., OpenMP OpenMP,. 4.1.1 OpenMP OpenMP,... OpenMP 2.!$omp #pragma omp OpenMP (Fortran ).,. 2 Fortran parallel do, shared(a). C/C++ parallel for. OpenMP,., OpenMP, OpenMP OpenMP,., OpenMP, OpenMP OpenMP.,. OpenMP, OpenMP,. OpenMP, OpenMP. OpenMP, OpenMP ( 3).,. 9

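Table 2: OpenMP directive formats

    Fortran (free form)
        A line beginning with !$omp is treated as an OpenMP directive.
        Example: !$omp parallel do shared(a)

    Fortran (fixed form)
        A line beginning with !$omp, c$omp or *$omp is treated as an OpenMP
        directive. Column 6 must be blank or 0.
        Example: c$omp parallel do shared(a)

    C/C++
        A line beginning with #pragma omp is treated as an OpenMP directive.
        Example: #pragma omp parallel for shared(a)

Table 3: OpenMP conditional compilation

    Fortran (free form)
        A line beginning with !$ is compiled only when OpenMP is enabled;
        otherwise it is treated as a comment.
        Example: !$ call parallel_init(a)

    Fortran (fixed form)
        A line beginning with !$, c$ or *$ is compiled only when OpenMP is
        enabled; otherwise it is treated as a comment.
        Example: !$ call parallel_init(a)

    C/C++
        The macro _OPENMP is defined when OpenMP is enabled.
        Example:
            #ifdef _OPENMP
            parallel_init(a);
            #else
            serial_init(a);
            #endif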
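The _OPENMP entry in Table 3 can be sketched a little further in C. The helper names below (built_with_openmp, max_threads_or_one) are hypothetical and are not taken from the article; the only OpenMP call used is omp_get_max_threads(). The point is that one source file can be built both with and without the OpenMP option.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>    /* included only when the compiler enables OpenMP */
#endif

/* Hypothetical helper: returns 1 when this file was compiled with
   OpenMP enabled (the compiler then defines _OPENMP), otherwise 0. */
int built_with_openmp(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: the largest number of threads a parallel region
   may use, or 1 when the program was built as a serial program. */
int max_threads_or_one(void)
{
    int n;
#ifdef _OPENMP
    n = omp_get_max_threads();
#else
    n = 1;
#endif
    assert(n >= 1);   /* sanity check: there is always at least one thread */
    return n;
}
```

A caller could, for example, size a per-thread work array with max_threads_or_one() and get a correct serial program when the OpenMP option is omitted.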

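4.1.2 OpenMP

To call OpenMP library functions from C or C++, include the following header.

    #include <omp.h>

In Fortran, declare the type of each OpenMP function that the program calls. For example, to use omp_get_num_threads():

    integer omp_get_num_threads

4.1.3 OpenMP

In Fortran, the following two OpenMP directives delimit the region that is executed in parallel.

    !$omp parallel
    !$omp end parallel

The code between the parallel directive and the end parallel directive is executed by every thread. In C/C++ the region is written as follows.

    #pragma omp parallel

In C and C++, the parallel directive applies to the statement or block that immediately follows it, so no end directive is needed.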

Fortran

    program hello
      implicit none
      integer omp_get_thread_num
      print *, " "
    !$omp parallel
      ! executed once by every thread in the team
      print *, ". ", omp_get_thread_num()
    !$omp end parallel
      print *, " "
    end program hello

C

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        printf(" \n");
    #pragma omp parallel
        {
            /* executed once by every thread in the team */
            printf(". %d\n", omp_get_thread_num());
        }
        printf(" \n");
        return 0;
    }

Figure 3: parallel

3 :.. 1. 2. 3. 0., 4., parallel 4.

    print *, ". ", omp_get_thread_num()

4.,.

Figure 4: execution of the program in Figure 3 with 4 threads. The print statements before !$omp parallel and after !$omp end parallel are each executed once; the print statement inside the parallel region is executed four times, once by each thread.

3,., OpenMP, 1.,. print (printf ),. OpenMP,..,,., omp_set_num_threads().

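, OMP_NUM_THREADS.,., omp_get_thread_num() OpenMP,. 4, print.,. OpenMP,. OpenMP. 1.. 2.. 3. OpenMP. 4.. 5.. 6. 2 5.,,.

4.2

This section describes the OpenMP parallel do directive (parallel for in C/C++). Figure 5 shows a program before it is parallelized with OpenMP. It multiplies each element of x by a, adds y, and stores the result in z, with i running from 1 to 100 in steps of 1. Figure 6 shows the same program with the i loop parallelized with OpenMP. With 4 threads, it executes as shown in Figure 7: the i loop is divided into 4 parts, with i = 1 to 25 assigned to thread 0, i = 26 to 50 to thread 1, i = 51 to 75 to thread 2, and i = 76 to 100 to thread 3 (footnote 3), so each thread performs 1/4 of the iterations. The parallel do (parallel for) directive applies to the do (for) loop that immediately follows it.

3 OpenMP, 7 0.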

Fortran

    program ex1
      implicit none
      integer i
      double precision z(100), a, x(100), y
      do i = 1, 100
         z(i) = 0.0
         x(i) = 2.0
      end do
      a = 4.0
      y = 1.0
      call daxpy(z, a, x, y)
    end program ex1

    subroutine daxpy(z, a, x, y)
      integer i
      double precision z(100), a, x(100), y
      do i = 1, 100
         z(i) = a * x(i) + y
      end do
      return
    end

C

    #include <stdio.h>
    #include <omp.h>

    /* declared before use so that main() can call it */
    void daxpy(double z[], double a, double x[], double y);

    int main(void)
    {
        int i;
        double z[100], a, x[100], y;

        for (i = 0; i < 100; i++) {
            z[i] = 0.0;
            x[i] = 2.0;
        }
        a = 4.0;
        y = 1.0;
        daxpy(z, a, x, y);
        return 0;
    }

    void daxpy(double z[], double a, double x[], double y)
    {
        int i;

        for (i = 0; i < 100; i++)
            z[i] = a * x[i] + y;
    }

Figure 5:

Fortran

    program ex1
      implicit none
      integer i
      double precision z(100), a, x(100), y
      do i = 1, 100
         z(i) = 0.0
         x(i) = 2.0
      end do
      a = 4.0
      y = 1.0
      call daxpy(z, a, x, y)
    end program ex1

    subroutine daxpy(z, a, x, y)
      integer i
      double precision z(100), a, x(100), y
    !$omp parallel do
      do i = 1, 100
         z(i) = a * x(i) + y
      end do
      return
    end

C

    #include <stdio.h>
    #include <omp.h>

    /* declared before use so that main() can call it */
    void daxpy(double z[], double a, double x[], double y);

    int main(void)
    {
        int i;
        double z[100], a, x[100], y;

        for (i = 0; i < 100; i++) {
            z[i] = 0.0;
            x[i] = 2.0;
        }
        a = 4.0;
        y = 1.0;
        daxpy(z, a, x, y);
        return 0;
    }

    void daxpy(double z[], double a, double x[], double y)
    {
        int i;

        /* the iterations of the following for loop are divided among the threads */
    #pragma omp parallel for
        for (i = 0; i < 100; i++)
            z[i] = a * x[i] + y;
    }

Figure 6: OpenMP
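The four-way split of i = 1 to 100 described for the parallel loop (1 to 25, 26 to 50, 51 to 75, 76 to 100) can be reproduced with a few lines of arithmetic. The following C sketch is only an illustration: block_range is a hypothetical helper, not an OpenMP routine, and an OpenMP implementation is free to divide the iterations differently unless a schedule is requested explicitly.

```c
#include <assert.h>

/* Hypothetical helper (not part of OpenMP): computes the 1-based,
   inclusive range [*first, *last] of iterations that thread `tid`
   receives when iterations 1..n are split into contiguous blocks over
   `nthreads` threads. If n is not a multiple of nthreads, the
   lowest-numbered threads each take one extra iteration. */
void block_range(int n, int nthreads, int tid, int *first, int *last)
{
    int base, extra, start, len;

    assert(n >= 0 && nthreads > 0 && tid >= 0 && tid < nthreads);

    base  = n / nthreads;                           /* iterations every thread gets */
    extra = n % nthreads;                           /* leftover iterations          */
    start = tid * base + (tid < extra ? tid : extra);
    len   = base + (tid < extra ? 1 : 0);

    *first = start + 1;                             /* convert to 1-based numbering */
    *last  = start + len;
}
```

Calling block_range(100, 4, tid, &first, &last) for tid = 0 to 3 yields the four ranges listed in the text; with n = 10 and 4 threads the blocks would be 1 to 3, 4 to 6, 7 to 8 and 9 to 10.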

Figure 7: execution of subroutine daxpy (Figure 6) with 4 threads

    thread 0:  i = 1 to 25      z(i) = a * x(i) + y
    thread 1:  i = 26 to 50     z(i) = a * x(i) + y
    thread 2:  i = 51 to 75     z(i) = a * x(i) + y
    thread 3:  i = 76 to 100    z(i) = a * x(i) + y

6, parallel do (parallel for ).,..

    do i = 2, 100
       z(i) = z(i) + z(i - 1)
    end do

In this loop z(i) is computed from z(i-1), and z(i-1) is itself updated in the preceding iteration of i, so the iterations must be executed in order of i. If the i loop is parallelized with parallel do (parallel for), the iterations are no longer executed in that order and the result can be wrong. , OpenMP (2).

4.3

, OpenMP., OpenMP. 6, x, a, y, z., x[1] x[1]., 0 x[1], 1 2 x[1]., x. a, y., z. 0 a*x[1]+y

3.0, z[1], z[1] 3.0., z., i. i, z., i, i,. 0 i=1, i 1 i 26. 0, i 25,,.,,., i,., i,., 0 i 1 25, 1 i 26 50.

Figure 8: in subroutine daxpy, z, a, x and y are shared by all threads, while each thread has its own copy of i.

8.,,., x, z a, y,., i. i, i. OpenMP,,.,., shared.

    !$omp parallel shared(a)

,, private.

    !$omp parallel private(a)

The shared and private clauses can be given on the same directive, and several variables can be listed in one clause, separated by commas.

    !$omp parallel shared(a, b, c) private(d, e)

, 6 parallel do (parallel for ),., OpenMP. OpenMP,.,,,. 6,, 9.

Fortran

    subroutine daxpy(z, a, x, y)
      integer i
      double precision z(100), a, x(100), y
    !$omp parallel do shared(z, a, x, y) private(i)
      do i = 1, 100
         z(i) = a * x(i) + y
      end do
      return
    end

C

    void daxpy(double z[], double a, double x[], double y)
    {
        int i;

        /* z, a, x, y are shared by all threads; each thread has its own i */
    #pragma omp parallel for shared(z, a, x, y) private(i)
        for (i = 0; i < 100; i++)
            z[i] = a * x[i] + y;
    }

Figure 9:

, OpenMP, shared. private,

. C/C++ OpenMP, 4. 2,.,,.,,. 2, OpenMP 10.

Fortran

    subroutine matvec(a, x, y)
      integer i, j
      double precision a(100, 100), x(100), y(100)
      ! y is accumulated into, so the caller must set it to zero beforehand
    !$omp parallel do private(j)
      do i = 1, 100
         do j = 1, 100
            y(i) = y(i) + a(j, i) * x(j)
         end do
      end do
      return
    end

C

    void matvec(double a[][100], double x[], double y[])
    {
        int i, j;

        /* y is accumulated into, so the caller must set it to zero beforehand */
    #pragma omp parallel for private(j)
        for (i = 0; i < 100; i++)
            for (j = 0; j < 100; j++)
                y[i] += a[i][j] * x[j];
    }

Figure 10: OpenMP

4.4

, 11., 6 parallel do (parallel for ) i

4 Fortran OpenMP,,., C C++ Fortran,.

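Fortran

    function total(x)
      integer i
      double precision t, total, x(100)
      t = 0.0
      do i = 1, 100
         t = t + x(i)
      end do
      total = t
    end

C

    double total(double x[])
    {
        int i;
        double t;

        t = 0.0;
        for (i = 0; i < 100; i++)
            t += x[i];
        return t;
    }

Figure 11: ( )

., 4 25,.,, 11 x t., i., t.

Suppose that every element of x is 1.0. After all elements of x have been added, t should be 100.0. ,.,,.,. Each iteration of the loop performs the following four steps.

    1: read t.
    2: read x[i].
    3: add the two values.
    4: write the result to t.

, 12(a)., 12(b)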

Figure 12: order of steps 1 to 4 for t. (a) One thread executes i = 1, 2, 3, ... in sequence. (b) Four threads start at i = 1, i = 26, i = 51 and i = 76 and execute their iterations at the same time.

, OpenMP,,,., 1 4, 1., t.

Suppose t is 12.0 when threads 0 and 1 both read it. Because x[i] is 1.0, each thread computes 13.0 and then writes it to t, so t becomes 13.0 although two additions were performed. ,,., the final result is not 100.0.,. OpenMP,.., 1.,,. 11 t = t + x(i), t t t,., i 1,.,.,

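, OpenMP.,.,,.,.,,.,,,,., parallel for reduction.

Figure 13 shows the program of Figure 11 with a reduction clause added. In reduction(+:t), + is the operator and t is the variable to which the reduction is applied. Several variables can be listed, separated by commas, as in reduction(+:t,u,v). t, t,., t x[i]. i, t, t., i,,.

5

OpenMP, UNIX OpenMP,.,.

5.1

OpenMP 2,.

5.1.1 COMPAQ GS320

COMPAQ GS320,. GS320 Alpha 21264 (731MHz) CPU. CPU,.., GS320 64GByte, 1,, 16GByte.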

Fortran

    function total(x)
      integer i
      double precision t, total, x(100)
      t = 0.0
    !$omp parallel do reduction(+:t)
      do i = 1, 100
         t = t + x(i)
      end do
      total = t
      return
    end

C

    double total(double x[])
    {
        int i;
        double t;

        t = 0.0;
        /* each thread sums into its own copy of t; the copies are added at the end */
    #pragma omp parallel for reduction(+:t)
        for (i = 0; i < 100; i++)
            t += x[i];
        return t;
    }

Figure 13: ( )
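What reduction(+:t) does for the sum can be written out by hand. The sketch below is an illustration under stated assumptions rather than the compiler's actual implementation: the function name total_manual and the variable t_local are hypothetical, and the critical directive is used here only to serialize the final additions. Each thread accumulates into its own private partial sum, and the partial sums are added to the shared t one thread at a time, which avoids the conflicting read and write of t described for Figure 12.

```c
#include <assert.h>

/* Hypothetical illustration: sums x[0..n-1] with per-thread partial sums,
   roughly what reduction(+:t) arranges automatically. When compiled
   without OpenMP the pragmas are ignored and this is an ordinary
   serial sum. */
double total_manual(const double x[], int n)
{
    double t = 0.0;                 /* shared: the final result */

    assert(n >= 0);

#pragma omp parallel
    {
        double t_local = 0.0;       /* declared inside the region: private to each thread */
        int i;                      /* loop variable, also private */

#pragma omp for
        for (i = 0; i < n; i++)
            t_local += x[i];        /* no other thread touches t_local */

#pragma omp critical
        t += t_local;               /* one thread at a time updates the shared t */
    }
    return t;
}
```

With every element of x set to 1.0 and n = 100, the function returns 100.0 regardless of the number of threads, because each of the 100 additions is applied exactly once.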

COMPAQ GS320 UNIX GP7000F model 900

GS320 OpenMP Fortran C. OpenMP C++., GS320 C Fortran. OpenMP. OpenMP, OpenMP. GS320,, MPI, PVM, HPF., VPP5000 UNIX GP7000F model 900.

Compaq Tru64 UNIX Digital UNIX Alpha CPU GS320., Alpha GS320,.,, UNIX kyu-cc.cc.kyushu-u.ac.jp, touroku.

    kyu-cc% touroku kyu-ss
    Password:
    kyu-cc...
    kyu-cc%

,.

    kyu-ss.cc.kyushu-u.ac.jp

5.1.2 UNIX GP7000F model 900

UNIX GP7000F model 900,. GP7000F model 900, SPARC64-GP (300MHz) 64. SPARC, OS( ) Solaris 7, SPARC,., GP7000F model 900 64GByte, 1,, 32GByte. GP7000F model 900 OpenMP Fortran, C C++,, C Fortran.,,, MPI, VPP5000 GS320. UNIX,.

    kyu-cc.cc.kyushu-u.ac.jp

5.2 OpenMP

, OpenMP. OpenMP, OpenMP,., OpenMP,.

5.2.1 GS320

On the GS320, specifying the -omp option to the Fortran or C compiler enables OpenMP. On the GS320, a file with the extension .f90 is compiled as free-form Fortran, and a file with the extension .f or .for as fixed-form Fortran. In the examples below, -o example names the executable file example.

    kyu-ss% f90 -omp example.f90 -o example
    kyu-ss% cc -omp example.c -o example

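GS320, -check omp_bindings (C -check_omp), OpenMP., OpenMP.

5.2.2 GP7000F

On the GP7000F, specifying the -KOMP option to the Fortran, C or C++ compiler enables OpenMP. On the GP7000F, a file with the extension .f90 or .f95 is compiled as free-form Fortran, and a file with the extension .f or .for as fixed-form Fortran. In the examples below, -o example names the executable file example.

    Fortran:  kyu-cc% frt -KOMP example.f90 -o example
    C:        kyu-cc% fcc -KOMP example.c -o example
    C++:      kyu-cc% FCC -KOMP example.cc -o example

Fortran OpenMP,.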

    -Kspinwait
    -Knospinwait
    -Kthreadstacksize=N

-Kspinwait
    , CPU, CPU.,,. -KOMP,. -Kspinwait -Knospinwait, -Kspinwait.

-Knospinwait
    ,, CPU. CPU., CPU,. -KOMP,.

-Kthreadstacksize=N
    , K (1 <= N <= 2147483647).,.,. -KOMP,., THREAD_STACK_SIZE.

5.3 OpenMP

OpenMP,,., OpenMP,. OMP_NUM_THREADS. omp_set_num_threads(),., UNIX GP7000F, THREAD_STACK_SIZE. K 1 2147483647.

5.4

,., UNIX timex., OpenMP (ex1-seri), OpenMP (ex1-para) timex,., OMP_NUM_THREADS 2, UNIX.

    kyu-cc% timex ./ex1-seri
    real 15.93
    user 10.94
    sys 4.40
    kyu-cc% timex ./ex1-para
    real 10.54
    user 10.99
    sys 4.46

real is the elapsed time ( ), user is the CPU time spent in user mode, and sys is the CPU time spent in the system. Without OpenMP, on 1 CPU, the elapsed time is 15.93. With OpenMP and OMP_NUM_THREADS set to 2, on 2 CPUs, it is 10.54, so the speedup is 15.93/10.54 = 1.51., CPU 5..,. CPU,.,, UNIX GP7000F ( : kyu-cc.cc.kyushu-u.ac.jp) sc8 sc32., qsub.,,,.,.

5 PVM 5, CPU.

[1] Chandra, R., Menon, R., Dagum, L., Kohr, D., Maydan, D. and McDonald, J.: Parallel Programming in OpenMP, Morgan Kaufmann Publishers, 2000. 2001 9 OpenMP

[2] Geist, A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, R. and Sunderam, V.: PVM: Parallel Virtual Machine A Users Guide and Tutorial for Networked Parallel Computing, The MIT Press, 1994.

PVM Web. http://www.cc.kyushu-u.ac.jp/scp/system/library/mpl/pvm.html

[3] High Performance Fortran Forum,,,,,, NEC: High Performance Fortran2.0,, 1999. HPF 2.0,. HPF Web. http://www.cc.kyushu-u.ac.jp/scp/system/library/fortran/hpf.html

[4] OpenMP Architecture Review Board: OpenMP Fortran Application Program Interface, http://www.openmp.org/specs/mp-documents/fspec10.pdf, October 1997. Fortran OpenMP 1.0 ( ) http://www.openmp.org/specs/ Fortran 2.0 C(C++) http://pdplab.trc.rwcp.or.jp/pdperf/omni/spec.ja/home.html (RWCP)

[5] Pacheco, P. S.: Parallel Programming with MPI, Morgan Kaufmann Publishers, 1997. P. / MPI 2001 7 MPI Web. http://www.cc.kyushu-u.ac.jp/scp/system/library/mpl/mpi.html

[6] UXP/V VPP Fortran V20 1999 VPP Fortran web MPI Web. http://www.cc.kyushu-u.ac.jp/scp/system/library/fortran/vpp_fortran.html