Status of the Rhpc package


Status of the Rhpc package — COM-ONE — December 5, 2015

Outline
1. Introduction
2. Rhpc
3. Foreign MPI
4. Windows
5. Summary

1. Introduction

Rhpc is a package for high-performance computing (HPC) with R. Like snow and related packages, it runs a cluster of R workers from a master process, driven through functions such as Rhpc_worker_call and Rhpc_lapply.

2. Rhpc

Features of Rhpc: SPMD execution on top of MPI, *apply-style interfaces, workers built on the R embedding interface (libR), and Windows support.

Rhpc functions (1)
  MPI setup:  Rhpc_initialize, Rhpc_getHandle, Rhpc_finalize, Rhpc_numberOfWorker
  Workers:    Rhpc_worker_call, Rhpc_Export, Rhpc_EvalQ
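A minimal sketch of how these fit together (argument conventions are assumed to mirror the snow/parallel API; the script must itself be launched under MPI):

library(Rhpc)
Rhpc_initialize()                    # set up MPI
cl <- Rhpc_getHandle()               # handle to all available workers
Rhpc_numberOfWorker(cl)              # how many workers joined

x <- 123
Rhpc_Export(cl, "x")                 # copy master objects to every worker
Rhpc_EvalQ(cl, x * 2)                # evaluate an expression on every worker
Rhpc_worker_call(cl, Sys.getpid)     # call a function on every worker

Rhpc_finalize()                      # shut the workers down and finalize MPI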

Rhpc functions (2): apply-style calls
  Rhpc_lapply, Rhpc_lapplyLB  (lapply over the workers; LB = dynamic load balancing)
  Rhpc_setupRNG               (parallel random-number streams)
  Rhpc_worker_noback          (call a function on the workers without collecting results; used mainly for foreign MPI calls)

Rhpc functions (3)
  Built on Rhpc_lapply:  Rhpc_apply, Rhpc_sapply, Rhpc_sapplyLB
  Utilities:             Rhpc_serialize, Rhpc_unserialize, Rhpc_enquote, Rhpc_splitList
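A short sketch of the apply-style calls (assuming the handle cl from the sketch above; the argument order is assumed to follow the base-R counterparts, and Rhpc_setupRNG is assumed to take the handle and an integer seed):

## square 1..1000 in parallel; the LB variant balances uneven workloads dynamically
res  <- Rhpc_lapply(cl, 1:1000, function(i) i^2)
res2 <- Rhpc_lapplyLB(cl, 1:1000, function(i) { Sys.sleep(runif(1) / 100); i^2 })

## sapply-style simplification to a vector
v <- Rhpc_sapply(cl, 1:100, sqrt)

## reproducible parallel random-number streams on the workers
Rhpc_setupRNG(cl, 123)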

Many workers example (1): Rhpc_Export and parallel::clusterExport (MPI). [Figure: "Export performance" — elapsed seconds (0–60) against number of workers (0–150) for Rhpc::Rhpc_Export and parallel::clusterExport (Rmpi).]

Many workers example (2A): Rhpc_lapply* and parallel::clusterApply* (MPI). [Figure: "SQRT performance 1" — elapsed seconds (0–140) against number of workers (0–150) for Rhpc::Rhpc_lapply, Rhpc::Rhpc_lapplyLB, parallel::clusterApply (Rmpi+patch), parallel::clusterApplyLB (Rmpi+patch), parallel::clusterApply (Rmpi) and parallel::clusterApplyLB (Rmpi).]

Many workers example (2B): Rhpc_lapply* and parallel::clusterApply* (MPI). [Figure: "SQRT performance 2" — elapsed seconds (0–3.5) against number of workers (0–150) for Rhpc::Rhpc_lapply, Rhpc::Rhpc_lapplyLB, parallel::clusterApply (Rmpi+patch) and parallel::clusterApplyLB (Rmpi+patch).]

Many workers example (2C): Rhpc_lapply* and parallel::clusterApply* (MPI). [Figure: "SQRT performance 3" — elapsed seconds (0–0.12) against number of workers (0–150) for Rhpc::Rhpc_lapply and Rhpc::Rhpc_lapplyLB.]

3. Foreign MPI

Calling foreign MPI code (written in C or Fortran) from R: the master (rank 0) and the workers (rank 1 and up) execute it in SPMD style. Rhpc keeps the MPI communicator it was initialized with and exposes it to foreign MPI code, so such code runs on the same communicator as Rhpc_lapply and the other Rhpc calls. On the R side, the foreign routine is invoked on the workers with Rhpc_worker_noback.

Rhpc options: Rhpc stores its MPI information in R options (read with getOption()).
  Rhpc.mpi.f.comm   Fortran MPI communicator (R type: integer)
  Rhpc.mpi.c.comm   C MPI communicator (R type: external pointer)
  Rhpc.mpi.procs    number of MPI processes
  Rhpc.mpi.rank     MPI rank of the process
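A small illustration (a sketch, assuming a handle cl from Rhpc_getHandle()): the options can be read with getOption() on the master and, through a worker call, on each worker:

## on the master
getOption("Rhpc.mpi.rank")     # 0 on the master
getOption("Rhpc.mpi.procs")    # total number of MPI processes

## on the workers: each worker reports its own rank
Rhpc_worker_call(cl, function() getOption("Rhpc.mpi.rank"))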

Calling the Fortran and C MPI code from R with .Fortran, .C and .Call:

mpipif <- function(n)
{
  ## Exported functions get values by getOption()
  ## when they run on workers
  out <- .Fortran("mpipif",
                  comm = getOption("Rhpc.mpi.f.comm"),
                  n = as.integer(n),
                  outpi = as.double(0))
  out$outpi
}

mpipic <- function(n)
{
  ## Exported functions get values by getOption()
  ## when they run on workers
  out <- .C("mpipic",
            comm = getOption("Rhpc.mpi.f.comm"),
            n = as.integer(n),
            outpi = as.double(0))
  out$outpi
}

mpipicall <- function(n)
{
  ## Exported functions get values by getOption()
  ## when they run on workers
  out <- .Call("mpipicall",
               comm = getOption("Rhpc.mpi.c.comm"),
               n = as.integer(n))
  out
}

.C passes arguments the same way as .Fortran; .Call works with R objects directly. See help(.C).

Changing the MPI Fortran code for .Fortran in R. The original stand-alone program (program main, with command-line parsing of n, MPI_INIT, a final print of the result and MPI_FINALIZE) becomes a subroutine that receives the communicator, n and the output variable as arguments and uses the communicator passed in from R instead of MPI_COMM_WORLD:

      subroutine mpipif(mpi_comm, n, outpi)
      include "mpif.h"
      double precision mypi, sumpi
      double precision h, sum, x, f, a
      double precision pi
      parameter (pi=3.14159265358979323846)
      integer n, rank, procs, i, ierr
      integer mpi_comm
      double precision outpi
      f(a) = 4.d0 / (1.d0 + a*a)
c     COMM passed in from R
      call MPI_COMM_RANK(mpi_comm, rank, ierr)
      call MPI_COMM_SIZE(mpi_comm, procs, ierr)
      call MPI_BCAST(n, 1, MPI_INTEGER, 0, mpi_comm, ierr)
      if ( n .le. 0 ) goto 30
      h = 1.0d0/n
      sum = 0.0d0
      do 20 i = rank+1, n, procs
         x = h * (dble(i) - 0.5d0)
         sum = sum + f(x)
 20   continue
      mypi = h * sum
      call MPI_REDUCE(mypi, sumpi, 1,
     &     MPI_DOUBLE_PRECISION, MPI_SUM, 0,
     &     mpi_comm, ierr)
      if (rank .eq. 0) then
         outpi = sumpi
      endif
 30   continue
      return
      end

Changing MPI C code for.c in R. #include "mpi.h" #include "mpi.h" #include <stdio.h> #include <stdio.h> #include <math.h> #include <math.h> #include <R.h> > #include <Rinternals.h> int main( int argc, char *argv[] ) int mpipic( int *comm, int *N, double *outpi ) { > { MPI_Comm mpi_comm; int n=0, rank, procs, i; int n=0, rank, procs, i; double mypi, pi, h, sum, x; double mypi, pi, h, sum, x; if ( argc >= 2){ mpi_comm = MPI_Comm_f2c(*comm); n = atoi(argv[1]); n = *N; } < MPI_Init(&argc,&argv); < // COMM MPI_Comm_size(MPI_COMM_WORLD,&procs); // COMM MPI_Comm_size(mpi_comm, &procs); MPI_Comm_rank(MPI_COMM_WORLD,&rank); MPI_Comm_rank(mpi_comm, &rank); MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD); MPI_Bcast(&n, 1, MPI_INT, 0, mpi_comm); h 1.0 / (double) n; sum = 0.0; h 1.0 / (double) n; sum = 0.0; for (i = rank + 1; i <= n; i += procs) { for (i = rank + 1; i <= n; i += procs) { x = h * ((double)i - 0.5); x = h * ((double)i - 0.5); sum += (4.0 / (1.0 + x*x)); sum += (4.0 / (1.0 + x*x)); } mypi = h * sum; } mypi = h * sum; MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD); mpi_comm); if (rank == 0) *outpi=pi; printf("pi = %.16f\n", pi); < MPI_Finalize(); < return(0); return(0); } } 19 / 29

Changing MPI C code for.call in R. #include "mpi.h" #include "mpi.h" #include <stdio.h> #include <stdio.h> #include <math.h> #include <math.h> #include <R.h> > #include <Rinternals.h> int main( int argc, char *argv[] ) SEXP mpipicall(sexp comm, SEXP N) { > { MPI_Comm mpi_comm; int n=0, rank, procs, i; > SEXP ret; int n=0, rank, procs, i; double mypi, pi, h, sum, x; double mypi, pi, h, sum, x; if ( argc >= 2){ mpi_comm = *((MPI_Comm*)R_ExternalPtrAddr(comm)); n = atoi(argv[1]); PROTECT(ret=allocVector(REALSXP,1)); } n = INTEGER(N)[0]; MPI_Init(&argc,&argv); < // COMM MPI_Comm_size(MPI_COMM_WORLD,&procs); // COMM MPI_Comm_size(mpi_comm, &procs); MPI_Comm_rank(MPI_COMM_WORLD,&rank); MPI_Comm_rank(mpi_comm, &rank); MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD); MPI_Bcast(&n, 1, MPI_INT, 0, mpi_comm ); h 1.0 / (double) n; sum = 0.0; h 1.0 / (double) n; sum = 0.0; for (i = rank + 1; i <= n; i += procs) { for (i = rank + 1; i <= n; i += procs) { x = h * ((double)i - 0.5); x = h * ((double)i - 0.5); sum += (4.0 / (1.0 + x*x)); sum += (4.0 / (1.0 + x*x)); } mypi = h * sum; } mypi = h * sum; MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD); mpi_comm ); if (rank == 0) REAL(ret)[0]=pi; printf("pi = %.16f\n", pi); UNPROTECT(1); MPI_Finalize(); return(ret); return(0); < } } 20 / 29

Call foreign MPI program from R:

source("mpipicall.r")
source("mpipic.r")
source("mpipif.r")

library(Rhpc)
Rhpc_initialize()
cl <- Rhpc_getHandle(4)

n <- 100

## Load shared library
Rhpc_worker_call(cl, dyn.load, "pi.so"); dyn.load("pi.so")

## Rhpc_worker_noback calls a function, but does not
## get any result.
## Workers should be started before the master calls the
## same routine.
Rhpc_worker_noback(cl, mpipicall, n); mpipicall(n)
Rhpc_worker_noback(cl, mpipic, n); mpipic(n)
Rhpc_worker_noback(cl, mpipif, n); mpipif(n)

Rhpc_finalize()

4. Windows

Rhpc for Windows is available on CRAN. MPI on Windows means MS-MPI; the CRAN build of Rhpc targets MS-MPI v4.2, while the current MS-MPI release is v7. With the MS-MPI v5 (or later) SDK, a 64-bit def/import library can be generated for linking, and mpiexec launches the MPI job; given the SDK, Rhpc can be built against MS-MPI.

Starting Rhpc on Windows (1): launch one Rgui.exe master and three RhpcWorker64.exe workers under mpiexec.

C:\Users\boofoo> mpiexec.exe -env PATH "C:\Program Files\R\R-3.2.2\bin\x64;%PATH%" -n 1 CMD /C "C:\Program Files\R\R-3.2.2\bin\x64\Rgui.exe" : -env PATH "C:\Program Files\R\R-3.2.2\bin\x64;%PATH%" -n 3 "%USERPROFILE%\Documents\R\win-library\3.2\Rhpc\RhpcWorker64.exe" ...

Starting Rhpc on Windows (2): run the RhpcWin64.cmd script shipped with the package.

C:\Users\boofoo> Documents\R\win-library\3.2\Rhpc\RhpcWin64.cmd

The script is controlled by environment variables (defaults in parentheses):
  NPROCS           number of MPI processes ( )
  OMP_NUM_THREADS  OpenMP threads per process (1)
  R_HOME           R installation directory ( )
  R_VER            R version ( )

Running Rhpc on Windows: an interactive session.

> library(Rhpc)
> Rhpc_initialize()
rank 0/ 4(1140850688) : hostname : 2152
> cl <- Rhpc_getHandle()  # Detected communication size 4

Rhpc on Windows, 64-bit, 4 processes (1 master, 3 workers): comparison with parallel (SOCK cluster). Export transfer times are comparable, while the *lapply calls are much faster with Rhpc.

                                               parallel (SOCK)   Rhpc
Transfer of a 4000 x 4000 matrix by *Export         1.54 sec     1.39 sec
10000 calls of sqrt by *lapply                      0.70 sec     0.08 sec
10000 calls of sqrt by *lapplyLB                    0.91 sec     0.11 sec

5. Summary

Summary: Rhpc brings MPI-based parallel computing to R, and the MPI communicator it manages can also be used from foreign MPI code called from R.