Parallel Programming in MPI part 2

Today's Topic
- Non-blocking communication: execute other instructions while waiting for the completion of a communication
- Implementation of collective communications
- Measuring execution time of MPI programs
- Deadlock

Non-blocking communication functions
Non-blocking = do not wait for the completion of an instruction; proceed to the next instruction.
Example) MPI_Irecv & MPI_Wait
- Blocking: MPI_Recv waits for the arrival of the data, then proceeds to the next instructions.
- Non-blocking: MPI_Irecv proceeds to the next instructions without waiting for the data; a later MPI_Wait blocks until the data has arrived.

MPI_Irecv: non-blocking receive
Usage: int MPI_Irecv(void *b, int c, MPI_Datatype d, int src, int t, MPI_Comm comm, MPI_Request *r);
Parameters: start address for storing received data, number of elements, data type, rank of the source, tag (= 0 in most cases), communicator (= MPI_COMM_WORLD in most cases), request.
request: a communication request, used later for waiting for the completion of this communication.
Example)
    MPI_Request req;
    ...
    MPI_Irecv(a, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
    ...
    MPI_Wait(&req, &status);

MPI_Isend: non-blocking send
Usage: int MPI_Isend(void *b, int c, MPI_Datatype d, int dest, int t, MPI_Comm comm, MPI_Request *r);
Parameters: start address of the data to send, number of elements, data type, rank of the destination, tag (= 0 in most cases), communicator (= MPI_COMM_WORLD in most cases), request.
Example)
    MPI_Request req;
    ...
    MPI_Isend(a, 100, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
    ...
    MPI_Wait(&req, &status);

Non-blocking send?
Blocking send (MPI_Send): waits for the data to be copied somewhere else, i.e. until the data has been handed to the network or copied into a temporary buffer.
Non-blocking send (MPI_Isend): does not wait.

Notice: data is undefined during a non-blocking communication
MPI_Irecv: the value of the variable specified for receiving data is not fixed until MPI_Wait. In the example below, reading A between MPI_Irecv and MPI_Wait may yield either the old value 10 or the arrived value 50; only after MPI_Wait is A guaranteed to be 50.
    MPI_Irecv to A
    ...
    ~ = A      (value of A here can be 10 or 50)
    ...
    MPI_Wait
    ~ = A      (value of A is 50)

MPI_Isend: if the variable holding the data to be sent is modified before MPI_Wait, the value actually sent is unpredictable. In the example below, assigning A = 50 between MPI_Isend and MPI_Wait means either 10 or 50 may be sent; after MPI_Wait, A can be modified (e.g. A = 100) without any problem.
    MPI_Isend A
    ...
    A = 50     (modifying A here causes incorrect communication: 10 or 50 is sent)
    ...
    MPI_Wait
    A = 100    (modifying A here is safe)

MPI_Wait
Usage: int MPI_Wait(MPI_Request *req, MPI_Status *stat);
Waits for the completion of a non-blocking communication (MPI_Isend or MPI_Irecv). After MPI_Wait returns, the send buffer may be modified and the receive buffer may be read.
Parameters: request, status.
status: at the completion of MPI_Irecv, the status of the received data is stored here.

MPI_Waitall
Usage: int MPI_Waitall(int c, MPI_Request *requests, MPI_Status *statuses);
Waits for the completion of the specified number of non-blocking communications.
Parameters: count, requests, statuses.
count: the number of non-blocking communications.
requests, statuses: arrays of MPI_Request and MPI_Status with at least 'count' elements.

Today's Topic
- Non-blocking communication: execute other instructions while waiting for the completion of a communication
- Implementation of collective communications
- Measuring execution time of MPI programs
- Deadlock

Inside the functions of collective communications
Usually, collective communication functions are implemented with one-to-one (point-to-point) communications such as MPI_Send, MPI_Recv, MPI_Isend and MPI_Irecv.

Inside of MPI_Bcast: one of the simplest implementations

int MPI_Bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
{
    int i, myid, procs;
    MPI_Status st;
    MPI_Comm_rank(comm, &myid);
    MPI_Comm_size(comm, &procs);
    if (myid == root){
        for (i = 0; i < procs; i++)
            if (i != root)
                MPI_Send(a, c, d, i, 0, comm);
    } else {
        MPI_Recv(a, c, d, root, 0, comm, &st);
    }
    return 0;
}

Another implementation: with MPI_Isend

int MPI_Bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
{
    int i, myid, procs, cntr;
    MPI_Status st, *stats;
    MPI_Request *reqs;
    MPI_Comm_rank(comm, &myid);
    MPI_Comm_size(comm, &procs);
    if (myid == root){
        stats = (MPI_Status *)malloc(sizeof(MPI_Status)*procs);
        reqs = (MPI_Request *)malloc(sizeof(MPI_Request)*procs);
        cntr = 0;
        for (i = 0; i < procs; i++)
            if (i != root)
                MPI_Isend(a, c, d, i, 0, comm, &(reqs[cntr++]));
        MPI_Waitall(procs-1, reqs, stats);
        free(stats);
        free(reqs);
    } else {
        MPI_Recv(a, c, d, root, 0, comm, &st);
    }
    return 0;
}

Flow of the simple implementation (8 ranks, root = 0)
Rank 0 issues Isend to ranks 1 through 7 one after another, then waits for all of them with waitall. Each of ranks 1 through 7 issues Irecv from rank 0 and then waits.

Time for the simple implementation
One link can transfer only one message at a time, so the root's P-1 messages go out one after another.
Total time = T * (P-1)
  T: time for transferring 1 message
  P: number of processes

Another implementation: binomial tree

int MPI_Bcast(char *a, int c, MPI_Datatype d, int root, MPI_Comm comm)
{
    int myid, procs;
    MPI_Status st;
    int mask, relative_rank, src, dst;
    MPI_Comm_rank(comm, &myid);
    MPI_Comm_size(comm, &procs);
    relative_rank = myid - root;
    if (relative_rank < 0) relative_rank += procs;
    mask = 1;
    while (mask < procs){
        if (relative_rank & mask){
            src = myid - mask;
            if (src < 0) src += procs;
            MPI_Recv(a, c, d, src, 0, comm, &st);
            break;
        }
        mask <<= 1;
    }
    mask >>= 1;
    while (mask > 0){
        if (relative_rank + mask < procs){
            dst = myid + mask;
            if (dst >= procs) dst -= procs;
            MPI_Send(a, c, d, dst, 0, comm);
        }
        mask >>= 1;
    }
    return 0;
}

Flow of the binomial tree
Use 'mask' to determine when and how to send/receive (8 ranks, root = 0):
Step 1 (mask = 4): rank 0 sends to 4; rank 4 receives from 0.
Step 2 (mask = 2): rank 0 sends to 2; rank 4 sends to 6; ranks 2 and 6 receive.
Step 3 (mask = 1): rank 0 sends to 1; rank 2 sends to 3; rank 4 sends to 5; rank 6 sends to 7; ranks 1, 3, 5 and 7 receive.
Each rank receives at the mask value matching its lowest set bit, then forwards at all smaller mask values.
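To see the schedule concretely, the mask logic of the binomial-tree broadcast above can be lifted into a plain C helper (no MPI calls, so it runs anywhere). This is a sketch for illustration only; the function name binomial_schedule is our own, and the root is fixed to 0 for simplicity.

```c
/* Binomial-tree broadcast schedule for one process, root fixed to 0
 * (so relative_rank == myid). Returns the rank this process receives
 * from (-1 for the root) and fills dsts[] with the ranks it sends to,
 * in sending order. */
int binomial_schedule(int myid, int procs, int *dsts, int *ndst)
{
    int mask = 1, src = -1;

    /* Receive phase: the lowest set bit of myid decides when to receive. */
    while (mask < procs) {
        if (myid & mask) {
            src = myid - mask;
            break;
        }
        mask <<= 1;
    }

    /* Send phase: forward to the ranks mask/2, mask/4, ... above this one. */
    mask >>= 1;
    *ndst = 0;
    while (mask > 0) {
        if (myid + mask < procs)
            dsts[(*ndst)++] = myid + mask;
        mask >>= 1;
    }
    return src;
}
```

For 8 processes this reproduces the flow above: rank 0 sends to 4, 2, 1; rank 4 receives from 0 and sends to 6, 5; rank 7 only receives, from 6.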

Time for the binomial tree
Multiple links are used at a time: the number of ranks holding the data doubles at each step.
Total time = T * ceil(log2 P)
  T: time for transferring 1 message
  P: number of processes
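As a quick sanity check on the two cost formulas, the step counts can be computed in plain C (no MPI needed); the function names are our own:

```c
/* Steps of the simple (flat) broadcast: the root sends P-1 messages in turn. */
int flat_steps(int p)
{
    return p - 1;
}

/* Steps of the binomial-tree broadcast: the number of ranks holding the
 * data doubles each step, so ceil(log2(P)) steps suffice. */
int tree_steps(int p)
{
    int steps = 0, reached = 1;   /* ranks that already hold the data */
    while (reached < p) {
        reached *= 2;
        steps++;
    }
    return steps;
}
```

With T the time for one message, the flat version costs T * flat_steps(P) and the tree version T * tree_steps(P); for P = 1024 that is 1023 * T versus 10 * T.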

Today's Topic
- Non-blocking communication: execute other instructions while waiting for the completion of a communication
- Implementation of collective communications
- Measuring execution time of MPI programs
- Deadlock

Measuring the time of MPI programs
MPI_Wtime: returns the current time in seconds as a double.
Example)
    double t1, t2;
    ...
    t1 = MPI_Wtime();
    (work to measure)
    t2 = MPI_Wtime();
    printf("Elapsed time: %e sec.\n", t2 - t1);
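The same bracketing pattern can be tried without an MPI runtime by using the standard library's clock() in place of MPI_Wtime. This is a sketch for illustration only; the function name timed_sum and the workload are our own.

```c
#include <time.h>

/* Time a region of code by reading a clock before and after it, the same
 * pattern as t1 = MPI_Wtime(); work; t2 = MPI_Wtime(); in the slide.
 * Sums the integers 0..n-1 as a stand-in workload and returns the
 * elapsed CPU time in seconds. */
double timed_sum(long n, double *sum_out)
{
    clock_t t1, t2;
    double sum = 0.0;
    long i;

    t1 = clock();                 /* start of the measured region */
    for (i = 0; i < n; i++)
        sum += (double)i;
    t2 = clock();                 /* end of the measured region */

    *sum_out = sum;
    return (double)(t2 - t1) / CLOCKS_PER_SEC;
}
```

Note that clock() measures CPU time of the calling process, whereas MPI_Wtime measures wall-clock time; for the bracketing pattern itself the difference does not matter.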

The problem of measuring time in parallel programs
Each process measures a different time. Which time is the one we want? Since the ranks reach their t1 = MPI_Wtime() calls at different moments, before or after their reads, sends and receives, every process reports a different elapsed time.

A solution: use the collective MPI_Barrier
Synchronize the processes with MPI_Barrier before each measurement. Suitable for measuring total execution time:
    MPI_Barrier(MPI_COMM_WORLD);
    t1 = MPI_Wtime();
    (measured work: reads, sends, receives)
    MPI_Barrier(MPI_COMM_WORLD);
    t2 = MPI_Wtime();

More detailed analysis
Average: MPI_Reduce can be used to obtain the average:
    double t1, t2, t, total;
    t1 = MPI_Wtime();
    ...
    t2 = MPI_Wtime();
    t = t2 - t1;
    MPI_Reduce(&t, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myrank == 0)
        printf("Ave. elapsed: %e sec.\n", total/procs);
Max and Min: use MPI_Gather to gather all of the results to rank 0, and let rank 0 find the max and min.

Relationships among Max, Ave and Min
Can be used for checking the load balance (spread of work) across processes:

                        Ave - Min is large    Ave - Min is small
    Max - Ave is large         NG                    NG
    Max - Ave is small      Mostly OK                OK

The measured time includes both computation time and communication time.
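As a concrete illustration of this check, here is a small helper in plain C (no MPI) that classifies a set of per-process times according to the Max/Ave/Min criteria above. The function name and the idea of treating a gap as "large" when it exceeds a tolerance fraction of the average are our own assumptions.

```c
/* Classify load balance from per-process elapsed times t[0..n-1].
 * A gap counts as "large" when it exceeds tol * ave (e.g. tol = 0.1).
 * Returns: 0 = OK, 1 = mostly OK, 2 = NG. */
int classify_balance(const double *t, int n, double tol)
{
    double max = t[0], min = t[0], ave = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        if (t[i] > max) max = t[i];
        if (t[i] < min) min = t[i];
        ave += t[i];
    }
    ave /= n;
    if (max - ave > tol * ave) return 2;   /* Max - Ave large: NG */
    if (ave - min > tol * ave) return 1;   /* Ave - Min large only: mostly OK */
    return 0;                              /* both small: OK */
}
```

For example, one straggler process pushes Max well above Ave (NG), while a single unusually fast process among many equally loaded ones only pulls Min below Ave (mostly OK).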

Measuring time for communications
Accumulate the communication time separately by bracketing each communication:
    double t1, t2, t3, t4, comm = 0.0;
    t3 = MPI_Wtime();
    for (i = 0; i < N; i++){
        (computation)
        t1 = MPI_Wtime();
        (communication)
        t2 = MPI_Wtime();
        comm += t2 - t1;
        (computation)
        t1 = MPI_Wtime();
        (communication)
        t2 = MPI_Wtime();
        comm += t2 - t1;
    }
    t4 = MPI_Wtime();

Analyzing computation time
Computation time = total time - communication time; or just measure the computation time directly.
The spread of computation times across processes shows the degree of load imbalance.
Note: the communication time includes waiting time caused by load imbalance, so it is difficult to evaluate on its own. ==> Balance the computation first.

Today's Topic
- Non-blocking communication: execute other instructions while waiting for the completion of a communication
- Implementation of collective communications
- Measuring execution time of MPI programs
- Deadlock

Deadlock
A state in which a program can no longer make progress for some reason.
Places where MPI programs are prone to deadlock:
1. MPI_Recv, MPI_Wait, MPI_Waitall
Wrong case: both ranks block in MPI_Recv and neither ever reaches its MPI_Send.
    if (myid == 0){
        MPI_Recv from rank 1
        MPI_Send to rank 1
    }
    if (myid == 1){
        MPI_Recv from rank 0
        MPI_Send to rank 0
    }
One solution: use MPI_Irecv, so each rank posts its receive without blocking, sends, and only then waits.
    if (myid == 0){
        MPI_Irecv from rank 1
        MPI_Send to rank 1
        MPI_Wait
    }
    if (myid == 1){
        MPI_Irecv from rank 0
        MPI_Send to rank 0
        MPI_Wait
    }
2. Collective communications
A program cannot proceed until all processes call the same collective communication function.

Summary
- Effect of non-blocking communication: splits the start and the completion of a communication, enabling overlap of communication and computation.
- Implementation of collective communications: constructed internally from sends and receives; the time required depends on the algorithm.
- Measuring execution time of MPI programs.
- Be careful about deadlocks in parallel programs.

Report) Make a Reduce function by yourself
Complete the program shown in the next slide by filling in the body of the my_reduce function.
my_reduce: a simplified version of MPI_Reduce. It calculates the total sum of integer numbers; the root rank is always 0; the communicator is always MPI_COMM_WORLD.
Any algorithm is OK.

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"
#define N 20

int my_reduce(int *a, int *b, int c)
{
    /* complete here by yourself */

    return 0;
}

int main(int argc, char *argv[])
{
    int i, myid, procs;
    int a[N], b[N];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    for (i = 0; i < N; i++){
        a[i] = i;
        b[i] = 0;
    }
    my_reduce(a, b, N);
    if (myid == 0)
        for (i = 0; i < N; i++)
            printf("b[%d] = %d, correct answer = %d\n", i, b[i], i*procs);
    MPI_Finalize();
    return 0;
}