Programming models for parallel machines

- Message passing (distributed memory): MPI, PVM (also DSM systems on top of them)
- Shared memory: threads — pthreads, Solaris threads, NT threads
- OpenMP: annotation (directive) based thread programming
- HPF: annotations giving data-distribution hints
- "Fancy" parallel programming languages

Running example: summing the elements of an array.

    for(i = 0; i < 1000; i++) S += A[i];

Sequentially, the 1000 elements are accumulated one by one into S. In parallel, the index range is split into chunks (1-250, 251-500, 501-750, 751-1000); each processor sums its own chunk, and the partial sums are then combined into S.

The same sum written with POSIX threads (pthreads, Solaris threads):

    int s;       /* global sum */
    int n_thd;   /* number of threads */

    /* master side */
    for(t = 1; t < n_thd; t++)
        r = pthread_create(thd_main, t);
    thd_main(0);
    for(t = 1; t < n_thd; t++)
        pthread_join();

    int thd_main(int id)
    {
        int c, b, e, i, ss;
        c = 1000 / n_thd;    /* chunk size */
        b = c * id;          /* begin index of this thread's chunk */
        e = b + c;           /* end index */
        ss = 0;
        for(i = b; i < e; i++) ss += a[i];
        pthread_lock();      /* protect the shared sum */
        s += ss;
        pthread_unlock();
        return s;
    }

With the PARMACS macros the thread creation becomes:

    for(t = 1; t < n_thd; t++) CREATE(thd_main);
    thd_main(0);
    WAIT_FOR_END(n_thd - 1);

With OpenMP a single directive is enough:

    #pragma omp parallel for reduction(+:s)
    for(i = 0; i < 1000; i++) s += a[i];

Topics in this part: the parallel region, work-sharing constructs (for, sections, single), data scope attributes, orphan directives, static extent and dynamic extent.
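The OpenMP one-liner above can be packaged as a compilable function. This is a minimal sketch (the function name and array size are illustrative); build with OpenMP enabled, e.g. `cc -fopenmp`. Without OpenMP the pragma is ignored and the loop runs serially with the same result.

```c
#include <stddef.h>

/* Sum an array with an OpenMP reduction: each thread accumulates a
 * private copy of s, and the copies are combined with + at the end. */
long sum_reduction(const int *a, size_t n)
{
    long s = 0;
    #pragma omp parallel for reduction(+:s)
    for (long i = 0; i < (long)n; i++)
        s += a[i];
    return s;
}
```

Compare this with the explicit chunking, locking, and joining the pthread version needs for the same computation.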

What is OpenMP?

- A directive-based API for shared-memory parallel programming in Fortran/C/C++, specified by a consortium of vendors and ISVs
  - Oct. 1997: Fortran ver. 1.0 API
  - Oct. 1998: C/C++ ver. 1.0 API
  - (1999: F90 API?)
- URL: http://www.openmp.org/
- Target platforms: SGI Cray Origin (e.g. the ASCI Blue Mountain System), SUN Enterprise, PC-based SMP systems
- Predecessors: vendor-specific directive sets such as SGI Power Fortran/C, SUN Impact, KAI/KAP

Why OpenMP?

- Aimed at small-scale (up to about 16 processors) and medium-scale (up to about 64 processors) shared-memory machines
- pthreads are OS-oriented and general-purpose — too low-level for parallelizing numerical code
- The OpenMP API is directives/pragmas for Fortran77, f90, C, C++
  - Fortran: !$OMP directives
  - C: #pragma omp pragmas
- Supports incremental parallelization

Background: processor trends

- Clock frequency no longer scales (3 GHz now; 10 GHz never arrived), while process shrinks continue (90 nm, 65 nm, 45 nm)
- Instead: VLIW, large L3 caches, hardware multithreading (Intel Hyperthreading in Pentium-class CPUs), and multicore — hence thread-level parallelism and OpenMP

OpenMP execution model: fork-join parallelism around parallel regions.

    A...
    #pragma omp parallel
    {
        foo();   /* ..B... */
    }
    C...
    #pragma omp parallel
    {
        D
    }
    E...

The master thread executes A; at the first parallel region a team of threads is forked and every thread calls foo() (B); the team joins and the master executes C; the second region runs D on all threads; after the join the master executes E.

OpenMP API: directive syntax

Fortran: a comment line beginning with the sentinel !$OMP (also C$OMP, *$OMP):

    !$OMP directive_name [clause[, clause]...]

C/C++: a pragma:

    #pragma omp directive_name [clause[, clause]...]

parallel directive / parallel region

The statement or block following #pragma omp parallel is a parallel region, executed in duplicate by a team of threads:

    Fortran:                          C:
    !$OMP PARALLEL                    #pragma omp parallel
      ... parallel region ...         {
    !$OMP END PARALLEL                    ... parallel region ...
                                      }

- Each thread in the team has an ID, returned by omp_get_thread_num(); the master thread has ID = 0.
- The number of threads is set with omp_set_num_threads(nthreads) or the environment variable OMP_NUM_THREADS.
- The threads join at the end of the parallel region (implicit barrier).
- Synchronization inside a region: critical, atomic, barrier.
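The team/region structure above can be sketched as a compilable function (the name `square_all` is illustrative). A bare parallel region duplicates work on every thread, so a work-sharing `for` is placed inside it to divide the iterations; serially (or without OpenMP) the result is identical.

```c
#include <stddef.h>

/* The parallel region is entered by the whole team; the "for"
 * directive inside it divides the iterations among the threads,
 * so each element is written exactly once. */
void square_all(const double *in, double *out, size_t n)
{
    #pragma omp parallel
    {
        #pragma omp for
        for (long i = 0; i < (long)n; i++)
            out[i] = in[i] * in[i];
    }
}
```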

The array-sum example again, first written by hand inside a plain parallel region:

    #pragma omp parallel
    {
        int c, b, e, i, ss;
        c = 1000 / omp_get_num_threads();
        b = c * omp_get_thread_num();
        e = b + c;
        ss = 0;
        for(i = b; i < e; i++) ss += a[i];
        #pragma omp atomic
        s += ss;
    }

and then with a work-sharing directive:

    #pragma omp parallel for reduction(+:s)
    for(i = 0; i < 1000; i++) s += a[i];

What OpenMP is good at:

- Primarily data parallelism (loops); task parallelism is possible but secondary; directive-based tuning is easy.
- SPMD-style programming: use omp_get_thread_num() to get the thread id, in the style of the SPLASH-2 / PARMACS macros. The pthread and PARMACS thread-creation code shown earlier reduces to:

    omp_set_num_threads(n_thd);
    #pragma omp parallel
        thd_main(omp_get_thread_num());

- Serving as a backend for parallelizing compilers (e.g. the Polaris compiler).

Work-sharing constructs

Work-sharing constructs divide work among the team inside a parallel region:

- for: distributes loop iterations
- sections: distributes independent blocks of code
- single: executed by one thread only
- Combined forms: parallel for, parallel sections

for directive

The loop must have the canonical shape:

    #pragma omp for [clause...]
    for(var = lb; var logical-op ub; incr-expr)
        body

- var is made private to the construct
- incr-expr is one of ++var, var++, --var, var--, var += incr, var -= incr
- no break out of the loop

schedule clause: schedule(kind[, chunk_size])

- schedule(static, chunk_size): chunks of chunk_size iterations are assigned round-robin; chunk_size = 1 gives a cyclic distribution. Plain schedule(static) divides the iteration space into one block per thread.
- schedule(dynamic, chunk_size): threads grab chunks of chunk_size iterations on demand; default chunk_size is 1.
- schedule(guided, chunk_size): chunk sizes decrease exponentially down to chunk_size.
- schedule(runtime): the kind is taken from the OMP_SCHEDULE environment variable; the default schedule is implementation-defined.

(The slide's figure compares how the iteration space is divided under schedule(static, n), schedule(static), schedule(dynamic, n), and schedule(guided, n).)
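A dynamic schedule pays off when iterations have uneven cost. The sketch below (a hypothetical prime counter; the chunk size 16 is arbitrary) hands out small chunks on demand so fast iterations do not leave threads idle; the reduction combines the per-thread counts.

```c
/* Count primes in [2, n] by trial division. Iteration cost grows
 * with i, so schedule(dynamic, 16) balances the load across the
 * team; reduction(+:count) merges the per-thread tallies. */
int count_primes(int n)
{
    int count = 0;
    #pragma omp parallel for schedule(dynamic, 16) reduction(+:count)
    for (int i = 2; i <= n; i++) {
        int prime = 1;
        for (int d = 2; d * d <= i; d++)
            if (i % d == 0) { prime = 0; break; }
        count += prime;
    }
    return count;
}
```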

Example: sparse matrix-vector product (CRS format) with parallel for:

    void Matvec(double a[], int row_start[], int col_idx[],
                double x[], double y[], int n)
    {
        int i, j, start, end;
        double t;
        #pragma omp parallel for private(j, t, start, end)
        for(i = 0; i < n; i++) {
            start = row_start[i];
            end = row_start[i + 1];
            t = 0.0;
            for(j = start; j < end; j++)
                t += a[j] * x[col_idx[j]];
            y[i] = t;
        }
    }

sections directive

Each section block is executed by one thread of the team:

    #pragma omp sections
    {
        #pragma omp section
            section1
        #pragma omp section
            section2
    }

single directive

    #pragma omp single
        statements

executes the statements on a single thread.

Barriers

- Each work-sharing construct has an implicit barrier at its end; the nowait clause removes it.
- An explicit barrier synchronizes the team:

    #pragma omp barrier

- Further synchronization: critical sections (critical), atomic updates (atomic), and flush.
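The sections construct suits a small, fixed number of independent tasks. A minimal sketch (the min/max split and the name `min_max` are illustrative): each scan writes its own result variable, so the two sections can run on different threads without synchronization, and serially the result is the same.

```c
#include <stddef.h>

/* Two independent scans of the array run as separate sections;
 * with OpenMP each section is taken by one thread of the team.
 * Each section updates a distinct variable, so no locking is needed. */
void min_max(const int *a, size_t n, int *min_out, int *max_out)
{
    int lo = a[0], hi = a[0];
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (size_t i = 1; i < n; i++)
                if (a[i] < lo) lo = a[i];
        }
        #pragma omp section
        {
            for (size_t i = 1; i < n; i++)
                if (a[i] > hi) hi = a[i];
        }
    }
    *min_out = lo;
    *max_out = hi;
}
```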

atomic and critical

atomic directive — atomically updates a single memory location:

    #pragma omp atomic
        statement

where statement has the form x binop= expr, x++, ++x, x--, or --x. Only the update of x is atomic; expr itself is not evaluated atomically. atomic is typically cheaper than a critical section.

critical directive — a critical section, optionally named:

    #pragma omp critical [(name)]
        statements

Only one thread at a time executes a critical section of a given name. (There is no conditional-wait construct.)

master and ordered

    #pragma omp master
        block statements

executes the block on the master thread only.

    #pragma omp ordered
        block statements

inside the dynamic extent of a for with the ordered clause, executes the block in sequential iteration order.

Data scope attribute clauses

Specified on parallel and work-sharing directives:

- shared(var_list): variables shared among the team
- private(var_list): private to each thread (uninitialized)
- firstprivate(var_list): like private, but initialized with the value from before the construct
- lastprivate(var_list): like private, but the value from the sequentially last iteration is copied out
- reduction(op:var_list): reduction variables — private within the construct, combined with op at the end
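A typical use of atomic is a shared histogram, where many threads increment a few shared bins. This sketch (names and the nonnegative-data assumption are illustrative) serializes only the single increment, not a whole code block as critical would.

```c
#include <stddef.h>

/* Bin nonnegative data values modulo nbins. The shared bins[] array
 * is updated concurrently, so each increment is guarded by "omp
 * atomic"; only that one update is serialized. */
void histogram(const int *data, size_t n, int *bins, int nbins)
{
    for (int b = 0; b < nbins; b++)
        bins[b] = 0;
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        int b = data[i] % nbins;
        #pragma omp atomic
        bins[b]++;
    }
}
```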

threadprivate directive

Makes a file-scope (global) variable private to each thread:

    #pragma omp threadprivate(var_list)

The per-thread copies are persistent across parallel regions. The copyin(var_list) clause on a parallel directive initializes them from the master thread's copy.

Which data scope clauses may appear on which directive:

- parallel: private, firstprivate, shared, reduction, copyin, default(shared | none) — default(none) forces every variable's scope to be stated explicitly
- for: private, firstprivate, lastprivate, reduction
- sections: private, firstprivate, lastprivate, reduction
- single: private, firstprivate

Orphan directives and extent

- Static extent: the code lexically contained in a construct.
- Dynamic extent: the static extent plus everything executed from it (called functions).
- An orphan directive appears in the dynamic extent of a parallel region but outside its static extent — e.g. a work-sharing directive in a function called from the region.
- Data scope in the dynamic extent: automatic (stack) variables are private; file-scope and static variables are shared.

Example: a CG solver. The original version opens a parallel region for each loop:

    main() {
        for(it = 0; it < niter; it++) {
            resid = cgsol();
            printf(..., resid);
        }
    }
    cgsol() {
        #pragma omp parallel for
        for(i = 0; i < cols; i++)
            p[i] = r[i] = x[i];
        for(it = 0; it < nitcg; it++) {
            matvec();
            #pragma omp parallel for
            for(i = 0; i < cols; i++)
                z[i] += alpha * p[i];
        }
    }

Restructured with one parallel region around the whole iteration, the for directives in cgsol() become orphan directives and the repeated fork/join disappears:

    main() {
        #pragma omp parallel
        for(it = 0; it < niter; it++) {
            resid = cgsol();
            #pragma omp master
            printf(..., resid);
        }
    }
    cgsol() {
        #pragma omp for
        for(i = 0; i < cols; i++)
            p[i] = r[i] = x[i];
        for(it = 0; it < nitcg; it++) {
            matvec();
            #pragma omp for
            for(i = 0; i < cols; i++)
                z[i] += alpha * p[i];
        }
    }
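The firstprivate and lastprivate clauses can be seen in one small function. This is an illustrative sketch (the name `scaled_last` is hypothetical): firstprivate gives each thread an initialized private copy of `scale`, and lastprivate copies `last` out of the sequentially final iteration, so the parallel result matches the serial one.

```c
/* firstprivate(scale): each thread's private copy of scale is
 * initialized from the value before the loop.
 * lastprivate(last): after the loop, last holds the value assigned
 * by the sequentially last iteration (i == n-1). */
int scaled_last(const int *a, int n, int scale)
{
    int last = 0;
    #pragma omp parallel for firstprivate(scale) lastprivate(last)
    for (int i = 0; i < n; i++)
        last = a[i] * scale;
    return last;
}
```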

Directive binding and nesting

- for, sections, single, master, and barrier directives bind to the dynamically enclosing parallel directive (its dynamic extent).
- Work-sharing constructs must not be nested inside one another or inside master; critical sections with the same name must not be nested.
- Nested parallelism: a parallel directive may itself appear inside a parallel region. When nested parallelism is enabled, a new team is created; when disabled, the inner region is executed by a team of one thread.

Nested parallelism in practice — from the OpenMP FAQ:

  "What about nested parallelism? Nested parallelism is permitted by the OpenMP specification. Supporting nested parallelism effectively can be difficult, and we expect most vendors will start out by executing nested parallel constructs on a single thread."

And from "OpenMP Fortran Interpretations Version 1.0":

  "Note that an OpenMP-compliant implementation is permitted to serialize a nested parallel region."

So a nested parallel region — and likewise a nested sections construct — may simply be serialized.

Memory consistency

- OpenMP defines a weak consistency model: shared memory is guaranteed consistent only at synchronization points (entry and exit of parallel regions, barriers — unless removed by nowait on a work-sharing construct), not at every access (cf. volatile).
- The flush directive forces consistency at a given point:

    #pragma omp flush [(var_list)]

Runtime library functions

- omp_get_num_threads, omp_set_num_threads: size of the current team
- omp_get_thread_num: the calling thread's id
- omp_get_max_threads: maximum number of threads
- omp_get_num_procs: number of processors
- omp_set_dynamic, omp_get_dynamic: dynamic adjustment of the number of threads
- omp_set_nested, omp_get_nested: enable/disable nesting of parallel regions
- Lock functions, operating on variables of type omp_lock_t and (nestable) omp_nest_lock_t

Environment variables

- OMP_NUM_THREADS: number of threads for parallel regions
- OMP_SCHEDULE: schedule used by schedule(runtime)
- OMP_DYNAMIC: dynamic adjustment of the number of threads (e.g. on SGI Origin)
- OMP_NESTED: enable nested parallelism (a nested parallel region gets its own team)

Strengths and limitations

- Incremental parallelization through work-sharing directives; orphan directives let whole call trees run inside one region.
- No data mapping: OpenMP gives no control over data placement, so iteration mapping and locality are hard to express, and reductions are limited to the operators of the reduction pragma.

Summary

- OpenMP is a directive-based API for shared-memory parallel programming (Fortran, C/C++) with a fork-join execution model, supporting incremental parallelization.
- Later revisions extend it (e.g. OpenMP 2.0 reductions, OpenMP 3.0); implementations include GCC and the Omni OpenMP compiler.
- Locality remains the weak point compared with MPI and HPF.
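A program can also inspect OMP_NUM_THREADS itself, e.g. to size its own buffers. This is a hypothetical helper, not part of the OpenMP API; it merely imitates how the runtime reads the variable, falling back to a caller-supplied default when it is unset or invalid.

```c
#include <stdlib.h>

/* Hypothetical helper: read OMP_NUM_THREADS the way the runtime
 * does, returning "fallback" when the variable is unset or not a
 * positive integer. */
int threads_from_env(int fallback)
{
    const char *s = getenv("OMP_NUM_THREADS");
    if (s != NULL) {
        int n = atoi(s);
        if (n > 0)
            return n;
    }
    return fallback;
}
```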