
OpenMP* (Part 1)

Contents: sections 1 through 7, of which 5.1 covers automatic parallelization and 5.2 the OpenMP* API.

This guide is the first of a four-part series: 1. (this guide), 2. C/C++ and OpenMP*, 3. Fortran and OpenMP*, 4. PC …

1

This guide introduces parallel programming with the Intel® Compilers 9.0 for Linux* and Windows*, which target systems based on Xeon and Itanium processors, and reviews how the OS and the hardware support running work in parallel.

2

A single computer already runs many programs at once — Web servers, OS services, user applications — because the OS switches rapidly among them. When two programs A and B share one CPU in a PC, the OS time-slices between them; when more than one processor is available, A and B can genuinely run at the same time.

3

Parallel processing applies the same idea inside one application: a job is divided into parts — tasks 1, 2, and 3, say — that run on several CPUs at once instead of one after another.

(Figure: the same set of tasks distributed across CPU 0, CPU 1, and CPU 2.)

4

Shared-memory parallel hardware takes several forms, notably SMP and NUMA systems, and the 9.0 compilers can generate threaded code for both. Hyper-Threading (HT) Technology adds a second Architecture State (AS) and APIC (Advanced Programmable Interrupt Controller) to a single physical processor, so the OS sees two logical processors where there is one physical one. Because the two logical processors share the execution resources, HT does not double throughput; running two threads on one HT processor typically gains on the order of 20%-30%.
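As a small illustration (not part of the original text), a program can ask the OpenMP* run-time library how many logical processors the OS exposes; on an HT system this count includes the extra logical processors. The omp_get_* calls used here are standard OpenMP* library routines.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* Logical processors visible to the run time (an HT system reports
       the logical count, which is larger than the physical count). */
    printf("logical processors: %d\n", omp_get_num_procs());

    /* Default size of the thread team a parallel region would create. */
    printf("default threads:    %d\n", omp_get_max_threads());
    return 0;
}

Build it with the OpenMP* option described in section 5 (-openmp on Linux*, /Qopenmp on Windows*).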

(Figure 2: threads scheduled on CPU 0 and CPU 1.)

5

There are two basic ways to obtain a threaded application: call the threading API of the OS directly, or let the compiler generate and manage the threads. The Intel® Compilers 9.0, available for 32-bit and 64-bit Linux* and for 32-bit and 64-bit Windows*, offer two compiler-based approaches: automatic parallelization and OpenMP* (the 9.0 compilers implement the OpenMP* 2.5 specification).

5.1

Automatic parallelization is enabled with /Qparallel on Windows* and -parallel on Linux*.

(Figure: a hand-written SPMD version of a simple computation — explicit process IDs, data distribution, and result accumulation — illustrating how much code manual parallelization takes compared with a plain loop.)

When the compiler parallelizes a loop it classifies each variable as private (one copy per thread) or shared (a single copy visible to all threads). Consider this loop:

for (i=1; i<100; i++)
{
    a[i] = a[i] + b[i] * c[i];
}

The iterations are independent of one another, so they can be divided between threads, for example:

// Thread 1
for (i=1; i<50; i++)
{
    a[i] = a[i] + b[i] * c[i];
}

// Thread 2
for (i=50; i<100; i++)
{
    a[i] = a[i] + b[i] * c[i];
}

The auto-parallelizer performs this division automatically. As an example, the following program approximates pi by numerical integration:

 1  #define num_steps 1000000
 2  double step;
 3  main ()
 4  { int i; double x, pi, sum = 0.0;
 5
 6    step = 1.0/(double) num_steps;
 7
 8    for (i=1; i<= num_steps; i++){
 9      x = (i-0.5)*step;
10      sum = sum + 4.0/(1.0+x*x);
11    }
12    pi = step * sum;
13  }

Compiling on Linux* with automatic parallelization and a diagnostic report:

$ icc -parallel -par-report3 -par-threshold0 -O3 sample.c

procedure: main
sample.c(9) : (col. 11) remark: LOOP WAS AUTO-PARALLELIZED.
parallel loop: line 9
    shared     : { }
    private    : {"i", "x"}
    first priv.: {"step"}
    reductions : {"sum"}

The report shows that the loop at line 9 was parallelized, with i and x private to each thread, step copied into each thread (firstprivate), and sum treated as a reduction.
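A run-time usage sketch (not in the original text; it assumes a Linux* shell and that, as with OpenMP* programs built by these compilers, auto-parallelized code takes its thread count from the OMP_NUM_THREADS environment variable):

$ export OMP_NUM_THREADS=2      # request two threads at run time
$ ./a.out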

Not every loop can be parallelized automatically. In the next example each iteration reads the value written by the previous one, so the iterations cannot run independently:

$ cat -n sample.c
 1  #define N 1000
 2  main ()
 3  { int i; double a[N], b[N], c[N];
 4    for (i=1; i< N; i++){
 5      a[i] = a[i-1] + b[i] * c[i];
 6    }
 7  }

$ icc -parallel -par-report3 -par-threshold0 sample.c

procedure: main
serial loop: line 5
    flow data dependence from line 5 to line 5, stmt 2 to stmt 2, due to "a"

The compiler detects the loop-carried (flow) dependence on "a" and leaves the loop serial. In such cases the programmer has to restructure the algorithm or express the parallelism explicitly, which is where OpenMP* comes in.
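As an illustration (not taken from the original guide), removing the read of a[i-1] makes every iteration independent, and the same -parallel / /Qparallel options can then split the loop across threads:

#define N 1000

int main(void)
{
    int i;
    double a[N], b[N], c[N];

    for (i = 0; i < N; i++) {   /* some input data */
        b[i] = i;
        c[i] = 2.0;
    }

    /* Each a[i] depends only on b[i] and c[i], never on a[i-1],
       so the iterations are independent and can run in parallel. */
    for (i = 0; i < N; i++) {
        a[i] = b[i] * c[i];
    }
    return 0;
}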

5.2 OpenMP* API

OpenMP* is an API (Application Programming Interface) for shared-memory parallel programming that has been defined since 1997 by the OpenMP Architecture Review Board. It is available on Linux*, UNIX*, and Windows* for C/C++ and Fortran*, and it is supported by the Intel® Compilers 9.0. The specifications are published at http://www.openmp.org/; the current one, OpenMP* 2.5 (May 2005), merges the previously separate C/C++ and Fortran* documents. The specification history:

1997  OpenMP* Fortran 1.0
1998  OpenMP* C/C++ 1.0
1999  OpenMP* Fortran 1.1
2000  OpenMP* Fortran 2.0
2002  OpenMP* C/C++ 2.0
2005  OpenMP* 2.5 (combined C/C++ and Fortran)

The API consists of compiler directives, run-time library routines, and environment variables; because the parallelism is expressed as directives, the same source can still be compiled as an ordinary serial program.

OpenMP* parallelism is written as directives inserted into the source code. Here is an example of the C/C++ form:

#pragma omp parallel if (n>limit) default(none) shared(n,a,b,c,x,y,z) private(f,i,scale)
{
    f = 1.0;

    #pragma omp for nowait
    for (i=0; i<n; i++)
        z[i] = x[i] + y[i];

    #pragma omp for nowait
    for (i=0; i<n; i++)
        a[i] = b[i] + c[i];

    #pragma omp barrier
    scale = sum(a, 0, n) + sum(z, 0, n) + f;

}   /** End of parallel region **/

To build OpenMP* code, use /Qopenmp on Windows* or -openmp on Linux*; without these options the directives are ignored and the program is compiled as serial code.
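A self-contained sketch of the example above (not from the original guide: the array size N, the input data, the simple sum() helper, and the final printf are additions made here so the fragment compiles and runs):

#include <stdio.h>

#define N 1000

double a[N], b[N], c[N], x[N], y[N], z[N];

/* Helper introduced for this sketch: adds up v[lo..hi-1]. */
static double sum(const double *v, int lo, int hi)
{
    double s = 0.0;
    for (int k = lo; k < hi; k++)
        s += v[k];
    return s;
}

int main(void)
{
    int n = N, limit = 100, i;
    double f, scale;

    for (i = 0; i < n; i++)                 /* some input data */
        b[i] = c[i] = x[i] = y[i] = 1.0;

    #pragma omp parallel if (n > limit) default(none) \
            shared(n, a, b, c, x, y, z) private(f, i, scale)
    {
        f = 1.0;

        #pragma omp for nowait              /* threads need not wait here ... */
        for (i = 0; i < n; i++)
            z[i] = x[i] + y[i];

        #pragma omp for nowait              /* ... nor here ... */
        for (i = 0; i < n; i++)
            a[i] = b[i] + c[i];

        #pragma omp barrier                 /* ... but all updates must be done here */
        scale = sum(a, 0, n) + sum(z, 0, n) + f;   /* each thread computes its own scale */
    }

    printf("sum(a) = %g, sum(z) = %g\n", sum(a, 0, n), sum(z, 0, n));
    return 0;
}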

OpenMP* uses the Fork-Join execution model:

1. The program starts as a single (master) thread and runs serially.
2. When the master thread reaches a parallel region — #pragma omp parallel in C/C++, !$omp parallel in Fortran — it forks a team of threads, and the team executes the region together.
3. At the end of the parallel region — the closing brace of the structured block in C/C++, or !$omp end parallel in Fortran — the threads join: the team synchronizes and only the master thread continues.
4. The master thread runs serially until the next parallel region, where the Fork-Join cycle repeats.

An OpenMP* program therefore alternates between serial and parallel phases.
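A minimal Fork-Join sketch (not from the original guide; it only assumes the -openmp / /Qopenmp build option named above):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    printf("serial part: one master thread\n");

    /* Fork: a team of threads executes this block. */
    #pragma omp parallel
    {
        printf("hello from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }   /* Join: the team synchronizes and only the master continues. */

    printf("serial part again\n");
    return 0;
}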

The pi program shown earlier needs only one directive to become an OpenMP* program:

 1  #define num_steps 1000000000
 2  double step;
 3  main ()
 4  { int i; double x, pi, sum = 0.0;
 5
 6    step = 1.0/(double) num_steps;
 7
 8  #pragma omp parallel for private(x) reduction(+:sum)
 9    for (i=1; i<= num_steps; i++){
10      x = (i-0.5)*step;
11      sum = sum + 4.0/(1.0+x*x);
12    }
13    pi = step * sum;
14    printf (" pi = %f \n",pi);
15  }

Compiling on Linux* with the OpenMP* option and a diagnostic report:

$ icc -openmp -openmp-report2 -O3 sample1.c
sample1.c(8) : (col. 1) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.

More on OpenMP*, including the specifications, is available at http://www.openmp.org. The following Fortran fragment from a molecular-dynamics computation shows a larger example: the outer particle loop is parallelized with a parallel do directive, the potential and kinetic energies are accumulated as reductions, and the per-iteration temporaries are private.

!$omp parallel do
!$omp& default(shared)
!$omp& private(i,j,k,rij,d)
!$omp& reduction(+ : pot, kin)
      do i=1,np
        ! compute potential energy and forces
        f(1:nd,i) = 0.0
        do j=1,np
          if (i .ne. j) then
            call dist(nd,box,pos(1,i),pos(1,j),rij,d)
            ! attribute half of the potential energy to particle 'j'
            pot = pot + 0.5*v(d)
            do k=1,nd
              f(k,i) = f(k,i) - rij(k)*dv(d)/d
            enddo
          endif
        enddo
        ! compute kinetic energy
        kin = kin + dotr8(nd,vel(1,i),vel(1,i))
      enddo
!$omp end parallel do
      kin = kin*0.5*mass

      subroutine dist(nd,box,r1,r2,dr,d)
      implicit none
      integer nd, i
      real*8 box(nd), r1(nd), r2(nd), dr(nd), d
      d = 0.0
      do i=1,nd
        dr(i) = r1(i) - r2(i)
        d = d + dr(i)**2
      enddo
      d = sqrt(d)
      return
      end

Benchmarks of this kind are collected in the SPEComp* suite (http://www.specbench.org), which measures OpenMP* performance.
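A small addition (not in the original text): wrapping the loop with omp_get_wtime() measures the wall-clock time, so running the program with different OMP_NUM_THREADS settings shows the speedup directly.

#include <stdio.h>
#include <omp.h>

#define num_steps 1000000000

int main(void)
{
    double step = 1.0 / (double) num_steps;
    double sum = 0.0, x, pi, t0, t1;
    int i;

    t0 = omp_get_wtime();                    /* wall-clock time before the loop */

    #pragma omp parallel for private(x) reduction(+:sum)
    for (i = 1; i <= num_steps; i++) {
        x = (i - 0.5) * step;
        sum = sum + 4.0 / (1.0 + x * x);
    }

    pi = step * sum;
    t1 = omp_get_wtime();                    /* wall-clock time after the loop */

    printf("pi = %f, %.3f seconds with up to %d threads\n",
           pi, t1 - t0, omp_get_max_threads());
    return 0;
}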

6

Compared with programming the OS threading API directly, OpenMP* keeps the parallelism in portable directives and leaves thread creation, scheduling, and synchronization to the compiler and its run-time library.

The same OpenMP* source code builds on both Windows* and Linux*, and tools such as the VTune™ Performance Analyzer can be used to examine and tune the behavior of the threaded program.

7

Processors such as Itanium 2 already exploit instruction-level parallelism (ILP) inside a single thread; to scale further, applications have to use explicit parallelism. On distributed-memory systems the standard approach is MPI (Message Passing Interface); on shared-memory systems OpenMP* lets existing code be parallelized incrementally with directives, and the two models can be combined. (Author profile: roughly 20 years in HPC, including SGI; CTO since June 2005; company site http://www.sstc.co.jp/.)

For more on Intel HPC: http://www.intel.co.jp/jp/go/hpc/

Intel K.K., 5-6 Tokodai, Tsukuba, Ibaraki 300-2635, Japan  http://www.intel.co.jp/

Intel, Itanium, VTune, and Xeon are trademarks or registered trademarks of Intel Corporation. * Other names and brands may be claimed as the property of others.
© 2006 Intel Corporation. March 2006. 525J-001 JPN/0603/PDF/SE/DEG/KS