
Fortran+Python (4th Symposium on Parallel Fortran, December 12, 2018)


Python

Python is ranked the top programming language in the 2018 IEEE Spectrum ranking: https://spectrum.ieee.org/static/interactive-the-top-programming-languages-2018

Python basics:

    print("Hello World!")

    x = 10
    if x == 10:
        print("AAA")
        print("BBB")
    else:
        print("CCC")
        print("DDD")

    for x in range(3):        # prints 0, 1, 2
        print(x)

    names = ["Taro", "Jiro", "Saburo"]
    for name in names:        # for iterates over a list
        print(name)

Python libraries: NumPy and matplotlib

NumPy, Python's numerical array library:

    import numpy as np               # conventional alias np

    A = np.array([1, 2, 3])          # 1-dimensional arrays A and B
    B = np.array([4, 5, 6])
    print(A + B)          # [5 7 9]    element-wise sum
    print(A * B)          # [4 10 18]  element-wise product
    print(A * 2)          # [2 4 6]
    print(np.dot(A, B))   # 32 = 4 + 10 + 18, the dot product

    C = np.array([[1, 2], [3, 4]])   # 2-dimensional arrays C and D
    D = np.array([[4, 3], [2, 1]])
    print(np.dot(C, D))   # matrix product:
                          # [[ 8  5]
                          #  [20 13]]

matplotlib, Python's plotting library: plot sin(x) for x from -3.14 to 3.14 in steps of 0.01.

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.arange(-3.14, 3.14, 0.01)
    y = np.sin(x)
    plt.plot(x, y)
    plt.show()    # display the figure


Why combine Fortran and Python: reuse existing Fortran numerical code from Python, pairing Fortran's execution speed with Python's productivity.

Several mechanisms exist for bridging the two (Cython, for example); the approach used here is ctypes, as the examples below show.

Calling Fortran from Python

References:
http://d.hatena.ne.jp/ignisan/20121017/p1 (HPFPC)
https://docs.python.jp/3/library/ctypes.html

Example 1: scalar arguments. fmath.f90:

    subroutine add(a, b) bind(c)      ! bind(c) gives the routine a C-compatible symbol for Python
      implicit none
      integer(8), intent(in)    :: a
      integer(8), intent(inout) :: b
      b = a + b
    end subroutine add

    subroutine mult(a, b) bind(c)
      implicit none
      integer(8), intent(in)    :: a
      integer(8), intent(inout) :: b
      b = a * b
    end subroutine mult

calc.py:

    from ctypes import *

    fmath = cdll.LoadLibrary("fmath.so")
    fmath.add.argtypes  = [POINTER(c_int64), POINTER(c_int64)]
    fmath.add.restype   = c_void_p
    fmath.mult.argtypes = [POINTER(c_int64), POINTER(c_int64)]
    fmath.mult.restype  = c_void_p

    a = c_int64(2)
    b = c_int64(3)
    fmath.add(a, b)
    print("2 + 3 = ", b.value)    # 5

    a = c_int64(2)
    b = c_int64(3)
    fmath.mult(a, b)
    print("2 * 3 = ", b.value)    # 6

Fortran takes its arguments by reference; fmath.add(byref(a), byref(b)) makes this explicit, but ctypes applies byref() automatically when the argtype is a POINTER.

    % gfortran -shared -fpic -o fmath.so fmath.f90
    % export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
    % python3 calc.py
    2 + 3 = 5
    2 * 3 = 6

LD_LIBRARY_PATH must include the current directory so that Python can locate fmath.so.

Example 2: NumPy arrays. fmath.f90:

    subroutine add_one(a, b, n) bind(c)
      implicit none
      integer(8), intent(in)               :: n
      real(8), dimension(n), intent(in)    :: a
      real(8), dimension(n), intent(inout) :: b
      b(:) = a(:) + 1
    end subroutine add_one

Note that Python arrays are 0-origin while dimension(n) is indexed from 1; declare the dummy as dimension(0:n-1) if you want the indices to match.

calc.py:

    from ctypes import *
    import numpy as np

    fmath = np.ctypeslib.load_library("fmath.so", ".")   # NumPy's loader for shared libraries
    fmath.add_one.argtypes = [
        np.ctypeslib.ndpointer(dtype=np.float64),        # NumPy arrays are declared via ndpointer
        np.ctypeslib.ndpointer(dtype=np.float64),
        POINTER(c_int64)]
    fmath.add_one.restype = c_void_p

    inarray  = np.arange(0., 10., dtype=np.float64)
    outarray = np.empty_like(inarray)
    size = byref(c_int64(inarray.size))

    fmath.add_one(inarray, outarray, size)
    print(inarray)    # 0, 1, 2, ..., 9
    print(outarray)   # 1, 2, 3, ..., 10
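To illustrate the indexing note above, here is a sketch of my own (not from the slides) of the same routine declared with 0-based bounds; the memory layout is unchanged, only the index labels differ:

    ! Hypothetical variant of add_one with indices matching NumPy's 0-origin.
    subroutine add_one0(a, b, n) bind(c)
      implicit none
      integer(8), intent(in)                   :: n
      real(8), dimension(0:n-1), intent(in)    :: a
      real(8), dimension(0:n-1), intent(inout) :: b
      integer(8) :: i
      do i = 0, n - 1          ! i now runs over the same values as the NumPy index
        b(i) = a(i) + 1
      end do
    end subroutine add_one0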

    % gfortran -shared -fpic -o fmath.so fmath.f90
    % python3 calc.py
    [ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
    [ 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.]

Because np.ctypeslib.load_library() is given the library's directory explicitly, LD_LIBRARY_PATH is no longer needed; Python finds the Fortran library by itself.

Example 3: a Fortran function with a return value. fmath.f90:

    integer(8) function sum(a, n) bind(c)
      implicit none
      integer(8), intent(in)               :: n
      integer(8), dimension(n), intent(in) :: a
      integer :: i
      sum = 0
      do i = 1, n
        sum = sum + a(i)
      end do
    end function sum

calc.py:

    from ctypes import *
    import numpy as np

    fmath = np.ctypeslib.load_library("fmath.so", ".")
    fmath.sum.argtypes = [
        np.ctypeslib.ndpointer(dtype=np.int64),
        POINTER(c_int64)]
    fmath.sum.restype = c_int64       # restype is the function's return type

    inarray = np.arange(0, 10, dtype=np.int64)
    size = byref(c_int64(inarray.size))
    print(inarray)
    print(fmath.sum(inarray, size))   # 45

    % gfortran -shared -fpic -o fmath.so fmath.f90
    % python3 calc.py
    [0 1 2 3 4 5 6 7 8 9]
    45

Example 4: compute in Fortran, plot in Python. fmath.f90 integrates du/dt = u with classical 4th-order Runge-Kutta:

    subroutine rungekutta(dt, nend, values) bind(c)
      implicit none
      real(8),    intent(in)  :: dt
      integer(8), intent(in)  :: nend
      real(8),    intent(out) :: values(nend)
      real(8)    :: t, p1, p2, p3, p4, u
      integer(8) :: n
      u = 1d0
      do n = 1, nend
        t  = dt * real(n)
        p1 = u                        ! the four RK4 stages for f(u) = u
        p2 = u + 0.5d0 * dt * p1
        p3 = u + 0.5d0 * dt * p2
        p4 = u + dt * p3
        u  = u + dt * 1/6d0 * (p1 + 2*p2 + 2*p3 + p4)
        values(n) = u
      end do
    end subroutine rungekutta

calc.py:

    from ctypes import *
    import matplotlib.pyplot as plt
    import numpy as np

    fmath = np.ctypeslib.load_library("fmath.so", ".")
    fmath.rungekutta.argtypes = [
        POINTER(c_double),
        POINTER(c_int64),
        np.ctypeslib.ndpointer(dtype=np.float64)]
    fmath.rungekutta.restype = c_void_p

    n = 500
    a = np.zeros(n)
    dt = c_double(0.01)
    len = c_int64(a.size)
    fmath.rungekutta(dt, len, a)

    plt.plot(np.arange(0, 5, 0.01), a)
    plt.show()

Reference: https://kyoto-geopython.github.io/kyoto-geopython/html/ /Fortran,%20C %20.html

    % gfortran -shared -fpic -o fmath.so fmath.f90
    % python3 calc.py

(The computed curve is displayed in a matplotlib window.)

Example 5: timing scal. fmath.f90:

    subroutine scal(x, alpha, n) bind(c)
      implicit none
      integer(8), intent(in)               :: n
      real(8), dimension(n), intent(inout) :: x
      real(8), intent(in)                  :: alpha
      x(:) = x(:) * alpha
    end subroutine scal

calc.py:

    from ctypes import *
    import numpy as np
    import time

    fmath = np.ctypeslib.load_library("fmath.so", ".")
    fmath.scal.argtypes = [
        np.ctypeslib.ndpointer(dtype=np.float64),
        POINTER(c_double),
        POINTER(c_int64)]
    fmath.scal.restype = c_void_p

    n = 10000000
    x = np.ones(n, dtype=np.float64)
    alpha = c_double(0.01)
    len = c_int64(x.size)

    start_time = time.perf_counter()
    fmath.scal(x, alpha, len)
    elapsed_time = time.perf_counter() - start_time
    print("elapsed_time:{0}".format(elapsed_time))

Example 5: scal, comparing three versions: Fortran+Python vs. pure Fortran vs. pure Python.

Pure Fortran:

    program main
      implicit none
      integer(8), parameter :: n = 10000000
      real(8), dimension(n) :: x
      real(8), parameter    :: alpha = 0.01
      integer(8) :: t1, t2, t_rate, t_max
      x(:) = 1.0
      call system_clock(t1)
      call scal(x, alpha, n)          ! scal from fmath.f90, linked into the executable
      call system_clock(t2, t_rate, t_max)
      write(*,*) (t2 - t1) / dble(t_rate)
    end program main

Pure Python:

    import numpy as np
    import time

    n = 10000000
    alpha = 0.01
    x = np.ones(n)

    start_time = time.perf_counter()
    x = x * alpha
    elapsed_time = time.perf_counter() - start_time
    print("elapsed_time:{0}".format(elapsed_time))

Example 5 results, measured on a 3.3 GHz Intel Core i7 with 16 GB of 2133 MHz LPDDR3 memory: Fortran+Python runs about as fast as pure Fortran, while the pure Python version is roughly 30% slower.


mpi4py:

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
    print("Hello World at rank {0}/{1}".format(rank, size))

MPI_Init() and MPI_Finalize() are called automatically, so the script contains neither.

    % mpiexec -n 4 python3 ./hello.py | sort
    Hello World at rank 0/4
    Hello World at rank 1/4
    Hello World at rank 2/4
    Hello World at rank 3/4

mpi4py + NumPy:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        buf = np.arange(100, dtype=np.float64)
        req = comm.Isend(buf, dest=1, tag=0)
        req.Wait()
    elif rank == 1:
        buf = np.empty(100, dtype=np.float64)
        req = comm.Irecv(buf, source=0, tag=0)
        req.Wait()

NumPy arrays are communicated directly, in the same style as Fortran or C MPI.
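For contrast, a sketch of my own (not from the slides): mpi4py's lowercase isend/irecv are pickle-based and can carry arbitrary Python objects, with the received object returned by wait(); the uppercase Isend/Irecv used above are the buffer-based, MPI-native path.

    # Hypothetical pickle-based variant using the lowercase methods.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        req = comm.isend({"msg": "hello", "data": [1, 2, 3]}, dest=1, tag=0)
        req.wait()
    elif rank == 1:
        req = comm.irecv(source=0, tag=0)
        obj = req.wait()       # the deserialized object comes back from wait()
        print(obj)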

MPI + Python with serial Fortran: the scal kernel from Example 5, unchanged, with each rank handling its share of the data.

fmath.f90 (identical to Example 5):

    subroutine scal(x, alpha, n) bind(c)
      implicit none
      integer(8), intent(in)               :: n
      real(8), dimension(n), intent(inout) :: x
      real(8), intent(in)                  :: alpha
      x(:) = x(:) * alpha
    end subroutine scal

calc.py:

    from ctypes import *
    import numpy as np
    from mpi4py import MPI

    fmath = np.ctypeslib.load_library("fmath.so", ".")
    fmath.scal.argtypes = [
        np.ctypeslib.ndpointer(dtype=np.float64),
        POINTER(c_double),
        POINTER(c_int64)]
    fmath.scal.restype = c_void_p

    comm = MPI.COMM_WORLD
    size = comm.Get_size()

    n = int(10000000 / size)     # the 10,000,000 elements are divided among the ranks
    x = np.ones(n, dtype=np.float64)
    alpha = c_double(0.01)
    len = c_int64(x.size)
    fmath.scal(x, alpha, len)

Python+MPI calling Fortran+MPI, part 1. fmath.f90:

    subroutine hello(comm) bind(c)
      implicit none
      include "mpif.h"
      integer(4) :: comm, size, rank, ierr
      call MPI_Comm_size(comm, size, ierr)
      call MPI_Comm_rank(comm, rank, ierr)
      print *, "Hello World at rank ", rank, "/", size
    end subroutine hello

hello.py:

    from ctypes import *
    import numpy as np
    from mpi4py import MPI

    fmath = np.ctypeslib.load_library("fmath.so", ".")
    fmath.hello.argtypes = [POINTER(c_int32)]
    fmath.hello.restype = c_void_p

    comm = MPI.COMM_WORLD
    comm = comm.py2f()            # convert to a Fortran communicator handle
    fmath.hello(c_int32(comm))

Run:

    % mpiexec -n 4 python3 ./hello.py | sort
    Hello World at rank 0 / 4
    Hello World at rank 1 / 4
    Hello World at rank 2 / 4
    Hello World at rank 3 / 4

py2f() converts the Python MPI communicator into a Fortran one; Fortran MPI communicators are integers, so the handle is passed as a c_int32.

Python+MPI calling Fortran+MPI, part 2. fmath.f90:

    subroutine scal(x, alpha, n, comm) bind(c)
      implicit none
      integer(8), intent(in)               :: n
      real(8), dimension(n), intent(inout) :: x
      real(8), intent(in)                  :: alpha
      real(8), dimension(n)                :: y
      include "mpif.h"
      integer(4) :: comm, size, rank, ierr, len
      integer(4) :: start, end
      call MPI_Comm_size(comm, size, ierr)
      call MPI_Comm_rank(comm, rank, ierr)
      len   = n / size
      start = rank * len + 1
      end   = start + len - 1
      x(start:end) = x(start:end) * alpha    ! each rank scales its own block
      call MPI_Allgather(x(start), len, MPI_REAL8, y, len, MPI_REAL8, comm, ierr)
      x(:) = y(:)                            ! every rank ends up with the full result
    end subroutine scal

calc.py:

    from ctypes import *
    import numpy as np
    from mpi4py import MPI

    fmath = np.ctypeslib.load_library("fmath.so", ".")
    fmath.scal.argtypes = [
        np.ctypeslib.ndpointer(dtype=np.float64),
        POINTER(c_double),
        POINTER(c_int64),
        POINTER(c_int32)]
    fmath.scal.restype = c_void_p

    comm = MPI.COMM_WORLD
    comm = comm.py2f()

    n = 10000000
    x = np.ones(n, dtype=np.float64)
    alpha = c_double(0.01)
    len = c_int64(x.size)
    fmath.scal(x, alpha, len, c_int32(comm))

Here every rank holds the full 10,000,000-element array; the Fortran side divides the work by the number of ranks and gathers the result with MPI_Allgather.


XcalableMP (XMP), 1/2

- Parallelization with directives, in the style of High Performance Fortran (HPF).
- The XMP specification working group of the PC Cluster Consortium develops it as a parallel programming model to replace MPI.
- Extends Fortran and C with directives and coarray notation.
- Because the extensions are directives, migrating existing sequential code is easy.
- Interoperates with MPI, OpenMP, and Python, so existing code can also be XMP-ized partially.
- Also presented at the 1st Symposium on Parallel Fortran. http://xcalablemp.org
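As a rough illustration of the directive style (my own sketch, not from the talk): adding XMP directives to an ordinary Fortran loop distributes the data and the iterations across nodes, while deleting the directives recovers the original sequential code.

    ! Hypothetical global-view example: a 100-element array distributed
    ! block-wise over the nodes, with the loop iterations run in parallel.
    subroutine fill
    !$xmp nodes p(*)
    !$xmp template t(100)
    !$xmp distribute t(block) onto p
      real(8) :: a(100)
      integer :: i
    !$xmp align a(i) with t(i)
    !$xmp loop (i) on t(i)
      do i = 1, 100
        a(i) = 2.0d0 * i
      end do
    end subroutine fill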

XcalableMP, 2/2

Like MPI, XMP programs execute in SPMD (Single Program Multiple Data) style. XMP offers two programming models: a global-view model based on directives, in the HPF lineage, and a local-view model based on coarrays.
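And a minimal sketch of the local-view side (again my own, using the standard Fortran 2008 coarray syntax that XMP adopts): coarrays express one-sided communication between images directly in the language.

    ! Hypothetical coarray example: image 1 writes into image 2's copy of x.
    program coarray_demo
      implicit none
      real(8) :: x[*]                         ! one instance of x per image
      x = 0.0d0
      sync all
      if (this_image() == 1) x[2] = 3.14d0    ! one-sided put to image 2
      sync all
      if (this_image() == 2) print *, "image 2 received", x
    end program coarray_demo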

Python+MPI with XMP in place of Fortran+MPI. fmath.f90:

    subroutine scal(x, n, alpha) bind(c)
    !$xmp nodes p(*)
    !$xmp template t(n)
    !$xmp distribute t(block) onto p
      integer(8), intent(in)               :: n
      real(8), dimension(n), intent(inout) :: x
      real(8), intent(in)                  :: alpha
    !$xmp align x(i) with t(i)
    !$xmp array on t(:)
      x(:) = x(:) * alpha
    end subroutine scal

calc.py:

    from mpi4py import MPI
    from ctypes import *
    import numpy as np
    import xmp

    lib = xmp.lib("fmath.so")

    comm = MPI.COMM_WORLD
    size = comm.Get_size()

    n = 1000000
    m = int(n / size)                  # each rank holds its block of the distributed array
    x = np.ones(m, dtype=np.float64)
    alpha = c_double(0.01)

    job = lib.call(comm, "scal", (x, n, alpha))
    if comm.Get_rank() == 0:
        print(job.elapsed_time())      # seconds
