Intel Xeon Phi Coprocessor Developer's Quick Start Guide for Microsoft* Windows* Hosts, Version 1.4



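Introduction

This guide is for developers getting started with the Intel Xeon Phi coprocessor, which is based on the Intel Many Integrated Core (Intel MIC) architecture, on a host system running Microsoft* Windows*. Examples are given in C/C++ and Fortran.

Topics covered in this guide:
1. Installing the Intel Manycore Platform Software Stack (MPSS)
2. The Intel MIC architecture
3. Intel Xeon Phi coprocessor support in Intel Parallel Studio XE 2015 Composer Edition for Windows*
4. The Intel Math Kernel Library (Intel MKL)
5. Using the Intel Xeon Phi coprocessor
6. Best known methods (BKM)

Topics not covered in this guide:
1. ( ... )
2. PCIe* ... Intel Xeon Phi coprocessor ... Intel Xeon processor

Supported host operating systems (OS):
Windows* 7 Enterprise SP1 (64-bit)
Windows* 8/8.1 Enterprise (64-bit)
Windows Server* 2008 R2 SP1 (64-bit)
Windows Server* 2012 (64-bit)
Windows Server* 2012 R2 (64-bit)

Terminology

uOS: the Linux*-based operating system that runs on the Intel Xeon Phi coprocessor.
ISA: instruction set architecture.
VPU: vector processing unit, the part of the CPU that executes SIMD (Single Instruction Multiple Data) instructions.
NAcc: native acceleration, a way of using Intel MKL on the Intel Xeon Phi coprocessor.
Compilers: Intel C/C++ Compiler 15.0 for Windows* and Intel Visual Fortran Compiler 15.0 for Windows*, both with Intel Xeon Phi coprocessor support.

1 Intel acronyms dictionary, 8/6/2009, http://library.intel.com/dictionary/details.aspx?id=5600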

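SDP: software development platform for the Intel Xeon Phi coprocessor.
KNC: the Intel Xeon Phi coprocessor (code name: Knights Corner).
MPSS: Intel Manycore Platform Software Stack, the software stack for the Intel Xeon Phi coprocessor.
SCIF: Symmetric Communications Interface, the API for communication over PCIe* between the Intel Xeon Phi coprocessor and the Intel Xeon host.

System configuration

This guide assumes a host with 2 Intel Xeon processors and 1 or 2 Intel Xeon Phi coprocessors installed in PCIe* x16 slots. For systems that support the coprocessor, see http://www.isus.jp/article/idz/hpc/which-systems-support-the-intel-xeon-phi-coprocessor

Software for the Intel Xeon Phi coprocessor

Figure 1: Software for the Intel Xeon Phi coprocessor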

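Figure 1 shows the software for the Intel Xeon Phi coprocessor. It includes the following components:

Device driver: the Windows* driver for the Intel Xeon Phi coprocessor, which is attached over PCIe*.

Tools and utilities:
<MPSS-install-dir>\bin\micinfo.exe
<MPSS-install-dir>\bin\MicFlash.exe
<MPSS-install-dir>\bin\micctrl.exe
<MPSS-install-dir>\bin\micsmc.exe
<MPSS-install-dir>\service: the RAS service, MicRas. MicRas runs as a Windows* service.

The default <MPSS-install-dir> is "c:\Program Files\Intel\MPSS".

Coprocessor OS (uOS): the Linux*-based OS that runs on the Intel Xeon Phi coprocessor.

Libraries: the uOS and the Linux*-side software communicate with the host through SCIF. For details (including COI and MYO), see http://www.isus.jp/article/mic-article/software-stack-mpss/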

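Intel Many Integrated Core (Intel MIC) architecture

The Intel Xeon Phi coprocessor has more than 50 Intel MIC architecture cores running at 1GHz (up to 1.3GHz). The Intel MIC architecture is based on the x86 ISA, extended with 64-bit addressing and 512-bit wide SIMD vector instructions and registers. Each core supports 4 hardware threads.

Figure 2: Intel MIC architecture core

Vector processing unit (VPU): each core's VPU has 32 512-bit registers and executes the 512-bit SIMD ISA. The VPU is the main source of compute performance on the Intel MIC architecture, so code for the Intel Xeon Phi coprocessor should make use of it. The Intel MIC architecture does not support the earlier SIMD ISAs (MMX, SSE, AVX).

Each core has a 32KB L1 data cache, a 32KB L1 instruction cache, and a 512KB L2 cache. The L2 caches are interconnected and together act as a last-level cache (LLC) of up to 32MB.

For more information, see http://www.isus.jp/article/idz/mic-developer/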
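The register width above fixes how many elements one vector instruction can process. The following is a minimal sketch of that arithmetic only; the function names `lanes_per_register` and `split_loop` are hypothetical helpers written for this illustration and are not part of any Intel library.

```c
/* Register width of the VPU as stated in the text. */
enum { VPU_REGISTER_BITS = 512 };

/* Number of elements of the given bit width that fit in one vector register. */
int lanes_per_register(int element_bits)
{
    return VPU_REGISTER_BITS / element_bits;
}

/* Split a loop of n iterations into full vector steps plus a scalar remainder. */
void split_loop(int n, int lanes, int *full_steps, int *remainder)
{
    *full_steps = n / lanes;
    *remainder  = n % lanes;
}
```

For 32-bit floats this gives 16 lanes per register, so a loop over 1000 floats would take 62 full vector steps with 8 elements left over for scalar or masked handling.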

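Required software:
MPSS: http://www.isus.jp/article/mic-article/software-stack-mpss/
Intel Parallel Studio XE 2015 Composer Edition for Windows*: http://www.isus.jp/article/intel-software-dev-products/intel-parallel-studio-xe/

Installation steps (original step numbers are kept; steps 3 and 6 are not reproduced here):

1. Go to http://software.intel.com/mic-developer, select TOOLS & DOWNLOADS, then Software Drivers: Intel Manycore Platform Software Stack (Intel MPSS), and open the Downloads section. From the Microsoft* Windows* files, get the Readme (readme-windows.pdf), the release notes (releasenotes-windows.txt), and the MPSS User's Guide (MPSS_Users_Guide-windows.pdf).
2. Install one of the following operating systems on the host:
   Microsoft* Windows* 7 Enterprise SP1 (64-bit)
   Microsoft* Windows* 8 Enterprise (64-bit)
   Microsoft* Windows Server* 2008 R2 SP1 (64-bit)
   Microsoft* Windows Server* 2012 (64-bit)
   Microsoft* Windows Server* 2012 R2 (64-bit)
4. Install .NET Framework 4.0 (http://msdn.microsoft.com/ja-jp/vstudio/aa496123.aspx). To log in to the uOS, also install PuTTY* and PuTTYgen*.
5. Follow Readme section 2.2.1, "Preliminary Steps".
7. From the site in step 1, download the Windows* package (mpss-3.*-windows.zip).
8. Extract the zip file to obtain the Windows* installers (Intel(R) Xeon Phi(TM) coprocessor.msi and Intel(R) Xeon Phi(TM) coprocessor essentials.msi).
9. Following Readme section 2.2.2, run Intel(R) Xeon Phi(TM) coprocessor.msi to install the Windows* driver for the Intel Xeon Phi coprocessor. By default MPSS is installed in c:\Program Files\Intel\MPSS. Then install the Intel Xeon Phi coprocessor essentials (Intel(R) Xeon Phi(TM) coprocessor essentials.msi).
10. Confirm in the Windows* list of installed programs that MPSS is installed. As shown in Figure 3, both Intel(R) Xeon Phi(TM) coprocessor and Intel(R) Xeon Phi(TM) coprocessor essentials should be listed.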

Figure 3: MPSS in the list of installed programs

11. Follow section 2.2.3 of readme-windows.pdf.
13. Confirm that Windows* lists the Intel Xeon Phi coprocessor as a device (Figure 4).

Figure 4: Intel(R) Xeon Phi(TM) coprocessor device entry

14. Start the Intel Xeon Phi coprocessor (if MPSS has not already started it):

    prompt> micctrl --start

15. Run micinfo to check the installation:

    prompt> micinfo.exe

Figure 5: micinfo.exe output

Check that the output reports the following versions:
Driver Version 3.4.*
MPSS Version 3.4.*
Flash Version 2.1.*.*

Installing the compilers

Install Intel Parallel Studio XE 2015 Composer Edition for C++ Windows* or Intel Parallel Studio XE 2015 Composer Edition for Fortran Windows* (http://www.isus.jp/article/intel-software-dev-products/intel-parallel-studio-xe/). Intel MKL is included in both products. Run the installer (.EXE) to install.

After installation (original step numbers are kept):

2. To see what is offloaded to the Intel Xeon Phi coprocessor, run "set OFFLOAD_REPORT=3" and then build and run the samples (<install-dir>\samples\ja_jp\c++\mic_samples and <install-dir>\samples\ja_jp\fortran\mic_samples). Report lines for the coprocessor are prefixed with "MIC:". More examples are at http://software.intel.com/en-us/articles/offload-programming-fortran-and-c-code-examples
3. To collect data with SEP or VTune Amplifier XE 2015, install the SEP driver on the coprocessor:
   a) In <VTune-install-dir>\bin64\k1om\, run:

      prompt> .\sep_micboot_install.cmd

   b) Restart the coprocessor and wait until micctrl -w reports "micX: online":

      prompt> micctrl --start
      prompt> micctrl -w

   d) To remove the SEP driver:

      prompt> micctrl --stop
      prompt> .\sep_micboot_uninstall.cmd
      prompt> micctrl --start

      prompt> micctrl -w

Starting and restarting the Intel Xeon Phi coprocessor

If micinfo does not show the Intel Xeon Phi coprocessor as running, start it and check again:

    prompt> micctrl --start
    prompt> micctrl -w
    prompt> micinfo

To reset a single coprocessor (for example after losing a PuTTY* session to its Linux*), check its state, reset it, and boot it again:

    prompt> micctrl -s <micX>
    prompt> micctrl -r <micX>
    prompt> micctrl -w
    prompt> micctrl -b <micX>
    prompt> micctrl -w
    prompt> micinfo

To restart MPSS as a whole:

    prompt> micctrl --stop
    prompt> micctrl --start
    prompt> micctrl -w
    prompt> micinfo

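Logging in to the Intel Xeon Phi coprocessor uOS

By default each coprocessor is given the IP address 192.168.<coprocessor>.100 and the host end of the link is given 192.168.<coprocessor>.99, where the device name is mic<coprocessor>. For the first coprocessor, mic0, the coprocessor address is 192.168.1.100 and the host address is 192.168.1.99. For the second, mic1, the addresses are 192.168.2.100 and 192.168.2.99.

Use PuTTY* to log in to the coprocessor's Linux* as root, and WinSCP* to copy files to it.

Download PuTTY* from http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html and place it in "<MPSS-install-dir>\bin". SSH keys are generated with PuTTYgen*, which also goes in <MPSS-install-dir>\bin.

Figure 6: PuTTY* and PuTTYgen* in <MPSS-install-dir>\bin

Start PuTTYgen* and click [Generate] to create an SSH key pair (Figure 7).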
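The default addressing scheme described above can be sketched as a small helper. This is an illustration only: `mic_default_ips` is a hypothetical function, not an MPSS API, and it assumes the default configuration in which the third octet is the device index plus one.

```c
#include <stdio.h>

/* Hypothetical helper (not part of MPSS): fills in the default addresses for
 * device index `mic` (0 for mic0, 1 for mic1). The third octet is the index
 * plus one; .100 is the coprocessor and .99 is the host end of the link.
 * Returns 0 on success, -1 if the index would give an invalid octet. */
int mic_default_ips(int mic, char *card, char *host, size_t len)
{
    if (mic < 0 || mic > 253)
        return -1;
    snprintf(card, len, "192.168.%d.100", mic + 1);
    snprintf(host, len, "192.168.%d.99", mic + 1);
    return 0;
}
```

Calling it with index 0 yields 192.168.1.100 for the coprocessor and 192.168.1.99 for the host, matching the mic0 values in the text.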

Figure 7: PuTTYgen*

Create a file named authorized_keys (with no .txt extension) in <MPSS-install-dir>\bin and paste into it the text shown under [Public key for pasting into OpenSSH authorized_keys file]. Then click [Save private key] and save the private key as id_rsa.ppk, also in <MPSS-install-dir>\bin.

Open a command prompt in <MPSS-install-dir>\bin and register the key for root:

    prompt> micctrl --addssh root -f "<MPSS-install-dir>\bin\authorized_keys"

Restart the coprocessor with "micctrl --stop" and "micctrl --start" so the key takes effect:

    prompt> micctrl --stop
    prompt> micctrl --start

Start PuTTY*. In the PuTTY* configuration, enter root@192.168.1.100 in [Host Name (or IP address)] to connect to mic0 (for mic1, use root@192.168.2.100).

Figure 8: PuTTY* configuration

Under [Connection] > [SSH], select [Auth]. Next to [Private key file for authentication], click [Browse] and select id_rsa.ppk.

Figure 9: Selecting the private key

Click [Open] to start the session.

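Figure 10: Logged in to mic0

To copy files, download WinSCP* from http://winscp.net and install it. Start WinSCP* and set [File protocol] to SCP, [Host name] to 192.168.1.100, [User name] to root, and [Private key file] to the private key saved earlier. An application binary (application.mic) can then be copied to mic0.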

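Figure 11: WinSCP*

Click [Login]. Files can then be copied to mic0 through the GUI; run "ls" on the coprocessor to confirm they arrived.

Files can also be shared between the host and the Intel Xeon Phi coprocessor over NFS. See section 12.2 of the MPSS documentation.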

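Figure 12

MPSS tools

The MPSS tools are installed in <MPSS-install-dir>\bin:

micinfo - displays information about the coprocessor and the system
micflash - updates and reads the coprocessor flash
micctrl - controls the coprocessor
micras - the RAS service for the Intel Xeon Phi coprocessor, which runs as a Windows* service
micsmc - GUI tool for monitoring the coprocessor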

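For details on the Intel Xeon Phi coprocessor and MPSS, see the MPSS documentation (sections 7, 8, 9, 10, 11, and 6).

Developing for the Intel Xeon Phi coprocessor

Code for the Intel MIC architecture is written in standard languages (C/C++ and Fortran). To perform well it needs both threading and SIMD vectorization. The following software is available for developing Intel MIC architecture code for the Intel Xeon Phi coprocessor.

Compilers:
o Intel Parallel Studio XE 2015 Composer Edition for C++ Windows* (64-bit, with Intel MIC architecture support)
o Intel Parallel Studio XE 2015 Composer Edition for Fortran Windows* (64-bit, with Intel MIC architecture support)

Libraries (included with the compilers):
o Intel Math Kernel Library (Intel MKL) with Intel MIC architecture support. The Intel MKL for the Intel Xeon Phi coprocessor ships with Intel Parallel Studio XE 2015 Composer Edition for C++ Windows* and for Fortran Windows*.
o Intel Threading Building Blocks (Intel TBB)
o Intel Integrated Performance Primitives (Intel IPP)

Tools: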

o Debugger Extension (64-bit, with Intel MIC architecture support)
o Intel VTune Amplifier XE 2015 for Windows*

Development on Windows* for the Intel Xeon Phi coprocessor is done with Microsoft* Visual Studio* 2008.

To open a command prompt configured for Intel Parallel Studio XE 2015 Composer Edition for C++ Windows* or for Fortran Windows*, select [Start] > [All Programs] > [Intel Parallel Studio XE 2015] > [Command Prompt] > [Parallel Studio XE 2015 Composer Edition] > [Intel 64 Visual Studio XXXX mode].

Documentation

The documentation is in <install-dir>\documentation\ja_jp\:

o compiler_c\cl\compiler_ug_c and compiler_f\cl\compiler_ug_f - the user and reference guides for Intel C++ Compiler 15.0 for Windows* and Intel Visual Fortran Compiler 15.0 for Windows* (also reachable from [Start] > [All Programs] > [Intel Parallel Studio XE 2015] > [Documentation]). See the sections on the Intel MIC architecture.
o Release_Notes-C-2015_W_JA.pdf and Release_Notes-F-2015_W_JA.pdf - release notes for Intel Parallel Studio XE 2015 Composer Edition for C++ Windows* and for Fortran Windows*, including Intel MIC architecture information. The English versions are Release_Notes-*-2015_W_EN.pdf.
o <install-dir>\documentation\ja_jp\debugger\migdb_quickstart_win.pdf - debugging the Intel Xeon Phi coprocessor from Visual Studio* on Windows*, using the Visual Studio* extension and GDB for the Intel Xeon Phi coprocessor.

o Intel MKL: <install-dir>\documentation\ja_jp\mkl, starting from mkl_documentation.htm (also reachable from [Start] > [All Programs] > [Intel Parallel Studio XE 2015] > [Documentation]). It describes using Intel MKL on the Intel Xeon Phi coprocessor.
o Intel VTune Amplifier XE 2015 for Windows*: for profiling on the Intel Xeon Phi coprocessor, see <VTune-install-dir>\documentation\en\tutorials\find_hotspots\c++\index.htm (a Linux* version is also available).

Web resources:
o http://www.isus.jp/article/idz/mic-developer/ - Intel Xeon Phi coprocessor developer information
o http://www.isus.jp/article/performance-special/knights-corner-open-source-softwarestack/ - under RESOURCES (including downloads): the Intel Xeon Phi coprocessor uOS, GDB for the Intel MIC architecture, and the ABI document, "System V Application Binary Interface K1OM Architecture Processor Supplement"

Sample code for the Intel Xeon Phi coprocessor:
o C++: <install-dir>\samples\ja_jp\c++\mic_samples
o Fortran: <install-dir>\samples\ja_jp\fortran\mic_samples\
o Intel MKL: <install-dir>\mkl\examples\examples_mic\mic_offload
o C tutorial: <install-dir>\samples\ja_jp\c++\mic_sample\leo_tutorial

For information on .dll files and the Intel Xeon Phi coprocessor, see release-notes-*-2015-w-ja.pdf.

Compiler options for the Intel Xeon Phi coprocessor

The compiler documentation describes the offload-related options, including /Qoffload-option, /Qoffload-attribute-target, and /Qoffload.

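With /Qoffload:none, the compiler ignores the offload constructs.

To report what is offloaded at run time, set the OFFLOAD_REPORT environment variable:

    prompt> set OFFLOAD_REPORT=3

Levels 1 and 2 give less detail:

    prompt> set OFFLOAD_REPORT=<1 | 2>

Heterogeneous programming

A system with a host CPU and an Intel Xeon Phi coprocessor is a heterogeneous (2) system: an application starts on the CPU and offloads selected work to the coprocessor. Offload is available through the compiler (C/C++ and Fortran) and through Intel MKL.

Offloading a reduction to the Intel Xeon Phi coprocessor

The examples that follow offload a reduction, which sums the elements of an array:

    ans = a[0] + a[1] + ... + a[n-1]

2 http://dictionary.reference.com/browse/heterogeneous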

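    float reduction_serial(float *data, int size)
    {
        float ret = 0.f;
        for (int i=0; i<size; ++i)
            ret += data[i];
        return ret;
    }

Code Example 1: Serial reduction (C/C++)

To offload the loop, place #pragma offload target(mic) before it. The statement (or block) that follows the pragma runs on the Intel Xeon Phi coprocessor. Data that must move between host and coprocessor is named in in, out, and inout clauses; pointer data needs an explicit length. Here data is copied in, and the scalar ret is copied to the coprocessor before the loop and back to the host after it.

    float reduction_offload(float *data, int size)
    {
        float ret = 0.f;
        #pragma offload target(mic) in(data:length(size))
        for (int i=0; i<size; ++i)
            ret += data[i];
        return ret;
    }

Code Example 2: Offloaded reduction (C/C++)

Vectorization

To use the Intel Xeon Phi coprocessor's VPU, the reduction can also be written with Intel Cilk Plus array notation. The Intel MIC architecture VPU has 32 512-bit registers, so one instruction operates on many elements at once.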
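For readers who want to check the offloaded reduction of Code Example 2 against the serial one on a machine without the Intel compiler, the sketch below guards the pragma with a preprocessor test. It assumes that the Intel compiler defines the macro `__INTEL_OFFLOAD` when offload compilation is enabled; treat that macro name as an assumption to verify against the compiler documentation. With any other compiler the pragma is skipped and both functions run on the host.

```c
/* Serial reference: sums data[0..size-1] on the host. */
float reduction_serial(const float *data, int size)
{
    float ret = 0.f;
    for (int i = 0; i < size; ++i)
        ret += data[i];
    return ret;
}

/* Same loop; offloaded only when the compiler advertises offload support.
 * __INTEL_OFFLOAD is assumed to be defined by the Intel compiler in that case. */
float reduction_offload(float *data, int size)
{
    float ret = 0.f;
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic) in(data:length(size))
#endif
    for (int i = 0; i < size; ++i)
        ret += data[i];
    return ret;
}
```

Comparing the two return values on a small array is a quick sanity check before moving on to the vectorized and threaded variants.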

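The Intel Cilk Plus function __sec_reduce_add() sums an array section. With 32-bit floats and 512-bit registers, each vector operation handles 16 elements.

    float reduction_vectorreduction(float *data, int size)
    {
        float ret = 0;
        #pragma offload target(mic) in(data:length(size))
        ret = __sec_reduce_add(data[0:size]);   // Intel Cilk Plus array notation
        return ret;
    }

Code Example 3: Vectorized reduction (C/C++)

For more on offload to the Intel Xeon Phi coprocessor, see the Intel MIC architecture sections of the Intel C++ Compiler documentation and the sample <install-dir>\samples\ja_jp\c++\mic_samples\intro_samplec\samplec13.c.

Shared virtual memory (C/C++)

For C/C++ only, Intel Parallel Studio XE 2015 Composer Edition offers a second offload model based on two keywords, _Cilk_shared and _Cilk_offload (there is no Fortran equivalent). Data marked _Cilk_shared is placed in memory shared between host and coprocessor, and a call prefixed with _Cilk_offload runs on the coprocessor. See the Intel MIC architecture sections of the Intel C/C++ Compiler documentation.

Shared memory allocation and release API:

    void *_Offload_shared_malloc(size_t size);
    _Offload_shared_free(void *p);

Aligned allocation and release API:

    void *_Offload_shared_aligned_malloc(size_t size, size_t alignment);
    _Offload_shared_aligned_free(void *p);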

Two things are needed: the function to be offloaded (called with _Cilk_offload) must be declared _Cilk_shared, and the data it uses must be _Cilk_shared.

    #include <stdio.h>
    #include <omp.h>

    float * _Cilk_shared data;   // pointer to shared memory

    _Cilk_shared float MIC_OMPReduction(int size)
    {
        float Result = 0.f;
    #ifdef __MIC__
        int nthreads = 60;
        omp_set_num_threads(nthreads);
        #pragma omp parallel for reduction(+:Result)
        for (int i=0; i<size; ++i)
            Result += data[i];
        return Result;
    #else
        printf("Intel(R) Xeon Phi(TM) Coprocessor not available\n");
    #endif
        return 0.0f;
    }

    int main()
    {
        size_t size = 1*1e6;
        int n_bytes = size*sizeof(float);
        float Result;
        data = (_Cilk_shared float *)_Offload_shared_malloc(n_bytes);
        for (int i=0; i<size; ++i)
            data[i] = i%10;
        Result = _Cilk_offload MIC_OMPReduction(size);
        printf("Cilk Offload Result=%.0f\n", Result);
        _Offload_shared_free(data);
        return 0;
    }

Code Example 4: Using _Cilk_shared and _Cilk_offload (C/C++)

More samples:
C: <install-dir>\samples\ja_jp\c++\mic_samples (shrd_samplec and LEO_tutorial)
C++: <install-dir>\samples\ja_jp\c++\mic_samples\shrd_samplecpp

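Native compilation

C++ and Fortran applications can also be built to run entirely on the Intel Xeon Phi coprocessor. See the Intel MIC architecture sections of the Intel C++ Compiler documentation. Files can be copied to the coprocessor or shared with it (for example over NFS). The steps are:

1. Extract openmp_sample.c from <install-dir>\samples\ja_jp\c++\openmp_samples.zip.
2. Compile it with the /Qmic option:

       icl /Qmic -openmp openmp_sample.c

3. Use WinSCP* to copy a.out to mic0 (for example, into /tmp).
4. Copy the OpenMP* runtime library, c:\Program Files (x86)\Common Files\Intel\Shared Libraries\compiler\lib\mic\lib\libiomp5.so, to /tmp as well.
5. Log in with PuTTY* and set the library path so the OpenMP* library can be found:

       export LD_LIBRARY_PATH=/tmp

6. Remove the stack size limit:

       ulimit -s unlimited

7. Change to /tmp and run a.out:

       cd /tmp
       ./a.out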

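Parallel programming options on the Intel Xeon Phi coprocessor

1. Intel Threading Building Blocks (Intel TBB)
2. OpenMP*
3. Intel Cilk Plus
4. Pthreads*

Parallel programming on the Intel Xeon Phi coprocessor: OpenMP*

OpenMP* can be used on the CPU, on the Intel Xeon Phi coprocessor, or on both. An OpenMP* region inside an offloaded block runs on the coprocessor, and the OpenMP* environment there is separate from the one on the CPU. Within omp parallel on the coprocessor, each core provides 4 hardware threads, and 1 core (4 threads) is used by the uOS. The example below offloads the reduction and runs it with OpenMP* on the coprocessor.

    float OMP_reduction_OMP(float *data, int size)
    {
        float ret = 0;
        #pragma offload target(mic) in(size) in(data:length(size))
        #pragma omp parallel for reduction(+:ret)
        for (int i=0; i<size; ++i)
            ret += data[i];
        return ret;
    }

Code Example 5: Reduction with OpenMP* (C/C++)

The Fortran equivalent follows. More OpenMP* offload examples in Fortran are in <install-dir>\samples\ja_jp\fortran\mic_samples\leo_fortran_intro.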

    real function FTNReductionOMP(data, size)
        implicit none
        integer :: size, i
        real, dimension(size) :: data
        real :: ret
        ret = 0.0
    !dir$ omp offload target(mic) in(size) in(data:length(size))
    !$omp parallel do reduction(+:ret)
        do i=1,size
            ret = ret + data(i)
        enddo
    !$omp end parallel do
        FTNReductionOMP = ret
        return
    end function FTNReductionOMP

Code Example 6: Reduction with OpenMP* (Fortran)

Parallel programming on the Intel Xeon Phi coprocessor: OpenMP* + Intel Cilk Plus

OpenMP* threading can be combined with Intel Cilk Plus array notation. Each thread sums its own chunk with __sec_reduce_add(), which uses the 32 512-bit vector registers of the Intel MIC architecture, and a final loop adds any elements left over.

    float OMPnthreads_CilkPlusEAN_reduction(float *data, int size)
    {
        float ret=0;
        #pragma offload target(mic) in(data:length(size))
        {
            int nthreads = omp_get_max_threads();
            int ElementsPerThread = size/nthreads;
            #pragma omp parallel for reduction(+:ret)
            for(int i=0;i<nthreads;i++)
                ret += __sec_reduce_add(data[i*ElementsPerThread:ElementsPerThread]);
            // remaining elements
            for(int i=nthreads*ElementsPerThread; i<size; i++)
                ret+=data[i];
        }
        return ret;
    }

Code Example 7: Reduction with OpenMP* and Intel Cilk Plus array notation (C/C++)
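The chunking arithmetic in Code Example 7 (equal chunks per thread, then a remainder loop) can be checked in plain C without OpenMP* or Intel Cilk Plus. The sketch below is a host-only illustration: `chunked_reduction` and its `nchunks` parameter are hypothetical names, and the inner loop stands in for what __sec_reduce_add() does on one array section.

```c
/* Host-only sketch of the partitioning used in Code Example 7.
 * nchunks plays the role of the thread count; values below 1 are treated as 1. */
float chunked_reduction(const float *data, int size, int nchunks)
{
    float ret = 0.f;
    if (nchunks < 1)
        nchunks = 1;
    int per_chunk = size / nchunks;          /* elements handled by each chunk */

    for (int c = 0; c < nchunks; ++c) {
        float partial = 0.f;                 /* stands in for one section sum */
        for (int j = 0; j < per_chunk; ++j)
            partial += data[c * per_chunk + j];
        ret += partial;
    }

    /* Elements left over when size is not a multiple of nchunks. */
    for (int i = nchunks * per_chunk; i < size; ++i)
        ret += data[i];
    return ret;
}
```

With 10 elements and 3 chunks, each chunk covers 3 elements and the remainder loop picks up the last one, so no element is skipped or counted twice.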

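Parallel programming on the Intel Xeon Phi coprocessor: Intel Cilk Plus

Intel Cilk Plus can also provide the threading. The Intel Cilk Plus headers must be compiled for the Intel MIC architecture, so wrap the includes in #pragma offload_attribute(push,target(mic)) and #pragma offload_attribute(pop).

    #pragma offload_attribute(push,target(mic))
    #include <cilk/cilk.h>
    #include <cilk/reducer_opadd.h>
    #pragma offload_attribute(pop)

Code Example 8: Including the Intel Cilk Plus headers for offload (C/C++)

The reduction itself uses cilk_for with a reducer:

    float ReduceCilk(float *data, int size)
    {
        float ret = 0;
        #pragma offload target(mic) in(data:length(size))
        {
            cilk::reducer_opadd<float> total;
            cilk_for (int i=0; i<size; ++i)
                total += data[i];
            ret = total.get_value();
        }
        return ret;
    }

Code Example 9: Reduction with cilk_for

Parallel programming on the Intel Xeon Phi coprocessor: Intel TBB

As with Intel Cilk Plus, the Intel TBB headers must be compiled for the Intel MIC architecture:

    #pragma offload_attribute (push,target(mic))
    #include "tbb/task_scheduler_init.h"
    #include "tbb/blocked_range.h"
    #include "tbb/parallel_reduce.h"
    #include "tbb/task.h"
    #pragma offload_attribute (pop)
    using namespace tbb;

Code Example 10: Including the Intel TBB headers for offload (C/C++)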

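Functions and classes that run on the Intel Xeon Phi coprocessor are marked with __declspec(target(mic)). parallel_reduce takes a body class with an operator() that processes one range, a splitting constructor, and a join method that combines two partial results.

1. Define the body class for the Intel MIC architecture with __declspec(target(mic)):

    #ifdef __MIC__
    class __declspec(target(mic)) ReduceTBB
    {
    private:
        float *my_data;
    public:
        float sum;
        void operator()( const blocked_range<size_t>& r )
        {
            float *data = my_data;
            for( size_t i=r.begin(); i!=r.end(); ++i)
                sum += data[i];
        }
        ReduceTBB( ReduceTBB& x, split) : my_data(x.my_data), sum(0) {}
        void join( const ReduceTBB& y) { sum += y.sum; }
        ReduceTBB( float data[] ) : my_data(data), sum(0) {}
    };
    #endif

Code Example 11: Intel TBB reduction body for the Intel MIC architecture (C/C++)

2. Mark the function that runs on the Intel Xeon Phi coprocessor with __declspec(target(mic)):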

    __declspec(target(mic)) float _MICReductionTBB(float *data, int size)
    {
        ReduceTBB redc(data);
        // initialize the task scheduler
        task_scheduler_init init;
        parallel_reduce(blocked_range<size_t>(0, size), redc);
        return redc.sum;
    }

Code Example 12: Intel TBB reduction function for the Intel MIC architecture (C/C++)

3. Call that function from the host under #pragma offload target(mic):

    float MICReductionTBB(float *data, int size)
    {
        float ret(0.f);
        #pragma offload target(mic) in(size) in(data:length(size)) out(ret)
        ret = _MICReductionTBB(data, size);
        return ret;
    }

Code Example 13: Offloading the Intel TBB reduction (C/C++)

Note: compile code that uses Intel TBB with the /Qtbb option.
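The split and join roles of the body class in Code Examples 11 to 13 can be illustrated without Intel TBB. The following is a sequential C sketch of the same pattern, not the library's implementation: `reduce_body`, `body_apply`, `body_join`, `reduce_range`, and the `grain` parameter are all hypothetical names chosen for this illustration, and a real scheduler would run the two halves in parallel.

```c
#include <stddef.h>

/* Counterpart of the body class: a data pointer plus a partial sum. */
typedef struct {
    const float *data;
    float sum;
} reduce_body;

/* Counterpart of operator(): accumulate one sub-range into this body. */
void body_apply(reduce_body *b, size_t begin, size_t end)
{
    for (size_t i = begin; i != end; ++i)
        b->sum += b->data[i];
}

/* Counterpart of join(): fold another body's partial sum into this one. */
void body_join(reduce_body *b, const reduce_body *rhs)
{
    b->sum += rhs->sum;
}

/* Recursively halve [begin, end) until a piece is at most `grain` elements. */
void reduce_range(reduce_body *b, size_t begin, size_t end, size_t grain)
{
    if (grain < 1)
        grain = 1;
    if (end - begin <= grain) {
        body_apply(b, begin, end);
        return;
    }
    size_t mid = begin + (end - begin) / 2;
    reduce_body right = { b->data, 0.f };   /* counterpart of the splitting constructor */
    reduce_range(b, begin, mid, grain);
    reduce_range(&right, mid, end, grain);
    body_join(b, &right);
}
```

The point of the pattern is that each split starts from a zero partial sum and results are only combined through join, which is what lets a parallel scheduler process the halves independently.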

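Using Intel MKL

Intel MKL supports native acceleration (NAcc) on the Intel Xeon Phi coprocessor. In NAcc mode, Intel MKL functions are called from code that is already running on the coprocessor, including code inside an offloaded region. NAcc covers BLAS, LAPACK, FFT, VML, and VSL. For details, see the Intel MKL documentation on the Intel MIC architecture.

SGEMM example

This example offloads the Intel MKL BLAS routine SGEMM (sgemm) to the Intel Xeon Phi coprocessor.

Step 1: Allocate and initialize the matrices on the host.

Step 2: Use #pragma offload_transfer to copy the matrices to the Intel Xeon Phi coprocessor. free_if(0) keeps the buffers allocated on the coprocessor after the transfer.

    #define PHI_DEV 0
    #pragma offload_transfer target(mic:PHI_DEV) \
        in(A:length(matrix_elements) free_if(0)) \
        in(B:length(matrix_elements) free_if(0)) \
        in(C:length(matrix_elements) free_if(0))

Code Example 14: Copying data to the Intel Xeon Phi coprocessor

Step 3: Call sgemm in an offloaded region; on the Intel Xeon Phi coprocessor this is Intel MKL in NAcc mode. nocopy() reuses the data transferred in step 2 instead of copying it again.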

    #pragma offload target(mic:PHI_DEV) \
        in(transa, transb, N, alpha, beta) \
        nocopy(A: alloc_if(0) free_if(0)) nocopy(B: alloc_if(0) free_if(0)) \
        out(C:length(matrix_elements) alloc_if(0) free_if(0))   // output data
    {
        sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N);
    }

Code Example 15: Offloading sgemm

Step 4: Release the buffers allocated in step 2 by combining alloc_if(0) with free_if(1):

    #pragma offload_transfer target(mic:PHI_DEV) \
        nocopy(A:length(matrix_elements) alloc_if(0) free_if(1)) \
        nocopy(B:length(matrix_elements) alloc_if(0) free_if(1)) \
        nocopy(C:length(matrix_elements) alloc_if(0) free_if(1))

Code Example 16: Releasing the coprocessor buffers

Intel MKL is threaded with OpenMP*, so the thread count can be set inside the offloaded region:

    #pragma offload target(mic:PHI_DEV) \
        in(transa, transb, N, alpha, beta) \
        nocopy(A: alloc_if(0) free_if(0)) nocopy(B: alloc_if(0) free_if(0)) \
        out(C:length(matrix_elements) alloc_if(0) free_if(0))   // output data
    {
        omp_set_num_threads(64);   // set the number of OpenMP* threads
        sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N);
    }

Code Example 17: Setting the thread count with omp_set_num_threads()

Intel MKL on the Intel Xeon Phi coprocessor without explicit offload

Intel MKL can also offload work by itself. Call mkl_mic_enable() to let Intel MKL use the Intel Xeon Phi coprocessor, and mkl_mic_disable() to turn this off.
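When validating an offloaded sgemm call, it helps to have a small host-side reference to compare against. The sketch below is a naive implementation of the SGEMM operation, C = alpha*A*B + beta*C, for square column-major matrices with no transposition. `sgemm_ref` is a hypothetical helper written for this illustration; it is not an Intel MKL function and is far slower than the library routine.

```c
/* Naive reference for C = alpha*A*B + beta*C.
 * A, B, and C are n x n, stored column-major (element (i,j) at index i + j*n),
 * which matches the layout BLAS expects. No transposition is applied. */
void sgemm_ref(int n, float alpha, const float *A, const float *B,
               float beta, float *C)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            float acc = 0.f;
            for (int k = 0; k < n; ++k)
                acc += A[i + k * n] * B[k + j * n];
            C[i + j * n] = alpha * acc + beta * C[i + j * n];
        }
    }
}
```

Running this on a small N and comparing element by element (within a floating-point tolerance for non-trivial inputs) against the result copied back by the out(C:...) clause is a simple way to confirm the data clauses are correct.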

This mode can be used together with explicit offload through _Cilk_offload or #pragma offload. Examples are in <install-dir>\mkl\examples\examples_mic\mic_ao\blasc (C) and <install-dir>\mkl\examples\examples_mic\mic_ao\blasf (Fortran).

Debugging on the Intel Xeon Phi coprocessor

For debugging Intel MIC architecture code, see <install-dir>\Documentation\ja_JP\debugger\gdb\pdf\vsmigdb_config_guide.pdf.

Performance analysis on the Intel Xeon Phi coprocessor

Intel VTune Amplifier XE 2015 for Windows* can profile code running on the Intel Xeon Phi coprocessor. See http://www.isus.jp/article/idz/mic-developer/

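Appendix A: Basic Linux* commands

The following commands are useful when logged in to the Linux* running on the Intel Xeon Phi coprocessor.

1. exit: log out.

    > exit

2. ls: list files (use ls -l for details).

    > ls
    a.out libiomp5.so

3. pwd: print the current directory.

    > pwd
    /root

4. cd <path>: change directory.

    > cd /tmp

5. ps: list running processes.

    > ps
    5847 root 0:00 /sbin/sshd
    5914 micuser 2:47 /bin/coi_daemon --coiuser=micuser

6. kill -9 <pid>: terminate the process with the given ID.

    > kill -9 4555

7. top: show CPU usage by process.

    > top

8. cp <sourcefilename> <destinationfilename>: copy a file.

    > cp file1 file2

9. rm <filename>: delete a file.

    > rm file1

10. less <filename>: view a file.

    > less file2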

11. grep <pattern>: search for a pattern. For example, to find "coi" in the output of ps:

    > ps | grep coi
    5914 micuser 2:47 /bin/coi_daemon --coiuser=micuser

12. export: set an environment variable.

    > export LD_LIBRARY_PATH=/tmp

13. Ctrl + C: interrupt the running program.

About the author

Loc Q Nguyen, MBA

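Notices

Intel products are subject to Intel's Terms and Conditions of Sale. Copies of documents that have an order number can be obtained by calling 1-800-548-4725 or from the Intel Web site (http://www.intel.com/design/literature.htm).

Intel, Cilk, Xeon Phi, VTune, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries.

* Other names and brands may be claimed as the property of others.

© 2015 Intel Corporation.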

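Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include the Streaming SIMD Extensions 2 (SSE2), Streaming SIMD Extensions 3 (SSE3), and Supplemental Streaming SIMD Extensions 3 (SSSE3) instruction sets and other optimizations.

Notice revision #20110804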