Intel Xeon Phi Coprocessor Developer's Quick Start Guide, Version 1.8



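Introduction

Purpose

This guide introduces C/C++ and Fortran development for the Intel Xeon Phi coprocessor, which is based on the Intel Many Integrated Core Architecture (Intel MIC Architecture). Related articles: http://www.isus.jp/article/idz/mic-developer/

Topics covered in this guide:
1. Intel Manycore Platform Software Stack (Intel MPSS)
2. Intel Xeon Phi coprocessor ...
3. Intel Xeon Phi coprocessor ... Intel Parallel Studio XE 2015
4. Intel Math Kernel Library (Intel MKL)
5. Intel Xeon Phi coprocessor ...
6. Best known methods (BKM)

Topics not covered in this guide:
1. ...
2. ...

Terminology

o Host: the Intel Xeon platform with an Intel Xeon Phi coprocessor installed in a PCIe* slot. Host operating systems (OS) supported by Intel MPSS 3.4:
    Red Hat* Enterprise Linux* 6.3
    Red Hat* Enterprise Linux* 6.4
    Red Hat* Enterprise Linux* 6.5
    Red Hat* Enterprise Linux* 6.6
    Red Hat* Enterprise Linux* 7.0
    SUSE* Linux* Enterprise Server SLES 11 SP2
    SUSE* Linux* Enterprise Server SLES 11 SP3
o uOS: the Linux*-based operating system that runs on the Intel Xeon Phi coprocessor.
o ISA: instruction set architecture [1].
o VPU: vector processing unit, the SIMD (Single Instruction Multiple Data) part of the CPU.
o NAcc: native acceleration, an Intel MKL mode for the Intel Xeon Phi coprocessor, available from C/C++ and Fortran.

[1] Intel acronyms dictionary, 8/6/2009, http://library.intel.com/dictionary/details.aspx?id=5600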

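o SCIF: the communication mechanism, shipped with Intel MPSS, between the Intel Xeon host and the Intel Xeon Phi coprocessor. The SCIF API hides the details of communicating over PCIe* (and of the Intel Xeon Phi coprocessor hardware).

System configuration

A typical platform has 2 Intel Xeon processors connected over PCIe* x16 to 1 or 2 Intel Xeon Phi coprocessors.

Figure 1: Intel Xeon Phi coprocessor ...

Software for the Intel Xeon Phi coprocessor

The software for the Intel Xeon Phi coprocessor is built on Linux* and consists of several components: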

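o Tools (operate on the coprocessor over PCIe*):
    /usr/bin/micinfo
    /usr/bin/micflash
    /usr/sbin/micctrl
o Coprocessor OS (uOS): the Linux*-based operating system running on the Intel Xeon Phi coprocessor.

For more on the uOS, Linux*, and SCIF layers of the stack, see http://www.isus.jp/article/mic-article/software-stack-mpss/

Intel Many Integrated Core (Intel MIC) Architecture

The Intel Xeon Phi coprocessor has up to 61 Intel MIC Architecture cores running at 1GHz (up to 1.3GHz). The Intel MIC Architecture is based on the x86 ISA, extended with 64-bit addressing and 512-bit wide SIMD vector instructions and registers. Each core supports 4 hardware threads.

Figure 2: Intel MIC Architecture ...

Vector processing unit (VPU)

Each VPU has 32 512-bit wide vector registers and executes the 512-bit SIMD ISA. The VPU of the Intel MIC Architecture on the Intel Xeon Phi coprocessor implements its own SIMD ISA and does not support earlier SIMD ISAs (such as Intel MMX, Intel SSE, or Intel AVX).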

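Each core has a 32KB L1 instruction cache, a 32KB L1 data cache, and a 512KB L2 cache. The L2 caches of all cores are interconnected, which gives a combined last-level cache (LLC) of up to 32MB.

Further reading: http://www.isus.jp/article/idz/mic-developer/

Information and software for the Intel Xeon Phi coprocessor are available from the Intel Developer Zone (IDZ) at http://software.intel.com/mic-developer, under TOOLS & DOWNLOADS > Software Drivers: Intel Manycore Platform Software Stack (Intel MPSS).

1. From http://software.intel.com/mic-developer, go to TOOLS & DOWNLOADS > Software Drivers: Intel Manycore Platform Software Stack (Intel MPSS) and get the Linux* Readme (readme.txt) and release notes (releasenotes-linux.txt).
2. Install a supported host OS with the matching kernel:

    Host OS                                        Kernel
    Red Hat* Enterprise Linux* (64-bit) 6.3        2.6.32-279
    Red Hat* Enterprise Linux* (64-bit) 6.4        2.6.32-358
    Red Hat* Enterprise Linux* (64-bit) 6.5        2.6.32-431
    Red Hat* Enterprise Linux* (64-bit) 6.6        2.6.32-504
    Red Hat* Enterprise Linux* (64-bit) 7.0        3.10.0-123
    SUSE* Linux* Enterprise Server SLES 11 SP2     3.0.13-0.27-default
    SUSE* Linux* Enterprise Server SLES 11 SP3     3.0.76-0.11-default

   (See readme.txt section 2.1.) For ssh access to the uOS, and for notes specific to Red Hat* Enterprise Linux* and other Linux* hosts, also see Intel MPSS readme.txt section 2.1.
3. Log in as root.
4. Download the package (<mpss-version>-linux.tar), where <mpss-version> is, for example, mpss-3.4.
5. Install the RPMs as described in readme.txt section 2.2.
6. Follow readme.txt section 2.4.
7. ...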

8. Start Intel MPSS and check the Intel Xeon Phi coprocessor with micinfo:

    sudo service mpss start      (on RHEL 7.0, use "sudo systemctl start mpss")
    sudo micctrl -w
    sudo /usr/bin/micinfo

   Confirm that the Driver Version, MPSS Version, and Flash Version reported by micinfo match the row for your release:

    Intel MPSS release                 Driver Version   MPSS Version    Flash Version
    mpss-3.4                           3.4-xx           3.4             2.1.02.0390
    mpss-3.3                           3.3-xx           3.2             2.1.02.0390
    mpss-3.2                           3.2-xx           3.2             2.1.03.0386
    mpss-3.1                           3.1-xx           3.1             2.1.03.0386
    mpss_gold_update_3-2.1.6720-13     6720-13          2.1.6720-13     2.1.02.0386
    KNC_gold_update_2-2.1.5889-16      5889-16          2.1.5889-16     2.1.05.0385
    KNC_gold_update_1-2.1.4982-15      4982-15          2.1.4982-15     2.1.05.0375
    KNC_gold-2.1.4346-xx               4346-xx          2.1.4346-xx     2.1.01.0375

   Table 1: Driver Version, MPSS Version, and Flash Version by Intel MPSS release

Getting the development tools

The tools (Intel Parallel Studio XE 2015 Cluster Edition, Intel Parallel Studio XE 2015 Professional Edition, and others) are listed at http://www.xlsoft.com/jp/products/intel/products.html. Tools for the Intel Xeon Phi coprocessor are also available at http://software.intel.com/en-us/mic-developer/ under Tools and Downloads > Intel Software Development Products, and through the registration center (http://registrationcenter.intel.com). For Intel Parallel Studio XE 2015 Cluster Edition for Linux*, see http://www.isus.jp/article/intel-software-devproducts/intel-parallel-studio-xe/

Installing Intel Parallel Studio XE 2015

1. Obtain one of the following packages:
    o Intel Parallel Studio XE Cluster Edition for Linux*
    o Intel Parallel Studio XE Composer Edition for Linux* and Intel VTune Amplifier XE for Linux*

   Release notes (including installation instructions):
    o Intel Parallel Studio XE Cluster Edition for Linux*: ipsxe2015-cluster-edition-release-notes.pdf
    o Intel Parallel Studio XE Composer Edition for Linux*: intel-parallel-studio-xe-2015-composer-edition-release-notes.pdf

   Unpack the package:
    o tar xvzf parallel_studio_xe_2015.<update>.<package_num>.tgz   (Intel Parallel Studio XE 2015 Cluster Edition for Linux*)
    o tar xvf l_composer_2015.<update>.<package_num>.tgz   (Intel Parallel Studio XE 2015 Composer Edition for Linux*)
2. ...
3. To check that code is offloaded to the Intel Xeon Phi coprocessor, set H_TRACE to 2 ("setenv H_TRACE 2" in csh, "export H_TRACE=2" in sh) and build and run the samples:
    /opt/intel/composer_xe_2015.*.*/samples/ja_jp/c++/mic_sample       (C/C++)
    /opt/intel/composer_xe_2015.*.*/samples/ja_jp/fortran/mic_sample   (Fortran)
   (Trace output from the coprocessor is prefixed with "MIC:".)
4. Set up Intel VTune Amplifier XE 2015:
    a) Install the collector into Intel MPSS. From /opt/intel/vtune_amplifier_xe/bin64/k1om/ run:
        sudo sep_micboot_install.sh
    b) Restart Intel MPSS (and the coprocessor):
        sudo service mpss restart
        sudo micctrl -r
        sudo micctrl -w
       micctrl -w should report "micx: online".
    c) ...
    d) To uninstall:
        sudo service mpss stop
        sudo sep_micboot_uninstall.sh
        sudo service mpss restart
        sudo micctrl -w

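1. From http://software.intel.com/mic-developer, go to TOOLS & DOWNLOADS > Software Drivers: Intel Manycore Platform Software Stack (Intel MPSS) and get the Readme (readme.txt) and release notes (releasenotes-linux.txt).
2. Install Intel MPSS as described in readme.txt sections 2.2 and 2.3.
3. Follow readme.txt section 2.4.
4. ...
5. Start Intel MPSS and check the Intel Xeon Phi coprocessor with micinfo:

    sudo service mpss start
    sudo micctrl -w
    /usr/bin/micinfo

   Confirm that the Driver Version, MPSS Version, and Flash Version match Table 1.

If the Intel Xeon Phi coprocessor does not show up in micinfo, run the commands as root, or with sudo and with /usr/sbin and /sbin in the PATH:

    sudo service mpss start
    sudo micctrl -w
    /usr/bin/micinfo

Note: for the uOS, see section 20.12 of the Intel MPSS documentation.

The Intel Xeon Phi coprocessor can be reached with ssh like any Linux* host. If ssh does not respond, query the coprocessor state:

    sudo micctrl --status <micx>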

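To reset a coprocessor without restarting Intel MPSS:

    sudo micctrl --reset <micx>
    sudo micctrl --boot <micx>
    sudo micctrl -w
    /usr/bin/micinfo

To restart Intel MPSS:

    sudo service mpss stop
    sudo service mpss unload
    sudo service mpss start
    sudo micctrl -w
    /usr/bin/micinfo

The Intel Xeon Phi coprocessor SMC (System Management and Configuration) tool is described in section 8.3 of the Intel MPSS documentation. Start its GUI with:

    /usr/bin/micsmc &

micnativeloadex runs an Intel MIC Architecture native binary on the Intel Xeon Phi coprocessor (section 8.5).

Accessing the Intel Xeon Phi coprocessor uOS

The uOS is a Linux* system that you can log in to with ssh as root, and files can be copied to it from the Linux* host with scp. By default the coprocessor has the IP address 172.31.<coprocessor>.1 and the host has the IP address 172.31.<coprocessor>.254; the coprocessor's host name is mic<coprocessor>. The first coprocessor, "mic0", has IP address 172.31.1.1 and its host side is 172.31.1.254. The second coprocessor, "mic1", uses 172.31.2.1 and 172.31.2.254.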
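The default addressing scheme described above can be sketched in C. This is a minimal illustration, assuming the default 172.31.x.y layout has not been changed on your system; the helper names mic_card_ip and mic_host_ip are hypothetical and are not part of Intel MPSS.

```c
#include <stdio.h>

/* Hypothetical helpers, not part of Intel MPSS.
 * index is the zero-based coprocessor number used in the names "mic0", "mic1".
 * The third octet is index + 1, following the mic0 and mic1 examples in the text. */

/* Default IP address of the coprocessor itself. */
int mic_card_ip(int index, char *buf, size_t len)
{
    return snprintf(buf, len, "172.31.%d.1", index + 1);
}

/* Default IP address of the host end of the same virtual network. */
int mic_host_ip(int index, char *buf, size_t len)
{
    return snprintf(buf, len, "172.31.%d.254", index + 1);
}
```

Calling mic_card_ip(0, buf, sizeof buf) fills buf with the address of "mic0" that you would pass to ssh or scp.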

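Setting up NFS mounts on the Intel Xeon Phi coprocessor requires root; see the Intel MPSS documentation.

Intel MPSS installs the following tools in /usr/bin (most require root):

    o micinfo: displays system and coprocessor information
    o micflash: updates and queries the coprocessor flash
    o micsmc: monitors the Intel Xeon Phi coprocessor
    o miccheck: checks the Intel Xeon Phi coprocessor configuration
    o micnativeloadex: runs an Intel MIC Architecture native binary on the Intel Xeon Phi coprocessor
    o micctrl: administers the coprocessor
    o micrasd: RAS daemon
    o mpssflash: POSIX* version of micflash
    o mpssinfo: POSIX* version of micinfo

See section 8 of the Intel MPSS documentation for details.

Development tools

Getting the best performance from the Intel Xeon Phi coprocessor means using both the many cores and the SIMD units of the Intel MIC Architecture (from C/C++ or Fortran). The following tools support the Intel MIC Architecture.

Compilers:
    o Intel C++ Compiler XE 15.x for Intel 64 and Intel MIC Architecture
    o Intel Fortran Compiler XE 15.x for Intel 64 and Intel MIC Architecture

Libraries (included in Intel Parallel Studio XE 2015):
    o Intel Math Kernel Library (Intel MKL) with Intel MIC Architecture support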

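    o Intel Threading Building Blocks (Intel TBB) with Intel Xeon Phi coprocessor support
    o Intel Integrated Performance Primitives (Intel IPP)

Cluster tools (included in Intel Parallel Studio XE 2015 Cluster Edition):
    o Intel MPI Library for Linux* (with Intel MIC Architecture support)
    o Intel Trace Analyzer and Collector

Other tools:
    o Intel SDK for OpenCL* Applications (http://www.isus.jp/article/intel-software-devproducts/intel-opencl/)
    o Debugger for Intel 64 and Intel MIC Architecture
    o Intel C++ Eclipse* integration
    o Intel VTune Amplifier XE 2015 for Linux*, for profiling on Linux* hosts and the Intel Xeon Phi coprocessor
    o Intel Inspector XE 2015
    o Intel Advisor XE 2015

Setting up the environment

Source the environment scripts before using the tools:

    o Intel C++/Fortran Compiler XE 15.x: run compilervars.csh or compilervars.sh from /opt/intel/composerxe/bin with the intel64 argument, for example:
        source /opt/intel/composer_xe_2015/bin/compilervars.sh intel64
      compilervars also sets up the bundled libraries.
    o Intel TBB: run tbbvars.csh or tbbvars.sh from /opt/intel/composer_xe_2015/tbb/bin with the intel64 argument.
    o Intel MKL: run mklvars.csh or mklvars.sh from /opt/intel/composer_xe_2015/mkl/bin with the intel64 argument.

Documentation

Documentation is installed under /opt/intel/composer_xe_2015/documentation/ja_jp/:

    o compiler_c/index.htm and compiler_f/index.htm: Intel C++ Compiler XE 15.x and Intel Fortran Compiler XE 15.x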

      The compiler documentation contains sections on programming for the Intel MIC Architecture and the Intel Xeon Phi coprocessor.
    o Release_Notes_*_2015_L_EN.pdf: release notes, including Intel MIC Architecture notes.
    o debugger/debugger_documentation.htm: debugger documentation, including "Starting GDB for Intel Xeon Phi Coprocessor Applications"; gdb_quickstart_lin.pdf covers debugging on the Intel Xeon Phi coprocessor.
    o Intel MKL: /opt/intel/composer_xe_2015/documentation/ja_jp/mkl/mkl_userguide/index.htm describes Intel MKL on Intel Xeon processors and on the Intel Xeon Phi coprocessor.
    o Intel VTune Amplifier XE 2015 for Linux*: the Intel Xeon Phi coprocessor tutorial is at /opt/intel/vtune_amplifier_xe_2015/documentation/en/tutorials/find_lw_hotspots/c++/index.htm

Web resources:
    o http://www.isus.jp/article/idz/mic-developer/: articles on Intel Xeon processors and the Intel Xeon Phi coprocessor, including the System V Application Binary Interface K1OM Architecture Processor Supplement
    o http://www.isus.jp/article/mic-article/xeon-phi/

Samples:
    o C++: /opt/intel/composer_xe_2015/samples/ja_jp/c++/mic_samples/intro_samplec/
    o Fortran: /opt/intel/composer_xe_2015/samples/ja_jp/fortran/mic_samples/
    o Intel MKL: /opt/intel/composer_xe_2015/mkl/examples/mic*
    o Intel MKL automatic offload: /opt/intel/composer_xe_2015/mkl/examples/mic_ao (blasc, blasf)
    o Intel MKL compiler-assisted offload: /opt/intel/composer_xe_2015/mkl/examples/mic_offload

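Shared memory samples:
    o C: /opt/intel/composer_xe_2015/samples/ja_jp/c++/mic_samples (shrd_sampleC, LEO_tutorial)
    o C++: /opt/intel/composer_xe_2015/samples/ja_jp/c++/mic_samples/shrd_sampleCPP

Offloaded code may depend on shared libraries (.so) that must be available on the Intel Xeon Phi coprocessor; see the Intel MIC Architecture notes in releasenotes-linux.txt.

Best known methods for makefiles and builds

Compiler options that control offload to the Intel Xeon Phi coprocessor are described in the compiler documentation:
    o -offload-option
    o -offload-attribute-target
    o -no-offload (ignores _Cilk_offload and #pragma offload, so the code is built for the host only)

Tracing offload activity

To trace offload at level 1:
    csh: setenv H_TRACE 1
    sh:  export H_TRACE=1

To trace offload at level 2:
    csh: setenv H_TRACE 2
    sh:  export H_TRACE=2

To print an offload report at level 1 or 2:
    csh: setenv OFFLOAD_REPORT <1|2>
    sh:  export OFFLOAD_REPORT=<1|2>

Questions about the Intel Xeon Phi coprocessor can be posted on the forum (http://software.intel.com/en-us/forums/intel-many-integrated-core).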

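Programming the Intel Xeon Phi coprocessor

A system with a host CPU and an Intel Xeon Phi coprocessor is heterogeneous [2]: work can run on the CPU, on the coprocessor, or on both. Code can be offloaded to the coprocessor from C/C++, from Fortran, or through Intel MKL, and an application can also run natively on the Intel Xeon Phi coprocessor.

Example: a reduction

The examples that follow offload a reduction to the Intel Xeon Phi coprocessor:

    ans = a[0] + a[1] + ... + a[n-1]

Serial version in C:

    float reduction(float *data, int size)
    {
        float ret = 0.f;
        for (int i=0; i<size; ++i)
        {
            ret += data[i];
        }
        return ret;
    }

Listing 1: Serial reduction (C/C++)

Placing #pragma offload target(mic) in front of the loop offloads it to the Intel Xeon Phi coprocessor.

[2] http://dictionary.reference.com/browse/heterogeneous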

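The in, out, and inout clauses of the offload pragma state which data is copied to the Intel Xeon Phi coprocessor, copied back, or both. Pointer data needs an explicit length. Scalars such as ret are copied automatically.

    float reduction(float *data, int size)
    {
        float ret = 0.f;
        #pragma offload target(mic) in(data:length(size))
        for (int i=0; i<size; ++i)
        {
            ret += data[i];
        }
        return ret;
    }

Listing 2: Reduction offloaded to the Intel Xeon Phi coprocessor

Vectorizing for the VPU with Intel Cilk Plus

Each Intel MIC Architecture core has 32 512-bit vector registers. The Intel Cilk Plus built-in __sec_reduce_add() reduces an array section using those registers; one 512-bit register holds 16 32-bit float values.

    float reduction(float *data, int size)
    {
        float ret = 0;
        #pragma offload target(mic) in(data:length(size))
        ret = __sec_reduce_add(data[0:size]);   // Intel Cilk Plus array notation
        return ret;
    }

Listing 3: Vectorized reduction (C/C++)

For more on offloading to the Intel Xeon Phi coprocessor, see the Intel MIC Architecture sections of the Intel C++ Compiler documentation and the sample /opt/intel/composer_xe_2015/samples/ja_jp/c++/mic_samples/intro_samplec/samplec13.c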
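To make the vector-width arithmetic above concrete: a 512-bit register holds 512 / 32 = 16 single-precision values, so a vectorized reduction in effect keeps 16 partial sums and combines them at the end. The plain C sketch below imitates that pattern on the host. It is an illustration only, not what the compiler or __sec_reduce_add() actually generates, and the names LANES and lane_reduction are hypothetical.

```c
#include <stddef.h>

#define LANES 16  /* 512-bit register / 32-bit float = 16 values */

/* Hypothetical illustration of a 16-lane reduction written in scalar C. */
float lane_reduction(const float *data, size_t size)
{
    float partial[LANES] = {0.0f};
    size_t full = size - size % LANES;   /* elements covered by whole 16-wide steps */

    /* Each step adds 16 consecutive values into 16 independent partial sums. */
    for (size_t i = 0; i < full; i += LANES)
        for (size_t lane = 0; lane < LANES; ++lane)
            partial[lane] += data[i + lane];

    /* Combine the partial sums, then add the tail that did not fill a step. */
    float ret = 0.0f;
    for (size_t lane = 0; lane < LANES; ++lane)
        ret += partial[lane];
    for (size_t i = full; i < size; ++i)
        ret += data[i];
    return ret;
}
```

Because the additions happen in a different order than in the serial loop, floating-point results can differ slightly from Listing 1 on general data.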

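Shared memory model for C/C++

Besides the explicit-copy model above, the Intel C++ Compiler XE 15.x supports a C/C++ shared memory model based on 2 keywords, _Cilk_shared and _Cilk_offload. (This model is not available in Fortran.) Data and functions marked _Cilk_shared are visible on both the host and the coprocessor, and _Cilk_offload runs a function call on the coprocessor. See the Intel MIC Architecture sections of the compiler documentation for details.

Allocating and freeing shared memory:

    void *_Offload_shared_malloc(size_t size);
    void _Offload_shared_free(void *p);

Aligned versions:

    void *_Offload_shared_aligned_malloc(size_t size, size_t alignment);
    void _Offload_shared_aligned_free(void *p);

The example below shares 1 pointer between the 2 sides and offloads the reduction with _Cilk_offload:

    float * _Cilk_shared data;   // pointer to shared data

    _Cilk_shared float MIC_OMPReduction(int size)
    {
    #ifdef __MIC__
        float Result = 0.0f;   // must start at zero for the reduction
        int nthreads = 32;
        omp_set_num_threads(nthreads);
        #pragma omp parallel for reduction(+:Result)
        for (int i=0; i<size; ++i)
        {
            Result += data[i];
        }
        return Result;
    #else
        printf("Intel(R) Xeon Phi(TM) Coprocessor not available\n");
    #endif
        return 0.0f;
    }

    int main()
    {
        size_t size = 1*1e6;
        int n_bytes = size*sizeof(float);
        data = (_Cilk_shared float *)_Offload_shared_malloc(n_bytes);
        for (int i=0; i<size; ++i)
        {
            data[i] = i%10;
        }
        _Cilk_offload MIC_OMPReduction(size);
        _Offload_shared_free(data);
        return 0;
    }

Listing 4: _Cilk_shared and _Cilk_offload (C/C++)

Shared memory samples:
    o C: /opt/intel/composer_xe_2015/samples/ja_jp/c++/mic_samples (shrd_sampleC, LEO_tutorial)
    o C++: /opt/intel/composer_xe_2015/samples/ja_jp/c++/mic_samples/shrd_sampleCPP

Native compilation

C++ and Fortran applications can also be built to run entirely on the Intel Xeon Phi coprocessor; see the Intel MIC Architecture sections of the compiler documentation. The binary and its libraries must be copied to the coprocessor (or made available through NFS). For example:

1. Take openmp_sample.c from /opt/intel/composer_xe_2015/samples/ja_jp/c++/openmp_samples/
2. Compile it with the -mmic option:
    icc -mmic -vec-report3 -openmp openmp_sample.c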

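3. Copy the binary to the coprocessor:
    scp a.out mic0:/tmp/a.out
4. Copy the OpenMP* runtime library to /tmp on the coprocessor:
    scp /opt/intel/composer_xe_2015/lib/mic/libiomp5.so mic0:/tmp/libiomp5.so
5. Log in with ssh and set the library path (so the OpenMP* runtime is found):
    ssh mic0
    export LD_LIBRARY_PATH=/tmp
6. Remove the stack size limit:
    ulimit -s unlimited
7. Change to /tmp and run a.out:
    cd /tmp
    ./a.out

Parallel programming options on the Intel Xeon Phi coprocessor

1. Intel Threading Building Blocks (Intel TBB)
2. OpenMP*
3. Intel Cilk Plus
4. Pthreads*

Parallel programming on the Intel Xeon Phi coprocessor: OpenMP*

OpenMP* works on the Intel Xeon Phi coprocessor as it does on the CPU. OpenMP* constructs can be used inside offloaded code or in a native application, and the same OpenMP* source can run on the CPU and on the Intel Xeon Phi coprocessor. An omp parallel region on the Intel Xeon Phi coprocessor can use 4 threads per core; for offloaded code, 1 core is left to the uOS, so 4 fewer threads are available.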

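With OpenMP*, 1 additional pragma parallelizes the reduction, and the same loop runs on the CPU or on the Intel Xeon Phi coprocessor.

    float OMP_reduction(float *data, int size)
    {
        float ret = 0;
        #pragma offload target(mic) in(size) in(data:length(size))
        #pragma omp parallel for reduction(+:ret)
        for (int i=0; i<size; ++i)
        {
            ret += data[i];
        }
        return ret;
    }

Listing 5: Reduction with OpenMP* (C/C++)

    real function FTNReductionOMP(data, size)
        implicit none
        integer :: size
        real, dimension(size) :: data
        real :: ret
        integer :: i

        ret = 0.0
        !dir$ omp offload target(mic) in(size) in(data:length(size))
        !$omp parallel do reduction(+:ret)
        do i=1,size
            ret = ret + data(i)
        enddo
        !$omp end parallel do
        FTNReductionOMP = ret
        return
    end function FTNReductionOMP

Listing 6: Reduction with OpenMP* (Fortran)

Parallel programming on the Intel Xeon Phi coprocessor: OpenMP* + Intel Cilk Plus

OpenMP* threads can be combined with Intel Cilk Plus array notation: OpenMP* divides the data among threads, and each thread reduces its part with the Intel Cilk Plus built-in __sec_reduce_add(), which uses the 32 512-bit vector registers of the Intel MIC Architecture.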

    float OMPnthreads_CilkPlusEAN_reduction(float *data, int size)
    {
        float ret = 0;
        #pragma offload target(mic) in(data:length(size))
        {
            int nthreads = omp_get_max_threads();
            int ElementsPerThread = size/nthreads;
            #pragma omp parallel for reduction(+:ret)
            for (int i=0; i<nthreads; i++)
            {
                ret += __sec_reduce_add(
                    data[i*ElementsPerThread:ElementsPerThread]);
            }
            // Add the elements left over after the even split
            for (int i=nthreads*ElementsPerThread; i<size; i++)
            {
                ret += data[i];
            }
        }
        return ret;
    }

Listing 7: Reduction with OpenMP* and Intel Cilk Plus array notation (C/C++)

Parallel programming on the Intel Xeon Phi coprocessor: Intel Cilk Plus

To use Intel Cilk Plus in offloaded code, the Intel Cilk Plus headers must also be compiled for the Intel MIC Architecture. Wrap the includes in #pragma offload_attribute(push,target(mic)) and #pragma offload_attribute(pop):

    #pragma offload_attribute(push,target(mic))
    #include <cilk/cilk.h>
    #include <cilk/reducer_opadd.h>
    #pragma offload_attribute(pop)

Listing 8: Intel Cilk Plus headers for offload (C/C++)

The reduction can then be written with cilk_for and a reducer:

    float ReduceCilk(float *data, int size)
    {
        float ret = 0;
        #pragma offload target(mic) in(data:length(size))
        {
            cilk::reducer_opadd<float> total;
            cilk_for (int i=0; i<size; ++i)
            {
                total += data[i];
            }
            ret = total.get_value();
        }
        return ret;
    }

Listing 9: Reduction with cilk_for
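The chunking logic of the OpenMP* + Intel Cilk Plus reduction in Listing 7 can be tried on any host compiler with the sketch below. It is a simplified stand-in under stated assumptions: a scalar helper replaces __sec_reduce_add(), the thread count is passed in as a parameter (assumed greater than zero) instead of coming from omp_get_max_threads(), and there is no offload pragma. The names chunk_sum and chunked_reduction are hypothetical. Without OpenMP* support the pragma is ignored and the loop runs serially.

```c
#include <stddef.h>

/* Sums one contiguous chunk; stands in for __sec_reduce_add() on an array section. */
float chunk_sum(const float *p, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += p[i];
    return s;
}

/* Hypothetical host-only version of the chunked reduction; nthreads must be > 0. */
float chunked_reduction(const float *data, int size, int nthreads)
{
    float ret = 0.0f;
    int per_thread = size / nthreads;   /* elements in each full chunk */

    /* One chunk per iteration; each partial sum is accumulated into ret. */
    #pragma omp parallel for reduction(+:ret)
    for (int t = 0; t < nthreads; ++t)
        ret += chunk_sum(data + (size_t)t * per_thread, (size_t)per_thread);

    /* Remainder: the size % nthreads elements not covered by a full chunk. */
    for (int i = nthreads * per_thread; i < size; ++i)
        ret += data[i];
    return ret;
}
```

With size = 25 and nthreads = 4, the parallel loop covers 4 chunks of 6 elements and the tail loop adds the 1 remaining element.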

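Parallel programming on the Intel Xeon Phi coprocessor: Intel TBB

As with Intel Cilk Plus, the Intel TBB headers must be compiled for the Intel MIC Architecture before Intel TBB can be used in offloaded code:

    #pragma offload_attribute (push,target(mic))
    #include "tbb/task_scheduler_init.h"
    #include "tbb/blocked_range.h"
    #include "tbb/parallel_reduce.h"
    #include "tbb/task.h"
    #pragma offload_attribute (pop)
    using namespace tbb;

Listing 10: Intel TBB headers for offload (C/C++)

Functions and classes that run on the Intel Xeon Phi coprocessor are marked with __attribute__((target(mic))). parallel_reduce takes 1 body object that supplies an operator() to reduce a subrange and a join method to combine partial results.

1. Define the reduction class for the Intel MIC Architecture with __attribute__((target(mic))):

    #ifdef __MIC__
    class __attribute__((target(mic))) ReduceTBB
    {
    private:
        float *my_data;
    public:
        float sum;
        void operator()( const blocked_range<size_t>& r )
        {
            float *data = my_data;
            for( size_t i=r.begin(); i!=r.end(); ++i)
            {
                sum += data[i];
            }
        }
        ReduceTBB( ReduceTBB& x, split) : my_data(x.my_data), sum(0) {}
        void join( const ReduceTBB& y) { sum += y.sum; }
        ReduceTBB( float data[] ) : my_data(data), sum(0) {}
    };
    #endif

Listing 11: Intel TBB reduction class for the Intel MIC Architecture (C/C++)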

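2. Mark the function that runs on the Intel Xeon Phi coprocessor with __attribute__((target(mic))):

    __attribute__((target(mic)))
    float _MICReductionTBB(float *data, int size)
    {
        ReduceTBB redc(data);
        // Initialize the task scheduler
        task_scheduler_init init;
        parallel_reduce(blocked_range<size_t>(0, size), redc);
        return redc.sum;
    }

Listing 12: Intel TBB reduction function for the Intel MIC Architecture (C/C++)

3. Call it from the host with #pragma offload target(mic) to offload the Intel TBB code:

    float MICReductionTBB(float *data, int size)
    {
        float ret(0.f);
        #pragma offload target(mic) in(size) in(data:length(size)) out(ret)
        ret = _MICReductionTBB(data, size);
        return ret;
    }

Listing 13: Offloading the Intel TBB reduction (C/C++)

Note: when building offload code that uses Intel TBB, use the -tbb compiler option rather than -ltbb.

Using Intel MKL

Intel MKL supports native acceleration (NAcc) on the Intel Xeon Phi coprocessor. In NAcc mode the data and the Intel MKL function calls are placed on the coprocessor by the application. NAcc is available for BLAS, LAPACK, FFT, VML, and VSL functions, with Intel MKL running on the Intel MIC Architecture side of the Intel Xeon Phi coprocessor.

Figure 3.1: Intel MKL ...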
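For orientation, the BLAS routine sgemm used in the SGEMM sample that follows computes C = alpha * op(A) * op(B) + beta * C in single precision, with matrices stored in column-major order. A naive reference for the no-transpose case with square N x N matrices might look like the sketch below. It is not Intel MKL code and is far slower; the name sgemm_ref and its simplified argument list are hypothetical, and it is only meant for checking results on small inputs.

```c
#include <stddef.h>

/* Hypothetical reference: C = alpha*A*B + beta*C for n x n matrices,
 * column-major storage, no transposition. Element (i, j) is at [i + j*n]. */
void sgemm_ref(int n, float alpha, const float *A, const float *B,
               float beta, float *C)
{
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            float acc = 0.0f;
            /* Dot product of row i of A with column j of B. */
            for (int k = 0; k < n; ++k)
                acc += A[i + (size_t)k * n] * B[k + (size_t)j * n];
            C[i + (size_t)j * n] = alpha * acc + beta * C[i + (size_t)j * n];
        }
    }
}
```

Comparing a small offloaded sgemm result against such a reference is one simple way to confirm that the data transfer clauses are correct.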

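SGEMM sample

This sample runs the BLAS function SGEMM (sgemm) on the Intel Xeon Phi coprocessor.

Step 1: Initialize the matrices on the host.

Step 2: Use #pragma offload to send the matrices to the Intel Xeon Phi coprocessor. free_if(0) keeps the buffers allocated on the coprocessor after the pragma completes.

    #define PHI_DEV 0
    #pragma offload target(mic:PHI_DEV) \
        in(A:length(matrix_elements) free_if(0)) \
        in(B:length(matrix_elements) free_if(0)) \
        in(C:length(matrix_elements) free_if(0))
    {}

Listing 14: Sending the matrices to the Intel Xeon Phi coprocessor

Step 3: Call sgemm on the Intel Xeon Phi coprocessor through Intel MKL NAcc. nocopy() reuses the data sent in step 2.

    #pragma offload target(mic:PHI_DEV) \
        in(transa, transb, N, alpha, beta) \
        nocopy(A: alloc_if(0) free_if(0)) nocopy(B: alloc_if(0) free_if(0)) \
        out(C:length(matrix_elements) alloc_if(0) free_if(0)) // output data
    {
        sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N);
    }

Listing 15: Calling sgemm

Step 4: Free the buffers allocated in step 2. alloc_if(0) reuses the existing buffers and free_if(1) releases them.

    #pragma offload target(mic:PHI_DEV) \
        in(A:length(matrix_elements) alloc_if(0) free_if(1)) \
        in(B:length(matrix_elements) alloc_if(0) free_if(1)) \
        in(C:length(matrix_elements) alloc_if(0) free_if(1))
    {}

Listing 16: Freeing the buffers

Intel MKL and OpenMP* threads

    #pragma offload target(mic:PHI_DEV) \
        in(transa, transb, N, alpha, beta) \
        nocopy(A: alloc_if(0) free_if(0)) nocopy(B: alloc_if(0) free_if(0)) \
        out(C:length(matrix_elements) alloc_if(0) free_if(0)) // output data
    {
        omp_set_num_threads(64); // set num threads in openmp
        sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N);
    }

Listing 17: Setting the thread count with omp_set_num_threads()

Intel MKL automatic offload on the Intel Xeon Phi coprocessor

Intel MKL can also offload work automatically. Call mkl_mic_enable() to let Intel MKL use the Intel Xeon Phi coprocessor, and mkl_mic_disable() to turn this off. Automatic offload can be combined with _Cilk_offload or #pragma offload in the same application; Intel MKL calls made inside code that is already offloaded with _Cilk_offload or #pragma offload run on the Intel Xeon Phi coprocessor.

Samples:
    <install-dir>/opt/intel/composer_xe_2015/mkl/examples/mic_ao/blasc (C)
    /opt/intel/composer_xe_2015/mkl/examples/mic_ao/blasf (Fortran)

Debugging on the Intel Xeon Phi coprocessor

For debugging on the Intel MIC Architecture, see http://software.intel.com/mic-developer under PROGRAMMING: "Debugging Intel Xeon Phi Application on Linux*".

Profiling on the Intel Xeon Phi coprocessor

Intel VTune Amplifier XE 2015 for Linux* can profile applications on the Intel Xeon Phi coprocessor. See /opt/intel/vtune_amplifier_xe_2015/documentation/help/index.htm, under Getting Started > Intel Xeon Phi Coprocessor Analysis Workflow.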

About the authors

    o Sudha Udanapalli Thiagarajan
    o Charles Congdon
    o Sumedh Naik
    o Loc Q Nguyen

Intel, the Intel logo, Cilk, Xeon Phi, VTune, and Xeon are trademarks of Intel Corporation.
* Other names and brands may be claimed as the property of others.
Copyright 2015 Intel Corporation.
