Xeon Phi 1.8
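Introduction

This guide introduces software development for the Xeon Phi coprocessor, which is based on the Many Integrated Core (MIC) architecture, with examples in C/C++ and Fortran. Further information: http://www.isus.jp/article/idz/mic-developer/

Topics:
1. Manycore Platform Software Stack (MPSS)
2. Xeon Phi
3. Xeon Phi and Parallel Studio XE 2015
4. Math Kernel Library (MKL)
5. Xeon Phi
6. Best Known Methods (BKM)

The examples assume a Xeon host with a Xeon Phi coprocessor attached over PCIe*.

Host operating systems (OS) supported by MPSS 3.4:
    Red Hat* Enterprise Linux* 6.3
    Red Hat* Enterprise Linux* 6.4
    Red Hat* Enterprise Linux* 6.5
    Red Hat* Enterprise Linux* 6.6
    Red Hat* Enterprise Linux* 7.0
    SUSE* Linux* Enterprise Server SLES 11 SP2
    SUSE* Linux* Enterprise Server SLES 11 SP3

Terminology:
    uOS    The Linux*-based operating system that runs on the Xeon Phi coprocessor.
    ISA    Instruction set architecture (footnote 1), including I/O.
    VPU    Vector processing unit: the part of the CPU that executes SIMD (Single Instruction Multiple Data) instructions.
    NAcc   Native Acceleration: an MKL mode in which the MKL function and its data both reside on the Xeon Phi coprocessor.

1 Intel acronyms dictionary, 8/6/2009, http://library.intel.com/dictionary/details.aspx?id=5600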
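MPSS includes the Symmetric Communications Interface (SCIF), which handles communication between the Xeon host and the Xeon Phi coprocessor. The SCIF API carries this traffic over PCIe. Xeon Phi coprocessors attach to the Xeon host through PCIe* x16 slots.

Figure 1: Xeon Phi (figure not reproduced; it shows the host Linux* side, the Xeon Phi side and the link between them)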
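Host-side tools for the Xeon Phi PCIe* card:
    /usr/bin/micinfo
    /usr/bin/micflash
    /usr/sbin/micctrl

Coprocessor OS (uOS): the Linux*-based operating system that runs on the Xeon Phi coprocessor. The uOS side of the stack includes Linux* and SCIF. For the software stack, see http://www.isus.jp/article/mic-article/software-stack-mpss/

Many Integrated Core (MIC) architecture

The Xeon Phi coprocessor has up to 61 MIC architecture cores running at 1 GHz (up to 1.3 GHz). The MIC architecture is based on the x86 ISA, extended with 64-bit addressing and 512-bit SIMD instructions and registers. Each core supports 4 hardware threads.

Figure 2: MIC (figure not reproduced)

Vector processing unit (VPU)

The VPU has 32 512-bit registers and executes a 512-bit SIMD ISA. The MIC VPU on the Xeon Phi coprocessor does not support the earlier SIMD ISAs (MMX, SSE, AVX).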
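As a quick check on the register width quoted above, the sketch below works out how many elements of a given size fit in one 512-bit vector register. The helper name simd_lanes is hypothetical, and the 512-bit figure is taken from the text rather than queried from hardware.

```c
#include <stddef.h>

/* Register width quoted in the text above; not read from hardware. */
enum { VECTOR_BITS = 512 };

/* Number of elements of the given size (in bytes) that fit in one
 * 512-bit register. Returns 0 for a zero element size. */
size_t simd_lanes(size_t element_bytes)
{
    if (element_bytes == 0)
        return 0;
    return (VECTOR_BITS / 8) / element_bytes;
}
```

With 4-byte single-precision elements this gives 16 lanes, and with 8-byte double-precision elements it gives 8.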
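Each core has a 32 KB L1 instruction cache, a 32 KB L1 data cache and a 512 KB L2 cache. The L2 caches together form a 32 MB last-level cache (LLC). For details, see http://www.isus.jp/article/idz/mic-developer/

Installing the Xeon Phi software

The software is available from the Intel Developer Zone (IDZ) at http://software.intel.com/mic-developer under TOOLS & DOWNLOADS, "Software Drivers: Intel Manycore Platform Software Stack (Intel MPSS)".

1. At http://software.intel.com/mic-developer, open TOOLS & DOWNLOADS, then "Software Drivers: Intel Manycore Platform Software Stack (Intel MPSS)". Download the Linux* package, the Readme (readme.txt) and the release notes (releasenotes-linux.txt).
2. Install one of the supported host operating systems:

    Host OS                                       Kernel
    Red Hat* Enterprise Linux* (64-bit) 6.3       2.6.32-279
    Red Hat* Enterprise Linux* (64-bit) 6.4       2.6.32-358
    Red Hat* Enterprise Linux* (64-bit) 6.5       2.6.32-431
    Red Hat* Enterprise Linux* (64-bit) 6.6       2.6.32-504
    Red Hat* Enterprise Linux* (64-bit) 7.0       3.10.0-123
    SUSE* Linux* Enterprise Server SLES 11 SP2    3.0.13-0.27-default
    SUSE* Linux* Enterprise Server SLES 11 SP3    3.0.76-0.11-default

   See readme.txt section 2.1. Install ssh as well; it is used to log in to the uOS.
   Note: a Red Hat* installation may update the Linux* kernel. If the kernel differs from the versions above, rebuild the MPSS host driver as described in readme.txt section 2.1.
3. Log in as root.
4. Unpack the package downloaded in step 1 (<mpss-version>-linux.tar), where <mpss-version> is, for example, mpss-3.4.
5. Install the RPMs as described in readme.txt section 2.2.
6. Update the flash as described in readme.txt section 2.4.
7. Reboot the system.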
8. Start the Xeon Phi coprocessor (it does not start with the host by default), then run micinfo to check the installation:

    sudo service mpss start      (on RHEL 7.0: sudo systemctl start mpss)
    sudo micctrl -w
    sudo /usr/bin/micinfo

   Check that the Driver Version, MPSS Version and Flash Version reported by micinfo match Table 1.

    MPSS release                      Driver Version   MPSS Version   Flash Version
    mpss-3.4                          3.4-xx           3.4            2.1.02.0390
    mpss-3.3                          3.3-xx           3.2            2.1.02.0390
    mpss-3.2                          3.2-xx           3.2            2.1.03.0386
    mpss-3.1                          3.1-xx           3.1            2.1.03.0386
    mpss_gold_update_3-2.1.6720-13    6720-13          2.1.6720-13    2.1.02.0386
    KNC_gold_update_2-2.1.5889-16     5889-16          2.1.5889-16    2.1.05.0385
    KNC_gold_update_1-2.1.4982-15     4982-15          2.1.4982-15    2.1.05.0375
    KNC_gold-2.1.4346-xx              4346-xx          2.1.4346-xx    2.1.01.0375

Table 1: MPSS Driver Version, MPSS Version and Flash Version

Installing Parallel Studio XE 2015

Products can be purchased from http://www.xlsoft.com/jp/products/intel/products.html (Parallel Studio XE 2015 Cluster Edition and Parallel Studio XE 2015 Professional Edition support the Xeon Phi coprocessor). Evaluation versions are available from http://software.intel.com/en-us/mic-developer/ under Tools and Downloads, "Intel Software Development Products". Registered users can download Parallel Studio XE 2015 Cluster Edition for Linux* from the registration center (http://registrationcenter.intel.com). For a product overview, see http://www.isus.jp/article/intel-software-devproducts/intel-parallel-studio-xe/

To install Parallel Studio XE 2015:

1. Parallel Studio XE Cluster Edition for Linux* includes Parallel Studio XE Composer Edition for Linux* and VTune Amplifier XE for Linux*. Read the release notes before installing:
   o Parallel Studio XE Cluster Edition for Linux*: ipsxe2015-cluster-edition-release-notes.pdf
   o Parallel Studio XE Composer Edition for Linux*: intel-parallel-studio-xe-2015-composer-edition-release-notes.pdf

   Unpack the package:

    tar xvzf parallel_studio_xe_2015.<update>.<package_num>.tgz    (Parallel Studio XE 2015 Cluster Edition for Linux*)
    tar xvf l_composer_2015.<update>.<package_num>.tgz             (Parallel Studio XE 2015 Composer Edition for Linux*)

2. Run the installer.
3. Check offload to the Xeon Phi coprocessor. Set H_TRACE to 2 (csh: "setenv H_TRACE 2", sh: "export H_TRACE=2"), then build and run a sample:

    /opt/intel/composer_xe_2015.*.*/samples/ja_jp/c++/mic_sample        (C/C++)
    /opt/intel/composer_xe_2015.*.*/samples/ja_jp/fortran/mic_sample    (Fortran)

   Output lines that start with "MIC:" come from the coprocessor.
4. Set up VTune Amplifier XE 2015:
   a) Install the sampling driver into MPSS. With MPSS installed, change to /opt/intel/vtune_amplifier_xe/bin64/k1om/ and run:

    sudo sep_micboot_install.sh

   b) Restart MPSS and the coprocessor:

    sudo service mpss restart
    sudo micctrl -r
    sudo micctrl -w

      micctrl -w should report "micx: online".
   d) To remove the driver:

    sudo service mpss stop
    sudo sep_micboot_uninstall.sh
    sudo service mpss restart
    sudo micctrl -w
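Updating an existing Xeon Phi installation

1. At http://software.intel.com/mic-developer, open TOOLS & DOWNLOADS, then "Software Drivers: Intel Manycore Platform Software Stack (Intel MPSS)". Download the current package, the Readme (readme.txt) and the release notes (releasenotes-linux.txt).
2. Replace the installed MPSS as described in readme.txt sections 2.2 and 2.3.
3. Update the flash as described in readme.txt section 2.4.
4. Reboot the system.
5. Start the Xeon Phi coprocessor (it does not start with the host by default) and run micinfo:

    sudo service mpss start
    sudo micctrl -w
    /usr/bin/micinfo

   Check that the Driver Version, MPSS Version and Flash Version match Table 1.

Starting the Xeon Phi coprocessor

Run micinfo as root or through sudo, with /usr/sbin and /sbin on the path:

    sudo service mpss start
    sudo micctrl -w
    /usr/bin/micinfo

Note: for the uOS, see MPSS 20.12.

Restarting the Xeon Phi coprocessor

If the Xeon Phi coprocessor stops responding, for example to ssh from the Linux* host, there are 2 ways to restart it. To restart a single coprocessor, first check its state:

    sudo micctrl --status <micx>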
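Then reset and boot it:

    sudo micctrl --reset <micx>
    sudo micctrl --boot <micx>
    sudo micctrl -w
    /usr/bin/micinfo

To restart MPSS as a whole:

    sudo service mpss stop
    sudo service mpss unload
    sudo service mpss start
    sudo micctrl -w
    /usr/bin/micinfo

Monitoring the Xeon Phi coprocessor

The SMC (System Management and Configuration) tool is described in section 8.3 of the MPSS documentation. Start its GUI with:

    /usr/bin/micsmc &

Running programs on the Xeon Phi coprocessor

micnativeloadex copies a MIC native binary to a Xeon Phi coprocessor and runs it; see section 8.5.

The uOS is a Linux* system: log in with ssh as root, and copy files with scp as on any Linux* host.

By default the coprocessor gets the IP address 172.31.<coprocessor>.1 and the host end of the link gets 172.31.<coprocessor>.254, on an interface named mic<coprocessor>. The first coprocessor, "mic0", uses 172.31.1.1 with 172.31.1.254 on the host. The second, "mic1", uses 172.31.2.1 and 172.31.2.254.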
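The default addressing described above can be sketched as a small helper. The function name and signature are hypothetical. The sketch assumes the pattern from the two examples in the text, namely that micN uses N + 1 as the third octet, with .1 on the coprocessor and .254 on the host, and it does not read the actual MPSS configuration.

```c
#include <stdio.h>

/*
 * Fill in the default addresses for coprocessor micN, following the
 * pattern in the text: third octet N + 1, coprocessor at .1, host at .254.
 * Returns 0 on success, -1 if the index is out of range or a buffer
 * is too small. This only formats strings; it does not query MPSS.
 */
int mic_default_addresses(int index, char *card, size_t card_len,
                          char *host, size_t host_len)
{
    if (index < 0 || index > 254)
        return -1;

    int n1 = snprintf(card, card_len, "172.31.%d.1", index + 1);
    int n2 = snprintf(host, host_len, "172.31.%d.254", index + 1);

    if (n1 < 0 || (size_t)n1 >= card_len)
        return -1;
    if (n2 < 0 || (size_t)n2 >= host_len)
        return -1;
    return 0;
}
```

Calling it with index 0 and two 16-byte buffers yields the mic0 pair quoted in the text; a site with a customized network setup would need to consult its own configuration instead.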
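Files can also be shared between the host and the Xeon Phi coprocessor over NFS.

MPSS tools

MPSS installs the following tools in /usr/bin (some need root privileges):

    micinfo            reports system and Xeon Phi coprocessor configuration
    micflash           updates the coprocessor flash and reads flash information
    micsmc             monitors Xeon Phi coprocessors
    miccheck           checks the Xeon Phi coprocessor setup
    micnativeloadex    copies a MIC native binary to a Xeon Phi coprocessor and runs it
    micctrl            administration and configuration tool
    micrasd            host daemon that handles and logs hardware errors
    mpssflash          POSIX* version of micflash
    mpssinfo           POSIX* version of micinfo

See section 8 of the MPSS documentation.

Development tools for the Xeon Phi coprocessor

The compilers generate MIC architecture code, including SIMD instructions, from C/C++ and Fortran source. The following tools support the Xeon Phi coprocessor and the MIC architecture:

    o C++ Compiler XE 15.x for 64-bit and MIC architecture targets
    o Fortran Compiler XE 15.x for 64-bit and MIC architecture targets

Libraries (included in Parallel Studio XE 2015):

    o Math Kernel Library (MKL) with MIC architecture support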
    o Threading Building Blocks (TBB)
    o Integrated Performance Primitives (IPP)

Cluster tools (included in Parallel Studio XE 2015 Cluster Edition):

    o MPI Library for Linux* with MIC architecture support
    o Trace Analyzer and Collector

Other tools:

    o SDK for OpenCL* Applications (http://www.isus.jp/article/intel-software-devproducts/intel-opencl/)
    o Debugger for 64-bit and MIC architecture targets
    o C++ Eclipse* integration
    o VTune Amplifier XE 2015 for Linux*, for Linux* hosts and Xeon Phi coprocessors
    o Inspector XE 2015
    o Advisor XE 2015

Setting up the environment

Source the environment scripts before using the tools:

    o C++/Fortran Compiler XE 15.x: for intel64, compilervars.csh and compilervars.sh are in /opt/intel/composerxe/bin. For example:

        source /opt/intel/composer_xe_2015/bin/compilervars.sh intel64

    o TBB: for intel64, tbbvars.csh and tbbvars.sh are in /opt/intel/composer_xe_2015/tbb/bin
    o MKL: for intel64, mklvars.csh and mklvars.sh are in /opt/intel/composer_xe_2015/mkl/bin

Documentation

Installed under /opt/intel/composer_xe_2015/documentation/ja_jp/:

    o compiler_c/index.htm and compiler_f/index.htm: C++ Compiler XE 15.x and Fortran Compiler XE 15.x guides
      The compiler guides cover the Xeon Phi coprocessor in their MIC architecture sections.
    o Release_Notes_*_2015_L_EN.pdf: release notes, including MIC architecture notes
    o debugger/debugger_documentation.htm: debugger documentation. For MIC targets, see "Starting GDB for Intel Xeon Phi Coprocessor Applications" in gdb_quickstart_lin.pdf.
    o MKL: /opt/intel/composer_xe_2015/documentation/ja_jp/mkl/mkl_userguide/index.htm, which covers using MKL on Xeon hosts and Xeon Phi coprocessors
    o VTune Amplifier XE 2015 for Linux*: Xeon Phi tutorial at /opt/intel/vtune_amplifier_xe_2015/documentation/en/tutorials/find_lw_hotspots/c++/index.htm

Web resources:

    o http://www.isus.jp/article/idz/mic-developer/ : Xeon and Xeon Phi developer material, including the System V Application Binary Interface K1OM Architecture Processor Supplement
    o http://www.isus.jp/article/mic-article/xeon-phi/

Samples:

    o C++: /opt/intel/composer_xe_2015/samples/ja_jp/c++/mic_samples/intro_samplec/
    o Fortran: /opt/intel/composer_xe_2015/samples/ja_jp/fortran/mic_samples/
    o MKL: /opt/intel/composer_xe_2015/mkl/examples/mic*
    o MKL automatic offload: /opt/intel/composer_xe_2015/mkl/examples/mic_ao (blasc, blasf)
    o MKL compiler-assisted offload: /opt/intel/composer_xe_2015/mkl/examples/mic_offload
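Shared memory samples:

    o C: /opt/intel/composer_xe_2015/samples/ja_jp/c++/mic_samples (shrd_samplec, LEO_tutorial)
    o C++: /opt/intel/composer_xe_2015/samples/ja_jp/c++/mic_samples/shrd_sampleCPP

Best Known Methods (BKM)

Shared libraries (.so) used by code running on the Xeon Phi coprocessor must be available on the coprocessor. For MIC-specific notes, see releasenotes-linux.txt.

Compiler options for makefiles

The following compiler options control offload to the Xeon Phi coprocessor:

    -offload-option
    -offload-attribute-target
    -no-offload      ignores _Cilk_offload and #pragma offload

Tracing offload

Set H_TRACE to trace offload activity:

    csh: setenv H_TRACE 1        sh: export H_TRACE=1
    csh: setenv H_TRACE 2        sh: export H_TRACE=2

Level 2 reports more detail than level 1. OFFLOAD_REPORT produces an offload report:

    csh: setenv OFFLOAD_REPORT <1|2>
    sh:  export OFFLOAD_REPORT=<1|2>

For further help, see the Xeon Phi forum (http://software.intel.com/en-us/forums/intel-many-integrated-core).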
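A program can inspect the same variable itself, for example to decide how much of its own logging to print alongside the offload report. The sketch below is an illustration only: both function names are hypothetical, and it accepts just the two levels named in the text above.

```c
#include <stdlib.h>

/* Parse an OFFLOAD_REPORT-style value. Returns 1 or 2 for the two
 * levels named in the text, or 0 for a missing or any other value. */
int parse_report_level(const char *value)
{
    if (value == NULL || value[0] == '\0' || value[1] != '\0')
        return 0;
    if (value[0] == '1')
        return 1;
    if (value[0] == '2')
        return 2;
    return 0;
}

/* Level requested through the environment, or 0 if none is set. */
int offload_report_level(void)
{
    return parse_report_level(getenv("OFFLOAD_REPORT"));
}
```

Keeping the parser separate from the getenv() call makes it easy to test without touching the environment.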
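Programming the Xeon Phi coprocessor

A system that combines a CPU with a Xeon Phi coprocessor is heterogeneous (footnote 2): it contains 2 kinds of processor. Work can be sent from the CPU to the Xeon Phi coprocessor with compiler pragmas (C/C++), directives (Fortran) or the Math Kernel Library (MKL).

Example: reduction

The examples in this section offload a reduction to the Xeon Phi coprocessor. A reduction combines the elements of an array into one value:

    ans = a[0] + a[1] + ... + a[n-1]

In C:

    float reduction(float *data, int size)
    {
        float ret = 0.f;
        for (int i=0; i<size; ++i)
        {
            ret += data[i];
        }
        return ret;
    }

Code example 1: reduction (C/C++)

To offload the loop, place #pragma offload target(mic) in front of it. The statement that follows the pragma then runs on the Xeon Phi coprocessor.

2 http://dictionary.reference.com/browse/heterogeneous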
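Pointer data used in the offloaded code needs an in, out or inout clause that gives the direction of the copy and the number of elements (length). Scalar variables such as ret are copied automatically. This version runs on 1 thread of 1 MIC core.

    float reduction(float *data, int size)
    {
        float ret = 0.f;
        #pragma offload target(mic) in(data:length(size))
        for (int i=0; i<size; ++i)
        {
            ret += data[i];
        }
        return ret;
    }

Code example 2: offloaded reduction (C/C++)

Vectorization on the Xeon Phi coprocessor

To use the VPU, the loop can be written with Cilk Plus array notation. The MIC VPU processes 32-bit elements in 512-bit registers with 1 instruction. The built-in __sec_reduce_add() adds up an array section, so 16 32-bit values fit in each 512-bit operation.

    float reduction(float *data, int size)
    {
        float ret = 0;
        #pragma offload target(mic) in(data:length(size))
        ret = __sec_reduce_add(data[0:size]);   // Cilk Plus array notation
        return ret;
    }

Code example 3: vectorized reduction (C/C++)

For offload to the Xeon Phi coprocessor, see the MIC architecture sections of the C++ compiler guide and the sample /opt/intel/composer_xe_2015/samples/ja_jp/c++/mic_samples/intro_samplec/samplec13.c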
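When trying the offloaded versions above, it helps to compare their result with a plain host-side sum. The sketch below is one way to do that. It is not part of the original samples: both function names are hypothetical, and the tolerance is a value the caller has to choose.

```c
#include <stddef.h>

/* Host-only reference sum, accumulated in double to limit rounding error. */
double reduction_reference(const float *data, size_t size)
{
    double sum = 0.0;
    for (size_t i = 0; i < size; ++i)
        sum += data[i];
    return sum;
}

/* Returns 1 if a float result agrees with the reference within a
 * relative tolerance (measured against max(|reference|, 1)), else 0. */
int reduction_matches(float result, double reference, double rel_tol)
{
    double diff = (double)result - reference;
    double scale = reference < 0.0 ? -reference : reference;
    if (diff < 0.0)
        diff = -diff;
    if (scale < 1.0)
        scale = 1.0;
    return diff <= rel_tol * scale;
}
```

A vectorized or threaded sum adds the elements in a different order from the serial loop, so small differences in the last bits are expected; comparing within a tolerance avoids flagging those as errors.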
Shared memory model (C/C++)

See the MIC architecture sections of the C/C++ compiler guide. C/C++ Compiler XE 15.x provides 2 keywords, _Cilk_shared and _Cilk_offload, for this model (they are not available in Fortran). Data and functions marked _Cilk_shared are visible on both the host and the coprocessor, and _Cilk_offload runs a shared function on the coprocessor.

Shared memory is allocated and freed with:

    void *_Offload_shared_malloc(size_t size);
    _Offload_shared_free(void *p);

Aligned versions:

    void *_Offload_shared_aligned_malloc(size_t size, size_t alignment);
    _Offload_shared_aligned_free(void *p);

The following example uses _Cilk_shared and _Cilk_offload together.

    float * _Cilk_shared data;    // pointer to shared memory

    _Cilk_shared float MIC_OMPReduction(int size)
    {
    #ifdef __MIC__
        float Result = 0.0f;
        int nthreads = 32;
        omp_set_num_threads(nthreads);
        #pragma omp parallel for reduction(+:Result)
        for (int i=0; i<size; ++i)
        {
            Result += data[i];
        }
        return Result;
    #else
        printf("Intel(R) Xeon Phi(TM) Coprocessor not available\n");
    #endif
        return 0.0f;
    }

    int main()
    {
        size_t size = 1*1e6;
        int n_bytes = size*sizeof(float);
        data = (_Cilk_shared float *)_Offload_shared_malloc(n_bytes);
        for (int i=0; i<size; ++i)
        {
            data[i] = i%10;
        }
        _Cilk_offload MIC_OMPReduction(size);
        _Offload_shared_free(data);
        return 0;
    }

Code example 4: _Cilk_shared and _Cilk_offload (C/C++)

Shared memory samples:

    C: /opt/intel/composer_xe_2015/samples/ja_jp/c++/mic_samples (shrd_samplec, LEO_tutorial)
    C++: /opt/intel/composer_xe_2015/samples/ja_jp/c++/mic_samples/shrd_samplecpp

See also the MIC architecture sections of the C++ and Fortran compiler guides.

Native compilation

An application can also be built to run entirely on the Xeon Phi coprocessor. Its files must be copied to the coprocessor or made available over NFS. For example:

1. The sample openmp_sample.c is in /opt/intel/composer_xe_2015/samples/ja_jp/c++/openmp_samples/
2. Compile it with -mmic:

    icc -mmic -vec-report3 -openmp openmp_sample.c
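3. Copy the binary to the coprocessor:

    scp a.out mic0:/tmp/a.out

4. Copy the OpenMP* runtime library to /tmp on the coprocessor:

    scp /opt/intel/composer_xe_2015/lib/mic/libiomp5.so mic0:/tmp/libiomp5.so

5. Log in with ssh and set the library path (so that the OpenMP* runtime is found):

    ssh mic0
    export LD_LIBRARY_PATH=/tmp

6. Remove the stack size limit:

    ulimit -s unlimited

7. Change to /tmp and run a.out:

    cd /tmp
    ./a.out

Parallel programming options on the Xeon Phi coprocessor

1. Threading Building Blocks (TBB)
2. OpenMP*
3. Cilk Plus
4. Pthreads*

Parallel programming on the Xeon Phi coprocessor: OpenMP*

OpenMP* is supported on both the CPU and the Xeon Phi coprocessor, so OpenMP* regions can be offloaded. The OpenMP* environment on the CPU and the one on the Xeon Phi coprocessor are independent of each other.

By default, an offloaded omp parallel region uses every hardware thread on the Xeon Phi coprocessor (4 per core) except those of the 1 core reserved for the uOS (4 threads).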
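Read that way, the default thread count follows directly from the core count. The sketch below states the arithmetic. The function name is hypothetical, and the rule it encodes (4 threads per core, 1 core set aside for the uOS) is an interpretation of the paragraph above rather than a value read from the OpenMP* runtime.

```c
/* Threads left for an offloaded parallel region when one core is
 * reserved for the uOS, as the paragraph above is read here. */
int default_offload_threads(int cores)
{
    const int threads_per_core = 4;
    if (cores < 1)
        return 0;
    return threads_per_core * (cores - 1);
}
```

For the 61-core part mentioned earlier this gives 240 threads. On real hardware, omp_get_max_threads() inside the offloaded region is the authoritative source.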
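The same OpenMP* code serves both the CPU and the Xeon Phi coprocessor: adding 1 offload line sends the parallel loop to the coprocessor.

    float OMP_reduction(float *data, int size)
    {
        float ret = 0;
        #pragma offload target(mic) in(size) in(data:length(size))
        #pragma omp parallel for reduction(+:ret)
        for (int i=0; i<size; ++i)
        {
            ret += data[i];
        }
        return ret;
    }

Code example 5: OpenMP* reduction (C/C++)

    real function FTNReductionOMP(data, size)
        implicit none
        integer :: size
        real, dimension(size) :: data
        real :: ret
        integer :: i

        ! Assign here rather than in the declaration, so that ret is reset on every call.
        ret = 0.0
        !dir$ omp offload target(mic) in(size) in(data:length(size))
        !$omp parallel do reduction(+:ret)
        do i=1,size
            ret = ret + data(i)
        enddo
        !$omp end parallel do
        FTNReductionOMP = ret
        return
    end function FTNReductionOMP

Code example 6: OpenMP* reduction (Fortran)

Parallel programming on the Xeon Phi coprocessor: OpenMP* + Cilk Plus

OpenMP* threads can be combined with Cilk Plus array notation. Each thread calls the Cilk Plus built-in __sec_reduce_add() on its own part of the array, which the MIC VPU processes as 32-bit elements in 512-bit vectors.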
    float OMPnthreads_CilkPlusEAN_reduction(float *data, int size)
    {
        float ret=0;
        #pragma offload target(mic) in(data:length(size))
        {
            int nthreads = omp_get_max_threads();
            int ElementsPerThread = size/nthreads;
            #pragma omp parallel for reduction(+:ret)
            for(int i=0;i<nthreads;i++)
            {
                // accumulate, so the result does not depend on how iterations map to threads
                ret += __sec_reduce_add(
                    data[i*ElementsPerThread:ElementsPerThread]);
            }
            // add the elements left over after the even split
            for(int i=nthreads*ElementsPerThread; i<size; i++)
            {
                ret+=data[i];
            }
        }
        return ret;
    }

Code example 7: OpenMP* and Cilk Plus reduction (C/C++)

Parallel programming on the Xeon Phi coprocessor: Cilk Plus

Headers used by Cilk Plus code that runs on the MIC architecture must be enclosed between #pragma offload_attribute(push,target(mic)) and #pragma offload_attribute(pop):

    #pragma offload_attribute(push,target(mic))
    #include <cilk/cilk.h>
    #include <cilk/reducer_opadd.h>
    #pragma offload_attribute(pop)

Code example 8: Cilk Plus headers for offload (C/C++)

The reduction can then be written with cilk_for and a reducer:

    float ReduceCilk(float*data, int size)
    {
        float ret = 0;
        #pragma offload target(mic) in(data:length(size))
        {
            // float reducer, so fractional values are not truncated
            cilk::reducer_opadd<float> total;
            cilk_for (int i=0; i<size; ++i)
            {
                total += data[i];
            }
            ret = total.get_value();
        }
        return ret;
    }

Code example 9: reduction with cilk_for
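The chunk-plus-remainder split used in the OpenMP* and Cilk Plus example earlier in this section can be checked on any host with plain C. The sketch below is a serial stand-in, not the original sample: chunk_sum() takes the place of __sec_reduce_add() on an array section, the chunk loop is the one the original runs in parallel, and both function names are hypothetical.

```c
#include <stddef.h>

/* Sum one contiguous chunk; stands in for a reduction over an array section. */
float chunk_sum(const float *data, size_t start, size_t count)
{
    float s = 0.0f;
    for (size_t i = 0; i < count; ++i)
        s += data[start + i];
    return s;
}

/* Split data into nchunks equal chunks, sum each one, then add the
 * elements left over when size is not a multiple of nchunks. */
float chunked_reduction(const float *data, size_t size, size_t nchunks)
{
    float ret = 0.0f;
    if (nchunks == 0)
        nchunks = 1;

    size_t per_chunk = size / nchunks;

    /* One iteration per chunk; this is the loop that runs in parallel
     * in the offloaded version. */
    for (size_t c = 0; c < nchunks; ++c)
        ret += chunk_sum(data, c * per_chunk, per_chunk);

    /* Remainder: elements past the last full chunk. */
    for (size_t i = nchunks * per_chunk; i < size; ++i)
        ret += data[i];

    return ret;
}
```

When there are more chunks than elements, per_chunk is 0 and the remainder loop does all the work, which is why the tail loop cannot be dropped.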
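Parallel programming on the Xeon Phi coprocessor: TBB

As with Cilk Plus, TBB headers used by code that runs on the MIC architecture must be enclosed between the offload_attribute push and pop pragmas:

    #pragma offload_attribute (push,target(mic))
    #include "tbb/task_scheduler_init.h"
    #include "tbb/blocked_range.h"
    #include "tbb/parallel_reduce.h"
    #include "tbb/task.h"
    #pragma offload_attribute (pop)
    using namespace tbb;

Code example 10: TBB headers for offload (C/C++)

Functions and data used on the Xeon Phi coprocessor must be marked with __attribute__((target(mic))). parallel_reduce splits the range into subranges, makes 1 copy of the body object for each through the splitting constructor, and combines the partial results with join.

1. Enclose the class in the __MIC__ macro and mark it with __attribute__((target(mic))):

    #ifdef __MIC__
    class __attribute__((target(mic))) ReduceTBB
    {
    private:
        float *my_data;
    public:
        float sum;
        void operator()( const blocked_range<size_t>& r )
        {
            float *data = my_data;
            for( size_t i=r.begin(); i!=r.end(); ++i)
            {
                sum += data[i];
            }
        }
        ReduceTBB( ReduceTBB& x, split) : my_data(x.my_data), sum(0) {}
        void join( const ReduceTBB& y) { sum += y.sum; }
        ReduceTBB( float data[] ) : my_data(data), sum(0) {}
    };
    #endif

Code example 11: TBB reduction class for the MIC architecture (C/C++)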
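2. Mark the function that runs on the Xeon Phi coprocessor with __attribute__((target(mic))):

    __attribute__((target(mic)))
    float __MICReductionTBB(float *data, int size)
    {
        ReduceTBB redc(data);
        // initialize the task scheduler
        task_scheduler_init init;
        parallel_reduce(blocked_range<size_t>(0, size), redc);
        return redc.sum;
    }

Code example 12: TBB reduction function for the MIC architecture (C/C++)

3. Call that function from a #pragma offload target(mic) region:

    float MICReductionTBB(float *data, int size)
    {
        float ret(0.f);
        #pragma offload target(mic) in(size) in(data:length(size)) out(ret)
        ret = __MICReductionTBB(data, size);
        return ret;
    }

Code example 13: offloading the TBB reduction (C/C++)

Note: when building TBB code that uses offload, use the -tbb option rather than -ltbb.

Using MKL

In Native Acceleration (NAcc) mode, MKL functions run on the Xeon Phi coprocessor. The MKL functionality available to NAcc includes BLAS, LAPACK, FFT, VML and VSL. For details, see the MKL documentation on MIC architecture support.

Figure 3.1: MKL (figure not reproduced)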
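The sample that follows offloads the BLAS routine SGEMM. As background, SGEMM computes C = alpha*A*B + beta*C in single precision (optionally transposing A or B). The sketch below is a naive reference for the square, untransposed case in column-major storage, useful only for checking small results by hand. It is not how MKL implements the routine, and the function name is hypothetical.

```c
#include <stddef.h>

/*
 * Naive reference for C = alpha*A*B + beta*C with square n-by-n matrices
 * stored column-major (element (i,j) at index i + j*n), no transposition.
 */
void sgemm_reference(size_t n, float alpha, const float *A, const float *B,
                     float beta, float *C)
{
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < n; ++i) {
            float acc = 0.0f;
            /* Dot product of row i of A with column j of B. */
            for (size_t k = 0; k < n; ++k)
                acc += A[i + k * n] * B[k + j * n];
            C[i + j * n] = alpha * acc + beta * C[i + j * n];
        }
    }
}
```

Column-major storage matches the Fortran-style sgemm interface that the sample calls with pointer arguments.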
SGEMM sample

This sample runs the BLAS routine SGEMM (sgemm) on the Xeon Phi coprocessor.

Step 1: Initialize the matrices on the host.

Step 2: Send the data to the Xeon Phi coprocessor with #pragma offload. free_if(0) keeps the buffers allocated on the coprocessor after the offload ends.

    #define PHI_DEV 0
    #pragma offload target(mic:PHI_DEV) \
        in(A:length(matrix_elements) free_if(0)) \
        in(B:length(matrix_elements) free_if(0)) \
        in(C:length(matrix_elements) free_if(0))
    { }

Code example 14: sending the data to the Xeon Phi coprocessor

Step 3: Call sgemm on the Xeon Phi coprocessor, where it runs as MKL NAcc. nocopy() reuses the buffers sent in step 2.

    #pragma offload target(mic:PHI_DEV) \
        in(transa, transb, N, alpha, beta) \
        nocopy(A: alloc_if(0) free_if(0)) nocopy(B: alloc_if(0) free_if(0)) \
        out(C:length(matrix_elements) alloc_if(0) free_if(0)) // output data
    {
        sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N);
    }

Code example 15: calling sgemm

Step 4: Free the buffers allocated in step 2 with alloc_if(0) free_if(1).

    #pragma offload target(mic:PHI_DEV) \
        in(A:length(matrix_elements) alloc_if(0) free_if(1)) \
        in(B:length(matrix_elements) alloc_if(0) free_if(1)) \
        in(C:length(matrix_elements) alloc_if(0) free_if(1))
    { }

Code example 16: freeing the buffers

MKL and OpenMP*

    #pragma offload target(mic:PHI_DEV) \
        in(transa, transb, N, alpha, beta) \
        nocopy(A: alloc_if(0) free_if(0)) nocopy(B: alloc_if(0) free_if(0)) \
        out(C:length(matrix_elements) alloc_if(0) free_if(0)) // output data
    {
        omp_set_num_threads(64); // set num threads in openmp
        sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N);
    }

Code example 17: setting the thread count

Calling omp_set_num_threads() inside the offload region sets the number of threads that MKL uses on the Xeon Phi coprocessor.

MKL automatic offload

After a call to mkl_mic_enable(), MKL offloads work to the Xeon Phi coprocessor automatically; mkl_mic_disable() turns this off. MKL can also be called from code that is offloaded to the Xeon Phi coprocessor with _Cilk_offload or #pragma offload. Samples:

    /opt/intel/composer_xe_2015/mkl/examples/mic_ao/blasc    (C)
    /opt/intel/composer_xe_2015/mkl/examples/mic_ao/blasf    (Fortran)

Debugging on the Xeon Phi coprocessor

For debugging on the MIC architecture, see http://software.intel.com/mic-developer under PROGRAMMING, "Debugging Intel Xeon Phi Application on Linux*".

Profiling on the Xeon Phi coprocessor

VTune Amplifier XE 2015 for Linux* profiles code running on the Xeon Phi coprocessor. See /opt/intel/vtune_amplifier_xe_2015/documentation/help/index.htm, under Getting Started > Intel Xeon Phi Coprocessor Analysis Workflow.
About the authors

    Sudha Udanapalli Thiagarajan (2008, 2010; ISV, MIC)
    Charles Congdon (DEC Alpha, Oracle* RDBMS, Windows* NT, OpenVMS*, 64-bit)
    Sumedh Naik (2009, 2012; Xeon Phi)
    Loc Q Nguyen (MBA)
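Legal notices

See Intel's Terms and Conditions of Sale. Documents can be ordered by calling 1-800-548-4725 or from the Intel Web site (http://www.intel.com/design/literature.htm).

Intel, Cilk, Xeon Phi, VTune and Xeon are trademarks of Intel Corporation.
* Other names and brands may be claimed as the property of others.
2015 Intel Corporation.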