Vol.214-HPC-145 No /7/3 C #pragma acc directive-name [clause [[,] clause] ] new-line structured block Fortran!$acc directive-name [clause [[,] c
|
|
- ようた えんの
- 5 years ago
- Views:
Transcription
1 Vol.214-HPC-145 No /7/3 OpenACC 1 3,1,2 1,2 GPU CUDA OpenCL OpenACC OpenACC High-level OpenACC CPU Intex Xeon Phi K2X GPU Intel Xeon Phi 27% K2X GPU 24% 1. TSUBAME2.5 CPU GPU CUDA OpenCL CPU OpenMP 1 Tokyo Institute of Technology 2 CREST JST CREST 3 RIKEN AICS c 214 Information Processing Society of Japan OpenACC[6] OpenACC CPU CPU CPU GPU ([3]) CUDA OpenACC Array of Structure (AoS) Structure of Array (SoA) - OpenACC 1
2 Vol.214-HPC-145 No /7/3 C #pragma acc directive-name [clause [[,] clause] ] new-line structured block Fortran!$acc directive-name [clause [[,] clause] ] structured block!$acc end directive-name 1 OpenACC 2. Shuai [1] CUDA OpenCL API Dymaxion++ Dymaxion++ 2 Reshape P lace Reshape 3 trasnpose diagonal indirect PCI-E P lace GPU on-chip High-level OpenACC CUDA GPU CUDA OpenCL OpenACC OpenACC NVIDIA Cray PGI C/C++ Fortran OpenMP GPU CUDA OpenCL GPU GPU OpenACC GPU OpenACC hmpp[2], PGI [9] OpenMP CUDA OpenMPC[5] OpenACC 1 data kernels 3. Sung [8] GPU Array-of-Structure-of-Tiled-Array(ASTA) CUDA OpenCL Low-level Array of Structures ASTA High-level c 214 Information Processing Society of Japan CPU TSUBAME CPU2 ( 12 ) Intel Xeon Phi 24 Xeon Phi KMP AFFINITY=compact OpenMP Intel CPU, Intel Xeon Phi, NVIDIA Kepler [4] OpenACC 1 2 ( 5) 2 row major 5 2 Structure of Arrays (SoA) 2 Array of Structures (AoS) CPU, Xeon Phi, Kepler 2
3 Vol.214-HPC-145 No /7/3 CPU GPU MIC 1 Intel Xeon X567 6cores 2.93 GHz 2 sockets 54 GB Memory NVIDIA Kepler K2X 2688 CUDA cores 6GB Memory Intel Xeon Phi 712X 61 cores 16GB Memory MB/s AoS Copy: AoS Scale: AoS Add: AoS Triad: SoA Copy: SoA Scale: 2 CPU icc -O3 -openmp GPU pgcc -O3 -ta=nvidia,cc35,kepler MIC icc -O3 -mmic -openmp -opt-prefetch-distance=4,1 -opt-streaming-stores always -opt-streaming-cache-evict= 2, 3, 4 Copy : C = A Scale : B = scalar C Add : C = A+B T riad : A = B +scalar C A B C 1M double 2 SoA Array AoS Structure GPU AoS 16 9% MIC SoA AoS CPU AoS [11] C static allocation OpenMP GPU OpenACC 4 P MIC [1] 4 a, b, c c 214 Information Processing Society of Japan MB/s MB/s AoS: Structure 要素数 SoA: Array の本数 SoA Add: SoA Triad: Intel Xeon CPU AoS: Structure 要素数 SoA: Array の本数 AoS Copy: AoS Scale: AoS Add: AoS Triad: SoA Copy: SoA Scale: SoA Add: SoA Triad: Intel Xeon Phi AoS: Structure 要素数 SoA: Array の本数 AoS Copy: AoS Scale: AoS Add: AoS Triad: SoA Copy: SoA Scale: SoA Add: SoA Triad: NVIDIA K2X GPU M SoA M オリジナル AoS M AoS SoA row major 6 SOAOS AoS 3 3
4 Vol.214-HPC-145 No /7/3 #if SOAOS static float a[mimax][mjmax][mkmax][4], b[mimax][mjmax][mkmax][3], c[mimax][mjmax][mkmax][3]; #endif #if SOSOA static float a[4][mimax][mjmax][mkmax], b[3][mimax][mjmax][mkmax], c[3][mimax][mjmax][mkmax]; #endif #if AOS static float abc[mimax][mjmax][mkmax][1]; #endif #if SOA static float abc[1][mimax][mjmax][mkmax]; #endif SOAOS GFlops OMP MIC GPU SOSOA AOS SOA 7 AoS Structure of Array of Structure (SOAOS) 3 AoS SoA SOSOA 3 AoS SoA 7 OpenMP 12 MIC 24 GPU PGI 64 4 OpenMP MIC GPU 1% 4% UPACS UPACS c 214 Information Processing Society of Japan 8 ( ) ( ) UPACS 2 (Convection) (Viscosity) 8 9 cellfacetype UPACS 25.% 37.7% 2 UPACS 9 1, 11 AoS, SoA CPU GPU 12, 13 AoS UPACS cellfacetype CPU GPU 12, 13 GPU SoA 4
5 Vol.214-HPC-145 No /7/3 type cellfacetype real(8) end type real(8), dimension(3) :: real(8), dimension(5) :: :: area, nt nv q_r, q_l, flux real(8) :: shockfix type(cellfacetype), dimension(:,:,:), pointer :: cface allocate(cface(-1:in+1,-1:jn+1,-1:kn+1)) 9 real(8),dimension(-1:in+1,-1:jn+1,-1:kn+1) :: area,nt real(8),dimension(3,-1:in+1,-1:jn+1,-1:kn+1) :: nv real(8),dimension(5,-1:in+1,-1:jn+1,-1:kn+1) :: q_r,q_l,flux real(8),dimension(-1:in+1,-1:jn+1,-1:kn+1) :: shockfix 1 Array of Structures real(8),dimension(-1:in+1,-1:jn+1,-1:kn+1) :: area,nt real(8),dimension(-1:in+1,-1:jn+1,-1:kn+1,3) :: nv real(8),dimension(-1:in+1,-1:jn+1,-1:kn+1,5) :: q_r,q_l,flux real(8),dimension(-1:in+1,-1:jn+1,-1:kn+1) :: shockfix Elapsed )me[sec] 12 Elapsed )me[sec] Structure of Arrays Original AoS SoA Viscosity Convec?on UPACS CPU 5. Original AoS SoA Viscosity Convec:on UPACS GPU c 214 Information Processing Society of Japan CPU - OpenACC CPU 5.1 (1) (2) 2 (1) acc data acc kernels acc data acc data - acc kernels 5
6 Vol.214-HPC-145 No /7/3 #pragma acc trans transpose array_name \ [start : length][start : length][start : length], [1, 3, 2] structured block 14 acc trans (2) acc loop acc loop GPU gang, worker, vector clause 5.2 acc trans( 14) acc trans (1) (2) (3) (optional) (2) C 1 (3) array[i][j][k] 14 [1,3,2] array[i][j][k] array[i][k][j] 6. acc trans -to- ROSE Compiler Infrastructure[7] ( 1 ) Rose c 214 Information Processing Society of Japan #pragma bcc trans transpose(foo_a[:1][:1][:3],[1,3,2]) #pragma acc data copy (foo_a[:1][:1][:3], \ foo_b[:1][:1][:3]) #pragma acc kernels #pragma acc loop gang independent for(k = ;k < 1;k++) #pragma acc loop vector independent for(j = ;j < 1;j++) for(i = ;i < 3;i++) foo_b[k][j][i] = foo_a[k][j][i]; 15 acc trans AST ( 2 ) #pragma acc trans ( 3 ) ( 4 ) 1 ( 5 ) / ( 6 ) #pragma acc trans acc data ( 7 ) acc data data acc trans 7. 6 SOAOS AOS 6
7 Vol.214-HPC-145 No /7/3 #pragma bcc trans transpose \ ( foo_a [ : 1 ] [ : 1 ] [ : 3 ], [ 1, 3, 2 ] ) double *foo_a_generated 1_3_2; foo_a_generated 1_3_2 = ((void *)\ (malloc(sizeof(double ) * 1 * 1 * 3))); transpos_foo_a_1_3_2(((double *)\ foo_a_generated 1_3_2),((double *)foo_a)); MFlops A[i][j][k][4], B[i][j][k][3], C[i][j][k][3] ABC[i][j][k][1] #pragma acc data copy (foo_a_generated 1_3_2[:1 * 1 * 3]\, foo_b[:1][:1][:3]) #pragma acc kernels #pragma acc loop gang independent for (k = ; k < 1; k++) #pragma acc loop vector independent for (j = ; j < 1; j++) for (i = ; i < 3; i++) foo_b[k][j][i] = foo_a_generated 1_3_2 \ [(( * 1 + k) * 3 + i) * 1 + j]; retranspos_foo_a_1_3_2(((double *)foo_a), \ ((double *)foo_a_generated 1_3_2)); free(foo_a_generated 1_3_2); 16 acc trans 2 CPU, Xeon Phi, GPU 17, 18, Xeon Phi K2X GPU [1,2,4,3] A[I][J][K][4] A[I][J][4][K] Xeon Phi GPU 27% 24% CPU acc trans Xeon Phi 7 7GFlops static c 214 Information Processing Society of Japan 17 MFlops MFlops 72 7 [1234] A[i][j][k][4] [1243] A[i][j][4][k] [1423] A[i][4][j][k] [4123] A[4][i][j][k] on CPU ( ) [1234] A[i][j][k][4] 18 [1234] A[i][j][k][4] 19 [1243] A[i][j][4][k] [1423] A[i][4][j][k] [4123] A[4][i][j][k] on Intel Xeon Phi [1243] A[i][j][4][k] [1423] A[i][4][j][k] [4123] A[4][i][j][k] on K2X A[i][j][k][4], B[i][j][k][3], C[i][j][k][3] ABC[i][j][k][1] A[i][j][k][4], B[i][j][k][3], C[i][j][k][3] ABC[i][j][k][1] 1 UPACS 8. OpenACC Intel 7
8 Xeon Phi 27% K2X GPU 24% Array of Structures, Structure of Arrays Vol.214-HPC-145 No /7/3 [1] Che, S., Sheaffer, J. W. and Skadron, K.: Dymaxion: Optimizing Memory Access Patterns for Heterogeneous Systems, Proceedings of 211 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 11, New York, NY, USA, ACM, pp. 13:1 13:11 (online), DOI: / (211). [2] Dolbeau, R., Bihan, S. and Bodin, F.: A Hybrid Multicore Parallel Programming Environment, High Performance Computing (Valero, M., Joe, K., Kitsuregawa, M. and Tanaka, H., eds.), Lecture Notes in Computer Science, Vol. 194, Springer Berlin / Heidelberg, pp (27). [3] Hoshino, T., Maruyama, N., Matsuoka, S. and Takaki, R.: CUDA vs OpenACC: Performance Case Studies with Kernel Benchmarks and a Memory-Bound CFD Application, Cluster Computing and the Grid, IEEE International Symposium on, Vol., pp (online), DOI: (213). [4] in High Performance Computers, S. S. M. B.: [5] Lee, S. and Eigenmann, R.: OpenMPC: Extended OpenMP Programming and Tuning for GPUs, Proceedings of the 21 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 1, Washington, DC, USA, IEEE Computer Society, pp (online), DOI: 1.119/SC (21). [6] OpenACC-standard.org: The OpenACC Application Programming Interface, (online), available from (211). [7] Schordan, M. and Quinlan, D.: A Source-To-Source Architecture for User-Defined Optimizations, Modular Programming Languages (Böszörményi, L. and Schojer, P., eds.), Lecture Notes in Computer Science, Vol. 2789, Springer Berlin Heidelberg, pp (23). [8] Sung, I.-J., Liu, G. and Hwu, W.-M.: DL: A data layout transformation system for heterogeneous computing, Innovative Parallel Computing (InPar), 212, pp (online), DOI: 1.119/InPar (212). [9] Wolfe, M.: Implementing the PGI Accelerator model, Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, GPGPU 1, New York, NY, USA, ACM, pp (online), DOI: (21). [1] Xeon Phi (213). [11] c 214 Information Processing Society of Japan 8
1 OpenCL OpenCL 1 OpenCL GPU ( ) 1 OpenCL Compute Units Elements OpenCL OpenCL SPMD (Single-Program, Multiple-Data) SPMD OpenCL work-item work-group N
GPU 1 1 2 1, 3 2, 3 (Graphics Unit: GPU) GPU GPU GPU Evaluation of GPU Computing Based on An Automatic Program Generation Technology Makoto Sugawara, 1 Katsuto Sato, 1 Kazuhiko Komatsu, 2 Hiroyuki Takizawa
More informationGPU n Graphics Processing Unit CG CAD
GPU 2016/06/27 第 20 回 GPU コンピューティング講習会 ( 東京工業大学 ) 1 GPU n Graphics Processing Unit CG CAD www.nvidia.co.jp www.autodesk.co.jp www.pixar.com GPU n GPU ü n NVIDIA CUDA ü NVIDIA GPU ü OS Linux, Windows, Mac
More informationIPSJ SIG Technical Report Vol.2013-HPC-138 No /2/21 GPU CRS 1,a) 2,b) SpMV GPU CRS SpMV GPU NVIDIA Kepler CUDA5.0 Fermi GPU Kepler Kepler Tesla
GPU CRS 1,a),b) SpMV GPU CRS SpMV GPU NVIDIA Kepler CUDA5.0 Fermi GPU Kepler Kepler Tesla K0 CUDA5.0 cusparse CRS SpMV 00 1.86 177 1. SpMV SpMV CRS Compressed Row Storage *1 SpMV GPU GPU NVIDIA Kepler
More informationCPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2
FFT 1 Fourier fast Fourier transform FFT FFT FFT 1 FFT FFT 2 Fourier 2.1 Fourier FFT Fourier discrete Fourier transform DFT DFT n 1 y k = j=0 x j ω jk n, 0 k n 1 (1) x j y k ω n = e 2πi/n i = 1 (1) n DFT
More informationGPGPU
GPGPU 2013 1008 2015 1 23 Abstract In recent years, with the advance of microscope technology, the alive cells have been able to observe. On the other hand, from the standpoint of image processing, the
More information23 Fig. 2: hwmodulev2 3. Reconfigurable HPC 3.1 hw/sw hw/sw hw/sw FPGA PC FPGA PC FPGA HPC FPGA FPGA hw/sw hw/sw hw- Module FPGA hwmodule hw/sw FPGA h
23 FPGA CUDA Performance Comparison of FPGA Array with CUDA on Poisson Equation (lijiang@sekine-lab.ei.tuat.ac.jp), (kazuki@sekine-lab.ei.tuat.ac.jp), (takahashi@sekine-lab.ei.tuat.ac.jp), (tamukoh@cc.tuat.ac.jp),
More informationIPSJ SIG Technical Report Vol.2014-ARC-213 No.24 Vol.2014-HPC-147 No /12/10 GPU 1,a) 1,b) 1,c) 1,d) GPU GPU Structure Of Array Array Of
GPU 1,a) 1,b) 1,c) 1,d) GPU 1 GPU Structure Of Array Array Of Structure 1. MPS(Moving Particle Semi-Implicit) [1] SPH(Smoothed Particle Hydrodynamics) [] DEM(Distinct Element Method)[] [] 1 Tokyo Institute
More information07-二村幸孝・出口大輔.indd
GPU Graphics Processing Units HPC High Performance Computing GPU GPGPU General-Purpose computation on GPU CPU GPU GPU *1 Intel Quad-Core Xeon E5472 3.0 GHz 2 6 MB L2 cache 1600 MHz FSB 80 GFlops 1 nvidia
More informationuntitled
A = QΛQ T A n n Λ Q A = XΛX 1 A n n Λ X GPGPU A 3 T Q T AQ = T (Q: ) T u i = λ i u i T {λ i } {u i } QR MR 3 v i = Q u i A {v i } A n = 9000 Quad Core Xeon 2 LAPACK (4/3) n 3 O(n 2 ) O(n 3 ) A {v i }
More information1 GPU GPGPU GPU CPU 2 GPU 2007 NVIDIA GPGPU CUDA[3] GPGPU CUDA GPGPU CUDA GPGPU GPU GPU GPU Graphics Processing Unit LSI LSI CPU ( ) DRAM GPU LSI GPU
GPGPU (I) GPU GPGPU 1 GPU(Graphics Processing Unit) GPU GPGPU(General-Purpose computing on GPUs) GPU GPGPU GPU ( PC ) PC PC GPU PC PC GPU GPU 2008 TSUBAME NVIDIA GPU(Tesla S1070) TOP500 29 [1] 2009 AMD
More informationuntitled
A = QΛQ T A n n Λ Q A = XΛX 1 A n n Λ X GPGPU A 3 T Q T AQ = T (Q: ) T u i = λ i u i T {λ i } {u i } QR MR 3 v i = Q u i A {v i } A n = 9000 Quad Core Xeon 2 LAPACK (4/3) n 3 O(n 2 ) O(n 3 ) A {v i }
More informationIPSJ SIG Technical Report Vol.2013-ARC-203 No /2/1 SMYLE OpenCL (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 1
SMYLE OpenCL 128 1 1 1 1 1 2 2 3 3 3 (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 128 SMYLEref SMYLE OpenCL SMYLE OpenCL Implementation and Evaluations on 128 Cores Takuji Hieda 1 Noriko Etani
More informationXACCの概要
2 global void kernel(int a[max], int llimit, int ulimit) {... } : int main(int argc, char *argv[]){ MPI_Int(&argc, &argc); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &size); dx
More information2. Eades 1) Kamada-Kawai 7) Fruchterman 2) 6) ACE 8) HDE 9) Kruskal MDS 13) 11) Kruskal AGI Active Graph Interface 3) Kruskal 5) Kruskal 4) 3. Kruskal
1 2 3 A projection-based method for interactive 3D visualization of complex graphs Masanori Takami, 1 Hiroshi Hosobe 2 and Ken Wakita 3 Proposed is a new interaction technique to manipulate graph layouts
More information,4) 1 P% P%P=2.5 5%!%! (1) = (2) l l Figure 1 A compilation flow of the proposing sampling based architecture simulation
1 1 1 1 SPEC CPU 2000 EQUAKE 1.6 50 500 A Parallelizing Compiler Cooperative Multicore Architecture Simulator with Changeover Mechanism of Simulation Modes GAKUHO TAGUCHI 1 YOUICHI ABE 1 KEIJI KIMURA 1
More informationGPU チュートリアル :OpenACC 篇 Himeno benchmark を例題として 高エネルギー加速器研究機構 (KEK) 松古栄夫 (Hideo Matsufuru) 1 December 2018 HPC-Phys 理化学研究所 共通コードプロジェクト
GPU チュートリアル :OpenACC 篇 Himeno benchmark を例題として 高エネルギー加速器研究機構 (KEK) 松古栄夫 (Hideo Matsufuru) 1 December 2018 HPC-Phys 勉強会 @ 理化学研究所 共通コードプロジェクト Contents Hands On 環境について Introduction to GPU computing Introduction
More information3.1 Thalmic Lab Myo * Bluetooth PC Myo 8 RMS RMS t RMS(t) i (i = 1, 2,, 8) 8 SVM libsvm *2 ν-svm 1 Myo 2 8 RMS 3.2 Myo (Root
1,a) 2 2 1. 1 College of Information Science, School of Informatics, University of Tsukuba 2 Faculty of Engineering, Information and Systems, University of Tsukuba a) oharada@iplab.cs.tsukuba.ac.jp 2.
More informationMicrosoft PowerPoint - GPU_computing_2013_01.pptx
GPU コンピューティン No.1 導入 東京工業大学 学術国際情報センター 青木尊之 1 GPU とは 2 GPGPU (General-purpose computing on graphics processing units) GPU を画像処理以外の一般的計算に使う GPU の魅力 高性能 : ハイエンド GPU はピーク 4 TFLOPS 超 手軽さ : 普通の PC にも装着できる 低価格
More informationPC Development of Distributed PC Grid System,,,, Junji Umemoto, Hiroyuki Ebara, Katsumi Onishi, Hiroaki Morikawa, and Bunryu U PC WAN PC PC WAN PC 1 P
PC Development of Distributed PC Grid System,,,, Junji Umemoto, Hiroyuki Ebara, Katsumi Onishi, Hiroaki Morikawa, and Bunryu U PC WAN PC PC WAN PC 1 PC PC PC PC PC Key Words:Grid, PC Cluster, Distributed
More informationSlides: TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments
計算機アーキテクチャ第 11 回 マルチプロセッサ 本資料は授業用です 無断で転載することを禁じます 名古屋大学 大学院情報科学研究科 准教授加藤真平 デスクトップ ジョブレベル並列性 スーパーコンピュータ 並列処理プログラム プログラムの並列化 for (i = 0; i < N; i++) { x[i] = a[i] + b[i]; } プログラムの並列化 x[0] = a[0] + b[0];
More information倍々精度RgemmのnVidia C2050上への実装と応用
.. maho@riken.jp http://accc.riken.jp/maho/,,, 2011/2/16 1 - : GPU : SDPA-DD 10 1 - Rgemm : 4 (32 ) nvidia C2050, GPU CPU 150, 24GFlops 25 20 GFLOPS 15 10 QuadAdd Cray, QuadMul Sloppy Kernel QuadAdd Cray,
More informationIPSJ SIG Technical Report Vol.2013-ARC-206 No /8/1 Android Dominic Hillenbrand ODROID-X2 GPIO Android OSCAR WFI 500[us] GPIO GP
Android 1 1 1 1 1 Dominic Hillenbrand 1 1 1 ODROID-X2 GPIO Android OSCAR WFI 500[us] GPIO GPIO API GPIO API GPIO MPEG2 Optical Flow MPEG2 1PE 0.97[W] 0.63[W] 2PE 1.88[w] 0.46[W] 3PE 2.79[W] 0.37[W] Optical
More information! 行行 CPUDSP PPESPECell/B.E. CPUGPU 行行 SIMD [SSE, AltiVec] 用 HPC CPUDSP PPESPE (Cell/B.E.) SPE CPUGPU GPU CPU DSP DSP PPE SPE SPE CPU DSP SPE 2
! OpenCL [Open Computing Language] 言 [OpenCL C 言 ] CPU, GPU, Cell/B.E.,DSP 言 行行 [OpenCL Runtime] OpenCL C 言 API Khronos OpenCL Working Group AMD Broadcom Blizzard Apple ARM Codeplay Electronic Arts Freescale
More informationMATLAB® における並列・分散コンピューティング ~ Parallel Computing Toolbox™ & MATLAB Distributed Computing Server™ ~
MATLAB における並列 分散コンピューティング ~ Parallel Computing Toolbox & MATLAB Distributed Computing Server ~ MathWorks Japan Application Engineering Group Takashi Yoshida 2016 The MathWorks, Inc. 1 System Configuration
More informationHPC146
2 3 4 5 6 int array[16]; #pragma xmp nodes p(4) #pragma xmp template t(0:15) #pragma xmp distribute t(block) on p #pragma xmp align array[i] with t(i) array[16] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Node
More informationFINAL PROGRAM 25th Annual Workshop SWoPP / / 2012 Tottori Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2012
FINAL PROGRAM 25th Annual Workshop SWoPP 2012 2012 / / 2012 Tottori Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2012 8 1 ( ) 8 3 ( ) 680-0017 101-5 http://www.torikenmin.jp/kenbun/
More information01_OpenMP_osx.indd
OpenMP* / 1 1... 2 2... 3 3... 5 4... 7 5... 9 5.1... 9 5.2 OpenMP* API... 13 6... 17 7... 19 / 4 1 2 C/C++ OpenMP* 3 Fortran OpenMP* 4 PC 1 1 9.0 Linux* Windows* Xeon Itanium OS 1 2 2 WEB OS OS OS 1 OS
More informationSecond-semi.PDF
PC 2000 2 18 2 HPC Agenda PC Linux OS UNIX OS Linux Linux OS HPC 1 1CPU CPU Beowulf PC (PC) PC CPU(Pentium ) Beowulf: NASA Tomas Sterling Donald Becker 2 (PC ) Beowulf PC!! Linux Cluster (1) Level 1:
More informationARTED Xeon Phi Xeon Phi 2. ARTED ARTED (Ab-initio Real-Time Electron Dynamics simulator) RTRS- DFT (Real-Time Real-Space Density Functional Theory, )
Xeon Phi 1,a) 1,3 2 2,3 Intel Xeon Phi PC RTRSDFT ( ) ARTED (Ab-initio Real-Time Electron Dynamics simulator) Xeon Phi OpenMP Intel E5-2670v2 (Ivy-Bridge 10 ) CPU Xeon Phi Symmetric CPU 32 1.68 Symmetric
More information[4] ACP (Advanced Communication Primitives) [1] ACP ACP [2] ACP Tofu UDP [3] HPC InfiniBand InfiniBand ACP 2 ACP, 3 InfiniBand ACP 4 5 ACP 2. ACP ACP
InfiniBand ACP 1,5,a) 1,5,b) 2,5 1,5 4,5 3,5 2,5 ACE (Advanced Communication for Exa) ACP (Advanced Communication Primitives) HPC InfiniBand ACP InfiniBand ACP ACP InfiniBand Open MPI 20% InfiniBand Implementation
More informationsupercomputer2010.ppt
nanri@cc.kyushu-u.ac.jp 1 !! : 11 12! : nanri@cc.kyushu-u.ac.jp! : Word 2 ! PC GPU) 1997 7 http://wiredvision.jp/news/200806/2008062322.html 3 !! (Cell, GPU )! 4 ! etc...! 5 !! etc. 6 !! 20km 40 km ) 340km
More information211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 a a 1 a 2 a 3 a a GPU Graphics Processing Unit GPU CPU GPU GPGPU G
211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS211 211/1/18 GPU 4 8 BLAS 4 8 BLAS Basic Linear Algebra Subprograms GPU Graphics Processing Unit 4 8 double 2 4 double-double DD 4 4 8 quad-double
More informationuntitled
c NUMA 1. 18 (Moore s law) 1Hz CPU 2. 1 (Register) (RAM) Level 1 (L1) L2 L3 L4 TLB (translation look-aside buffer) (OS) TLB TLB 3. NUMA NUMA (Non-uniform memory access) 819 0395 744 1 2014 10 Copyright
More informationmain.dvi
PC 1 1 [1][2] [3][4] ( ) GPU(Graphics Processing Unit) GPU PC GPU PC ( 2 GPU ) GPU Harris Corner Detector[5] CPU ( ) ( ) CPU GPU 2 3 GPU 4 5 6 7 1 toyohiro@isc.kyutech.ac.jp 45 2 ( ) CPU ( ) ( ) () 2.1
More informationHBase Phoenix API Mars GPU MapReduce GPU Hadoop Hadoop Hadoop MapReduce : (1) MapReduce (2)JobTracker 1 Hadoop CPU GPU Fig. 1 The overview of CPU-GPU
GPU MapReduce 1 1 1, 2, 3 MapReduce GPGPU GPU GPU MapReduce CPU GPU GPU CPU GPU CPU GPU Map K-Means CPU 2GPU CPU 1.02-1.93 Improving MapReduce Task Scheduling for CPU-GPU Heterogeneous Environments Koichi
More informationTSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日
TSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日 目次 1. TSUBAMEのGPU 環境 2. プログラム作成 3. プログラム実行 4. 性能解析 デバッグ サンプルコードは /work0/gsic/seminars/gpu- 2011-09- 28 からコピー可能です 1.
More information2ndD3.eps
CUDA GPGPU 2012 UDX 12/5/24 p. 1 FDTD GPU FDTD GPU FDTD FDTD FDTD PGI Acceralator CUDA OpenMP Fermi GPU (Tesla C2075/C2070, GTX 580) GT200 GPU (Tesla C1060, GTX 285) PC GPGPU 2012 UDX 12/5/24 p. 2 FDTD
More informationDO 時間積分 START 反変速度の計算 contravariant_velocity 移流項の計算 advection_adams_bashforth_2nd DO implicit loop( 陰解法 ) 速度勾配, 温度勾配の計算 gradient_cell_center_surface 速
1 1, 2 1, 2 3 2, 3 4 GP LES ASUCA LES NVIDIA CUDA LES 1. Graphics Processing Unit GP General-Purpose SIMT Single Instruction Multiple Threads 1 2 3 4 1),2) LES Large Eddy Simulation 3) ASUCA 4) LES LES
More informationXcalableMP入門
XcalableMP 1 HPC-Phys@, 2018 8 22 XcalableMP XMP XMP Lattice QCD!2 XMP MPI MPI!3 XMP 1/2 PCXMP MPI Fortran CCoarray C++ MPIMPI XMP OpenMP http://xcalablemp.org!4 XMP 2/2 SPMD (Single Program Multiple Data)
More informationスライド 1
GTC Japan 2013 PGI Accelerator Compiler 新 OpenACC 2.0 の機能と PGI アクセラレータコンパイラ 2013 年 7 月 加藤努株式会社ソフテック 本日の話 OpenACC ディレクティブで出来ることを改めて知ろう! OpenACC 1.0 の復習 ディレクティブ操作で出来ることを再確認 OpenACC 2.0 の新機能 プログラミングの自由度の向上へ
More informationHPC (pay-as-you-go) HPC Web 2
,, 1 HPC (pay-as-you-go) HPC Web 2 HPC Amazon EC2 OpenFOAM GPU EC2 3 HPC MPI MPI Courant 1 GPGPU MPI 4 AMAZON EC2 GPU CLUSTER COMPUTE INSTANCE EC2 GPU (cg1.4xlarge) ( N. Virgina ) Quadcore Intel Xeon 5570
More information卒業論文
PC OpenMP SCore PC OpenMP PC PC PC Myrinet PC PC 1 OpenMP 2 1 3 3 PC 8 OpenMP 11 15 15 16 16 18 19 19 19 20 20 21 21 23 26 29 30 31 32 33 4 5 6 7 SCore 9 PC 10 OpenMP 14 16 17 10 17 11 19 12 19 13 20 1421
More informationマルチコアPCクラスタ環境におけるBDD法のハイブリッド並列実装
2010 GPGPU 2010 9 29 MPI/Pthread (DDM) DDM CPU CPU CPU CPU FEM GPU FEM CPU Mult - NUMA Multprocessng Cell GPU Accelerator, GPU CPU Heterogeneous computng L3 cache L3 cache CPU CPU + GPU GPU L3 cache 4
More informationフカシギおねえさん問題の高速計算アルゴリズム
JST ERATO 2013/7/26 Joint work with 1 / 37 1 2 3 4 5 6 2 / 37 1 2 3 4 5 6 3 / 37 : 4 / 37 9 9 6 10 10 25 5 / 37 9 9 6 10 10 25 Bousquet-Mélou (2005) 19 19 3 1GHz Alpha 8 Iwashita (Sep 2012) 21 21 3 2.67GHz
More informationÊÂÎó·×»»¤È¤Ï/OpenMP¤Î½éÊâ¡Ê£±¡Ë
2015 5 21 OpenMP Hello World Do (omp do) Fortran (omp workshare) CPU Richardson s Forecast Factory 64,000 L.F. Richardson, Weather Prediction by Numerical Process, Cambridge, University Press (1922) Drawing
More informationGPU GPU CPU CPU CPU GPU GPU N N CPU ( ) 1 GPU CPU GPU 2D 3D CPU GPU GPU GPGPU GPGPU 2 nvidia GPU CUDA 3 GPU 3.1 GPU Core 1
GPU 4 2010 8 28 1 GPU CPU CPU CPU GPU GPU N N CPU ( ) 1 GPU CPU GPU 2D 3D CPU GPU GPU GPGPU GPGPU 2 nvidia GPU CUDA 3 GPU 3.1 GPU Core 1 Register & Shared Memory ( ) CPU CPU(Intel Core i7 965) GPU(Tesla
More informationHP High Performance Computing(HPC)
ACCELERATE HP High Performance Computing HPC HPC HPC HPC HPC 1000 HPHPC HPC HP HPC HPC HPC HP HPCHP HP HPC 1 HPC HP 2 HPC HPC HP ITIDC HP HPC 1HPC HPC No.1 HPC TOP500 2010 11 HP 159 32% HP HPCHP 2010 Q1-Q4
More informationLyra 2 2 2 X Y X Y ivis Designer Lyra ivisdesigner Lyra ivisdesigner 2 ( 1 ) ( 2 ) ( 3 ) ( 4 ) ( 5 ) (1) (2) (3) (4) (5) Iv Studio [8] 3 (5) (4) (1) (
1,a) 2,b) 2,c) 1. Web [1][2][3][4] [5] 1 2 a) ito@iplab.cs.tsukuba.ac.jp b) misue@cs.tsukuba.ac.jp c) jiro@cs.tsukuba.ac.jp [6] Lyra[5] ivisdesigner[6] [7] 2 Lyra ivisdesigner c 2012 Information Processing
More information10D16.dvi
D IEEJ Transactions on Industry Applications Vol.136 No.10 pp.686 691 DOI: 10.1541/ieejias.136.686 NW Accelerating Techniques for Sequence Alignment based on an Extended NW Algorithm Jin Okaze, Non-member,
More informationHPEハイパフォーマンスコンピューティング ソリューション
HPE HPC / AI Page 2 No.1 * 24.8% No.1 * HPE HPC / AI HPC AI SGIHPE HPC / AI GPU TOP500 50th edition Nov. 2017 HPE No.1 124 www.top500.org HPE HPC / AI TSUBAME 3.0 2017 7 AI TSUBAME 3.0 HPE SGI 8600 System
More informationdevelop
SCore SCore 02/03/20 2 1 HA (High Availability) HPC (High Performance Computing) 02/03/20 3 HA (High Availability) Mail/Web/News/File Server HPC (High Performance Computing) Job Dispatching( ) Parallel
More informationスパコンに通じる並列プログラミングの基礎
2018.09.10 furihata@cmc.osaka-u.ac.jp ( ) 2018.09.10 1 / 59 furihata@cmc.osaka-u.ac.jp ( ) 2018.09.10 2 / 59 Windows, Mac Unix 0444-J furihata@cmc.osaka-u.ac.jp ( ) 2018.09.10 3 / 59 Part I Unix GUI CUI:
More informationuntitled
PC murakami@cc.kyushu-u.ac.jp muscle server blade server PC PC + EHPC/Eric (Embedded HPC with Eric) 1216 Compact PCI Compact PCIPC Compact PCISH-4 Compact PCISH-4 Eric Eric EHPC/Eric EHPC/Eric Gigabit
More informationiphone GPGPU GPU OpenCL Mac OS X Snow LeopardOpenCL iphone OpenCL OpenCL NVIDIA GPU CUDA GPU GPU GPU 15 GPU GPU CPU GPU iii OpenMP MPI CPU OpenCL CUDA OpenCL CPU OpenCL GPU NVIDIA Fermi GPU Fermi GPU GPU
More informationUser-defined Logic Application Memory Manager (Replacement) Application Specific Prefetcher (ASP) Application Kernel On-chip RAM (BRAM) On-chip RAM I/
RTL 1,2,a) 1,b) CPU Verilog HDL RTL 1. CPU GPU Verilog HDL VHDL RTL HDL Vivado HLS Impulse C CPU 1 2 a) takamaeda@arch.cs.titech.ac.jp b) kise@cs.titech.ac.jp RTL RTL RTL Verilog HDL RTL 2. 1 HDL 1 User-defined
More informationスパコンに通じる並列プログラミングの基礎
2016.06.06 2016.06.06 1 / 60 2016.06.06 2 / 60 Windows, Mac Unix 0444-J 2016.06.06 3 / 60 Part I Unix GUI CUI: Unix, Windows, Mac OS Part II 0444-J 2016.06.06 4 / 60 ( : ) 6 6 ( ) 6 10 6 16 SX-ACE 6 17
More informationIEEE HDD RAID MPI MPU/CPU GPGPU GPU cm I m cm /g I I n/ cm 2 s X n/ cm s cm g/cm
Neutron Visual Sensing Techniques Making Good Use of Computer Science J-PARC CT CT-PET TB IEEE HDD RAID MPI MPU/CPU GPGPU GPU cm I m cm /g I I n/ cm 2 s X n/ cm s cm g/cm cm cm barn cm thn/ cm s n/ cm
More information先進的計算基盤システムシンポジウム SACSIS2012 Symposium on Advanced Computing Systems and Infrastructures SACSIS /5/18 CPU, CPU., Memory-bound CPU,., Memory-bo
CPU, CPU, Memory-bound CPU,, Memory-bound ( ) Performance Monitoring Counter(PMC), PMC (nmi watchdog), PMC CPU., PMC, CPU, Memory-bound, CPU-bound,, CPU,, PMC,,,, CPU, NPB 8, 5% CPU, CPU, 3%, 5% CPU, IS
More information02_C-C++_osx.indd
C/C++ OpenMP* / 2 C/C++ OpenMP* OpenMP* 9.0 1... 2 2... 3 3OpenMP*... 5 3.1... 5 3.2 OpenMP*... 6 3.3 OpenMP*... 8 4OpenMP*... 9 4.1... 9 4.2 OpenMP*... 9 4.3 OpenMP*... 10 4.4... 10 5OpenMP*... 11 5.1
More informationOpenACCによる並列化
実習 OpenACC による ICCG ソルバーの並列化 1 ログイン Reedbush へのログイン $ ssh reedbush.cc.u-tokyo.ac.jp l txxxxx Module のロード $ module load pgi/17.3 cuda ログインするたびに必要です! ワークディレクトリに移動 $ cdw ターゲットプログラム /srcx OpenACC 用のディレクトリの作成
More information2). 3) 4) 1.2 NICTNICT DCRA Dihedral Corner Reflector micro-arraysdcra DCRA DCRA DCRA 3D DCRA PC USB PC PC ON / OFF Velleman K8055 K8055 K8055
1 1 1 2 DCRA 1. 1.1 1) 1 Tactile Interface with Air Jets for Floating Images Aya Higuchi, 1 Nomin, 1 Sandor Markon 1 and Satoshi Maekawa 2 The new optical device DCRA can display floating images in free
More informationIPSJ SIG Technical Report Vol.2013-ICS-172 No /11/12 1,a), 1,b) Anomaly Detection 1. 1 Nagoya Institute of Technology 1 Presently with Nagoya In
1,a), 1,b) Anomaly Detection 1. 1 Nagoya Institute of Technology 1 Presently with Nagoya Institute of Technology a) otsuka.takanobu@nitech.ac.jp b) ito.takayuki@nitech.ac.jp Anomaly Detection 2 3 4 5 6
More information1. Graphics Processing Units (GPU) General-Purpose GPU (GPGPU) GPU TSUBAME 2.0[1] GPU 515 GFlops 150 GB/s GPU [2], [3], [4], [5], [6], [7], [8] GPU NV
AMR GPU 1,a) 1,b) 1,c) 2011 11 4, 2011 12 1 Adaptive mesh refinement (AMR) GPU CPU GPU AMR GPU AMR GPU CPU GPU AMR AMR GPU C++ 3 GPU Framework for Block-type AMR method on GPU computing Shimokawabe Takashi
More information1 OpenCL Work-Item Private Memory Workgroup Local Memory Compute Device Global/Constant Memory Host Host Memory OpenCL CUDA CUDA Compute Unit MP Proce
GPGPU (VI) GPGPU 1 GPGPU CUDA CUDA GPGPU GPGPU CUDA GPGPU ( ) CUDA GPGPU 2 OpenCL OpenCL GPGPU Apple Khronos Group OpenCL Working Group [1] CUDA GPU NVIDIA GPU *1 OpenCL NVIDIA AMD GPU CPU DSP(Digital
More informationスパコンに通じる並列プログラミングの基礎
2018.06.04 2018.06.04 1 / 62 2018.06.04 2 / 62 Windows, Mac Unix 0444-J 2018.06.04 3 / 62 Part I Unix GUI CUI: Unix, Windows, Mac OS Part II 2018.06.04 4 / 62 0444-J ( : ) 6 4 ( ) 6 5 * 6 19 SX-ACE * 6
More informationIPSJ SIG Technical Report Vol.2012-ARC-202 No.13 Vol.2012-HPC-137 No /12/13 Tightly Coupled Accelerators 1,a) 1,b) 1,c) 1,d) GPU HA-PACS
Tightly Coupled Accelerators 1,a) 1,b) 1,c) 1,d) HA-PACS 2012 2 HA-PACS TCA (Tightly Coupled Accelerators) TCA PEACH2 1. (Graphics Processing Unit) HPC GP(General Purpose ) TOP500 [1] CPU PCI Express (PCIe)
More informationスライド 1
GPU クラスタによる格子 QCD 計算 広大理尾崎裕介 石川健一 1.1 Introduction Graphic Processing Units 1 チップに数百個の演算器 多数の演算器による並列計算 ~TFLOPS ( 単精度 ) CPU 数十 GFLOPS バンド幅 ~100GB/s コストパフォーマンス ~$400 GPU の開発環境 NVIDIA CUDA http://www.nvidia.co.jp/object/cuda_home_new_jp.html
More informationShonan Institute of Technology MEMOIRS OF SHONAN INSTITUTE OF TECHNOLOGY Vol. 41, No. 1, 2007 Ships1 * ** ** ** Development of a Small-Mid Range Paral
MEMOIRS OF SHONAN INSTITUTE OF TECHNOLOGY Vol. 41, No. 1, 2007 Ships1 * ** ** ** Development of a Small-Mid Range Parallel Computer Ships1 Makoto OYA*, Hiroto MATSUBARA**, Kazuyoshi SAKURAI** and Yu KATO**
More informationB 2 Thin Q=3 0 0 P= N ( )P Q = 2 3 ( )6 N N TSUB- Hub PCI-Express (PCIe) Gen 2 x8 AME1 5) 3 GPU Socket 0 High-performance Linpack 1
TSUBAME 2.0 Linpack 1,,,, Intel NVIDIA GPU 2010 11 TSUBAME 2.0 Linpack 2CPU 3GPU 1400 Dual-Rail QDR InfiniBand TSUBAME 1.0 30 2.4PFlops TSUBAME 1.0 Linpack GPU 1.192PFlops PFlops Top500 4 Achievement of
More informationHPC143
研究背景 GPUクラスタ 高性能 高いエネルギー効率 低価格 様々なHPCアプリケーションで用いられている TCA (Tightly Coupled Accelerators) 密結合並列演算加速機構 筑波大学HA-PACSクラスタ アクセラレータ GPU 間の直接通信 低レイテンシ 今後のHPCアプリは強スケーリングも重要 TCAとアクセラレータを搭載したシステムに おけるプログラミングモデル 例
More informationRun-Based Trieから構成される 決定木の枝刈り法
Run-Based Trie 2 2 25 6 Run-Based Trie Simple Search Run-Based Trie Network A Network B Packet Router Packet Filtering Policy Rule Network A, K Network B Network C, D Action Permit Deny Permit Network
More informationfiš„v8.dvi
(2001) 49 2 333 343 Java Jasp 1 2 3 4 2001 4 13 2001 9 17 Java Jasp (JAva based Statistical Processor) Jasp Jasp. Java. 1. Jasp CPU 1 106 8569 4 6 7; fuji@ism.ac.jp 2 106 8569 4 6 7; nakanoj@ism.ac.jp
More informationuntitled
AMD HPC GP-GPU Opteron HPC 2 1 AMD Opteron 85 FLOPS 10,480 TOP500 16 T2K 95 FLOPS 10,800 140 FLOPS 15,200 61 FLOPS 7,200 3 Barcelona 4 2 AMD Opteron CPU!! ( ) L1 5 2003 2004 2005 2006 2007 2008 2009 2010
More informationuntitled
taisuke@cs.tsukuba.ac.jp http://www.hpcs.is.tsukuba.ac.jp/~taisuke/ CP-PACS HPC PC post CP-PACS CP-PACS II 1990 HPC RWCP, HPC かつての世界最高速計算機も 1996年11月のTOP500 第一位 ピーク性能 614 GFLOPS Linpack性能 368 GFLOPS (地球シミュレータの前
More information1重谷.PDF
RSCC RSCC RSCC BMT 1 6 3 3000 3000 200310 1994 19942 VPP500/32PE 19992 VPP700E/128PE 160PE 20043 2 2 PC Linux 2048 CPU Intel Xeon 3.06GHzDual) 12.5 TFLOPS SX-7 32CPU/256GB 282.5 GFLOPS Linux 3 PC 1999
More information(a) (b) (c) Fig. 2 2 (a) ; (b) ; (c) (a)configuration of the proposed system; (b)processing flow of the system; (c)the system in use 1 GPGPU (
1 1 1 (a) (b) imperceptible A Realtime and Adaptive Technique for Projection onto Non-Flat Surfaces Using a Mobile Projector Camera System Eiji Seki, 1 Dao Vinh Ninh 1 and Masanori Sugimoto 1 In this paper,
More informationuntitled
(SPLE) 2009/10/23 SRA yosikazu@sra.co.jp First, a Message from My Employers 2 SRA CMMI ] SPICE 3 And Now, Today s Feature Presentation 4 Engineering SPLE SPLE SPLE SPLE SPLE 5 6 SPL Engineering Engineeringi
More information( 1) 3. Hilliges 1 Fig. 1 Overview image of the system 3) PhotoTOC 5) 1993 DigitalDesk 7) DigitalDesk Koike 2) Microsoft J.Kim 4). 2 c 2010
1 2 2 Automatic Tagging System through Discussing Photos Kazuma Mishimagi, 1 Masashi Toda 2 and Toshio Kawashima 2 Many media forms can be stored easily at present. Photographs, for example, can be easily
More informationMicrosoft Word - 0_0_表紙.doc
2km Local Forecast Model; LFM Local Analysis; LA 2010 11 2.1.1 2010a LFM 2.1.1 2011 3 11 2.1.1 2011 5 2010 6 1 8 3 1 LFM LFM MSM LFM FT=2 2009; 2010 MSM RMSE RMSE MSM RMSE 2010 1 8 3 2010 6 2010 6 8 2010
More informationIPSJ SIG Technical Report Vol.2017-HPC-158 No /3/9 OpenACC MPS 1,a) 1 Moving Particle Semi-implicit (MPS) MPS MPS OpenACC GPU 2 4 GPU NVIDIA K2
OpenACC MPS 1,a) 1 Movng Partcle Sem-mplct (MPS) MPS MPS OpenACC GPU 2 4 GPU NVIDIA K20c GTX1080 P100(PCIe) P100(NVlnk) 5 OpenACC 3.5 3 Fortran 29.0 74.5 GPU 1. MPS [1] 1 MPS MPS CUDA GPU [2] [3] [4] OpenACC
More informationAmazon EC2 IaaS (Infrastructure as a Service) HPCI HPCI ( VM) VM VM HPCI VM OS VM HPCI HPC HPCI RENKEI-PoP 2 HPCI HPCI 1 HPCI HPCI HPC CS
HPCI 1 2 3 4 5 1, 6 5 24 HPCI HPC OS HPC RENKEI-PoP Design of Advanced Software Deployment Infrastructure in HPCI Wide-area Distributed Environment Shinichiro Takizawa, 1 Masaharu Munetomo, 2 Atsuya Uno,
More informationHPC pdf
GPU 1 1 2 2 1 1024 3 GPUGraphics Unit1024 3 GPU GPU GPU GPU 1024 3 Tesla S1070-400 1 GPU 2.6 Accelerating Out-of-core Cone Beam Reconstruction Using GPU Yusuke Okitsu, 1 Fumihiko Ino, 1 Taketo Kishi, 2
More information09中西
PC NEC Linux (1) (2) (1) (2) 1 Linux Linux 2002.11.22) LLNL Linux Intel Xeon 2300 ASCIWhite1/7 / HPC (IDC) 2002 800 2005 2004 HPC 80%Linux) Linux ASCI Purple (ASCI 100TFlops Blue Gene/L 1PFlops (2005)
More informationメモリ階層構造を考慮した大規模グラフ処理の高速化
, CREST ERATO 0.. (, CREST) ERATO / 8 Outline NETAL (NETwork Analysis Library) NUMA BFS raph500, reenraph500 Kronecker raph Level Synchronized parallel BFS Hybrid Algorithm for Parallel BFS NUMA Hybrid
More informationCUDA 連携とライブラリの活用 2
1 09:30-10:00 受付 10:00-12:00 Reedbush-H ログイン GPU 入門 13:30-15:00 OpenACC 入門 15:15-16:45 OpenACC 最適化入門と演習 17:00-18:00 OpenACC の活用 (CUDA 連携とライブラリの活用 ) CUDA 連携とライブラリの活用 2 3 OpenACC 簡単にGPUプログラムが作成できる それなりの性能が得られる
More informationDual Stack Virtual Network Dual Stack Network RS DC Real Network 一般端末 GN NTM 端末 C NTM 端末 B IPv4 Private Network IPv4 Global Network NTM 端末 A NTM 端末 B
root Android IPv4/ 1 1 2 1 NAT Network Address Translation IPv4 NTMobile Network Traversal with Mobility NTMobile Android 4.0 VPN API VpnService root VpnService IPv4 IPv4 VpnService NTMobile root IPv4/
More information27 YouTube YouTube UGC User Generated Content CDN Content Delivery Networks LRU Least Recently Used UGC YouTube CGM Consumer Generated Media CGM CGM U
YouTube 2016 2 16 27 YouTube YouTube UGC User Generated Content CDN Content Delivery Networks LRU Least Recently Used UGC YouTube CGM Consumer Generated Media CGM CGM UGC UGC YouTube k-means YouTube YouTube
More informationIPSJ SIG Technical Report iphone iphone,,., OpenGl ES 2.0 GLSL(OpenGL Shading Language), iphone GPGPU(General-Purpose Computing on Graphics Proc
iphone 1 1 1 iphone,,., OpenGl ES 2.0 GLSL(OpenGL Shading Language), iphone GPGPU(General-Purpose Computing on Graphics Processing Unit)., AR Realtime Natural Feature Tracking Library for iphone Makoto
More informationDEIM Forum 2017 H ,
DEIM Forum 217 H5-4 113 8656 7 3 1 153 855 4 6 1 3 2 1 2 E-mail: {satoyuki,haya,kgoda,kitsure}@tkl.iis.u-tokyo.ac.jp,.,,.,,.,, 1.. 1956., IBM IBM RAMAC 35 IBM 35 24 5, 5MB. 1961 IBM 131,,, IBM 35 13.,
More informationCore1 FabScalar VerilogHDL Cache Cache FabScalar 1 CoreConnect[2] Wishbone[3] AMBA[4] AMBA 1 AMBA ARM L2 AMBA2.0 AMBA2.0 FabScalar AHB APB AHB AMBA2.0
AMBA 1 1 1 1 FabScalar FabScalar AMBA AMBA FutureBus Improvement of AMBA Bus Frame-work for Heterogeneos Multi-processor Seto Yusuke 1 Takahiro Sasaki 1 Kazuhiko Ohno 1 Toshio Kondo 1 Abstract: The demand
More informationOpenMP (1) 1, 12 1 UNIX (FUJITSU GP7000F model 900), 13 1 (COMPAQ GS320) FUJITSU VPP5000/64 1 (a) (b) 1: ( 1(a))
OpenMP (1) 1, 12 1 UNIX (FUJITSU GP7000F model 900), 13 1 (COMPAQ GS320) FUJITSU VPP5000/64 1 (a) (b) 1: ( 1(a)) E-mail: {nanri,amano}@cc.kyushu-u.ac.jp 1 ( ) 1. VPP Fortran[6] HPF[3] VPP Fortran 2. MPI[5]
More informationTHE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE. [ ] I/O Abstr
THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS TECHNICAL REPORT OF IEICE. [ ] 466-8555 E-mail: fukushima@nitech.ac.jp I/O Abstract [Invited] High-Performance Computing Programming
More informationIPSJ SIG Technical Report 1, Instrument Separation in Reverberant Environments Using Crystal Microphone Arrays Nobutaka ITO, 1, 2 Yu KITANO, 1
1, 2 1 1 1 Instrument Separation in Reverberant Environments Using Crystal Microphone Arrays Nobutaka ITO, 1, 2 Yu KITANO, 1 Nobutaka ONO 1 and Shigeki SAGAYAMA 1 This paper deals with instrument separation
More informationTCP/IP IEEE Bluetooth LAN TCP TCP BEC FEC M T M R M T 2. 2 [5] AODV [4]DSR [3] 1 MS 100m 5 /100m 2 MD 2 c 2009 Information Processing Society of
IEEE802.11 [1]Bluetooth [2] 1 1 (1) [6] Ack (Ack) BEC FEC (BEC) BEC FEC 100 20 BEC FEC 6.19% 14.1% High Throughput and Highly Reliable Transmission in MANET Masaaki Kosugi 1 and Hiroaki Higaki 1 1. LAN
More informationPowerPoint Presentation
ヘテロジニアスな環境におけるソフトウェア開発 Agenda 今日の概要 ヘテロジニアスな環境の登場 ホモジニアスからヘテロジニアスへ ヘテロジニアスなアーキテクチャ GPU CUDA OpenACC, XeonPhi 自分のプログラムを理解するために デバッガ 共通の操作体験 TotalView 続きはブースで より速く ホモジーニアスな並列 HPC 銀河生成 金融のリスク計算 車の衝突解析 製薬
More informationI I / 47
1 2013.07.18 1 I 2013 3 I 2013.07.18 1 / 47 A Flat MPI B 1 2 C: 2 I 2013.07.18 2 / 47 I 2013.07.18 3 / 47 #PJM -L "rscgrp=small" π-computer small: 12 large: 84 school: 24 84 16 = 1344 small school small
More informationストリーミング SIMD 拡張命令2 (SSE2) を使用した SAXPY/DAXPY
SIMD 2(SSE2) SAXPY/DAXPY 2.0 2000 7 : 248600J-001 01/12/06 1 305-8603 115 Fax: 0120-47-8832 * Copyright Intel Corporation 1999, 2000 01/12/06 2 1...5 2 SAXPY DAXPY...5 2.1 SAXPY DAXPY...6 2.1.1 SIMD C++...6
More information( CUDA CUDA CUDA CUDA ( NVIDIA CUDA I
GPGPU (II) GPGPU CUDA 1 GPGPU CUDA(CUDA Unified Device Architecture) CUDA NVIDIA GPU *1 C/C++ (nvcc) CUDA NVIDIA GPU GPU CUDA CUDA 1 CUDA CUDA 2 CUDA NVIDIA GPU PC Windows Linux MaxOSX CUDA GPU CUDA NVIDIA
More informationGPUコンピューティング講習会パート1
GPU コンピューティング (CUDA) 講習会 GPU と GPU を用いた計算の概要 丸山直也 スケジュール 13:20-13:50 GPU を用いた計算の概要 担当丸山 13:50-14:30 GPU コンピューティングによる HPC アプリケーションの高速化の事例紹介 担当青木 14:30-14:40 休憩 14:40-17:00 CUDA プログラミングの基礎 担当丸山 TSUBAME の
More information