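1 Introduction
This article discusses the fast Fourier transform (FFT) and how to compute it efficiently on processors with a memory hierarchy.

2 Fourier Transform

2.1 Discrete Fourier Transform
The discrete Fourier transform (DFT) is defined as

$$ y_k = \sum_{j=0}^{n-1} x_j \, \omega_n^{jk}, \quad 0 \le k \le n-1 \qquad (1) $$

where $x_j$ and $y_k$ are complex numbers, $\omega_n = e^{-2\pi i/n}$, and $i = \sqrt{-1}$. Computing an n-point DFT directly from Eq. (1) takes O(n^2) arithmetic operations, whereas the FFT takes only O(n log n) [4, 16]. Well-known FFT algorithms include the Cooley-Tukey algorithm [6] and the Stockham algorithm [5, 13].

スーパーコンピューティングニュース (Supercomputing News)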
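Figure 1: Levels in the memory hierarchy (CPU; Level 1, Level 2, ..., Level n; increasing distance from the CPU in access time; size of the memory at each level).

2.2 FFT Algorithms
Many FFT algorithms have been proposed [16, 4]. A radix-p FFT step computes

$$ Y(k) = \sum_{j=0}^{p-1} X(j) \, \Omega^{j} \, \omega_p^{jk}, \quad 0 \le k \le p-1 \qquad (2) $$

where $\Omega$ is a twiddle factor [4]. In each radix-p step, the inputs X(j) are first multiplied by the twiddle factors $\Omega^j$, and a p-point DFT is then applied [10]. Mixed-radix FFTs built from steps of the form (2) are described in [12, 15]. An efficient implementation must also account for the memory hierarchy (Figure 1) by exploiting the locality of data accesses.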
Figure 2: Memory hierarchy of a RISC processor (CPU, L1 cache, L2 cache, main memory).

As Figure 2 shows, a RISC processor has L1 and L2 caches between the CPU and main memory.
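    C
          SUBROUTINE ZAXPY(N,A,X,Y)
          IMPLICIT REAL*8 (A-H,O-Z)
          COMPLEX*16 A,X(*),Y(*)
          DO I=1,N
            Y(I)=Y(I)+A*X(I)
          END DO
          RETURN
          END

Figure 3: ZAXPY routine.

To see how the memory hierarchy affects performance, consider ZAXPY (complex A X plus Y), shown in Figure 3. Figure 4 shows the performance of ZAXPY on an Intel Xeon 3.06 GHz (FSB 533 MHz, 512 KB L2 cache, PC2100 DDR-SDRAM) with the Intel C Compiler 8.0. It compares three variants:
- code using SSE2 [8], the SIMD (Single Instruction Multiple Data) extension of the Intel Pentium 4;
- x87 code;
- the ZAXPY routine of the BLAS (Basic Linear Algebra Subprograms) in Intel MKL (Math Kernel Library) Version … [9].

Each iteration of the loop in Figure 3 performs four loads and two stores. While N is at most 8192, so that X and Y fit in the L2 cache, the version with SSE2 reaches about 3 GFLOPS. That is roughly half of the 6.12 GFLOPS theoretical peak of the Xeon 3.06 GHz.

4 Six-Step FFT
The six-step FFT [3, 16] is an FFT algorithm designed to make effective use of the cache.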
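    COMPLEX*16 X(N1,N2),Y(N2,N1),U(N2,N1)
    DO I=1,N1
      DO J=1,N2
        Y(J,I)=X(I,J)
      END DO
    END DO
    DO I=1,N1
      CALL IN_CACHE_FFT(Y(1,I),N2)
    END DO
    DO I=1,N1
      DO J=1,N2
        Y(J,I)=Y(J,I)*U(J,I)
      END DO
    END DO
    DO J=1,N2
      DO I=1,N1
        X(I,J)=Y(J,I)
      END DO
    END DO
    DO J=1,N2
      CALL IN_CACHE_FFT(X(1,J),N1)
    END DO
    DO I=1,N1
      DO J=1,N2
        Y(J,I)=X(I,J)
      END DO
    END DO

Figure 5: Six-step FFT.

Figure 6 shows a blocked six-step FFT, and Figure 7 illustrates how it operates. In Figure 6, NB is the blocking parameter and NP is the padding parameter of the work array WORK. As Figure 7 shows, NB rows of X are partially transposed into WORK, the column FFTs are performed on WORK, and the results are written back to X. Because WORK holds only NB columns, it can stay in cache while the multicolumn FFTs are carried out.

This blocked six-step FFT is a two-pass algorithm [3, 16]: an n-point FFT still requires O(n log n) arithmetic operations, but only O(n) main-memory accesses. In Steps 2 and 4 of Figure 7, each column FFT fits in the L1 cache. When n is very large, however, a column FFT no longer fits in the L1 cache [1, 2]. In that case, a three-pass FFT can be used in place of the two-pass six-step FFT, so that each column FFT again fits in the L1 cache.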
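     1  COMPLEX*16 X(N1,N2),Y(N2,N1),U(N1,N2)
     2  COMPLEX*16 WORK(N2+NP,NB)
     3  DO II=1,N1,NB
     4    DO JJ=1,N2,NB
     5      DO I=II,II+NB-1
     6        DO J=JJ,JJ+NB-1
     7          WORK(J,I-II+1)=X(I,J)
     8        END DO
     9      END DO
    10    END DO
    11    DO I=1,NB
    12      CALL IN_CACHE_FFT(WORK(1,I),N2)
    13    END DO
    14    DO J=1,N2
    15      DO I=II,II+NB-1
    16        X(I,J)=WORK(J,I-II+1)*U(I,J)
    17      END DO
    18    END DO
    19  END DO
    20  DO JJ=1,N2,NB
    21    DO J=JJ,JJ+NB-1
    22      CALL IN_CACHE_FFT(X(1,J),N1)
    23    END DO
    24    DO I=1,N1
    25      DO J=JJ,JJ+NB-1
    26        Y(J,I)=X(I,J)
    27      END DO
    28    END DO
    29  END DO

Figure 6: Blocked six-step FFT.

The in-cache FFTs use the out-of-place Stockham algorithm [5, 13]. In Steps 2 and 4, the multicolumn FFTs consist of O(√n)-point FFTs, and the work array WORK in Figure 6 holds only NB columns of O(√n) elements each.

In-Cache FFT
The multicolumn FFTs are composed of column FFTs, each computed by an in-cache FFT, for which we use the Stockham autosort algorithm [5, 13]. The radix-2 Cooley-Tukey algorithm [6] needs a bit-reversal permutation of the data, whereas the Stockham algorithm produces its output in natural order without one. For the radix-2 Stockham algorithm, let n = 2lm. Initially l = n/2 and m = 1, and at each step l is halved and m is doubled.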
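Figure 7: Operation of the blocked six-step FFT:
1. Partial transpose of NB rows of array X (N1 × N2) into array WORK, whose leading dimension is padded by NP.
2. NB individual N2-point FFTs on WORK.
3. Partial transpose of the N2 × NB block of WORK back into array X.
4. NB individual N1-point FFTs on array X.

Each radix-2 step of the Stockham algorithm reads from one array and writes to the other, so X and Y alternate roles from step to step (X → Y, then Y → X, and so on). With $\omega_p = e^{-2\pi i/p}$, one step computes

$$ c_0 = X(k + jm), \qquad c_1 = X(k + jm + lm), $$
$$ Y(k + 2jm) = c_0 + c_1, \qquad Y(k + 2jm + m) = \omega_{2l}^{j} \, (c_0 - c_1), $$

for $0 \le j < l$ and $0 \le k < m$.

Every radix-2 step passes over all n points. Radix-4 and radix-8 steps need fewer passes over the data than radix-2 steps, which reduces memory traffic [14]. An $n = 2^p$ ($p \ge 2$)-point FFT is therefore computed by decomposing n as $n = 4^q 8^r$ ($0 \le q \le 2$, $r \ge 0$) and performing q radix-4 steps and r radix-8 steps.

In the six-step FFT, the multicolumn FFTs [3, 16] consist of mutually independent column FFTs. The DO loops of the six-step FFT can therefore be parallelized with the OpenMP [11] directive !$OMP DO.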
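Table 1: Performance of FFTE 4.0 (SSE3) on Intel Xeon … GHz
n | 1 CPU, 1 core: Time, MFLOPS | 1 CPU, 2 cores: Time, MFLOPS | 2 CPUs, 4 cores: Time, MFLOPS

In the blocked six-step FFT of Figure 6, the outer DO loops at lines 3 (DO II) and 20 (DO JJ) are parallelized, and each thread uses its own copy of WORK.

To evaluate the six-step FFT, we compared FFTE (version 4.0) with FFTW (version …) [7] for n = 2^m-point FFTs, varying m; each FFT was executed 10 times. The test system was as follows:
- CPU and memory: Intel Xeon … GHz with 4 GB of DDR2-SDRAM.
- Caches: 32 KB L1 instruction cache, 32 KB L1 data cache, 4 MB L2 cache.
- OS: Linux (fc6).
- Compilers: Intel Fortran version 9.1 and Intel C version 9.1, with the options -O3 -xp -openmp.

Table 1 shows the performance of FFTE version 4.0, including the 2-CPU, 4-core configuration, over the range of n. FFTW serves as the point of comparison.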
[8] Intel Corporation. IA-32 Intel Architecture Software Developer's Manual, Volume 2: Instruction Set Reference.
[9] Intel Corporation. Intel Math Kernel Library Reference Manual.
[10] H. J. Nussbaumer. Fast Fourier Transform and Convolution Algorithms. Springer-Verlag, New York, second corrected and updated edition.
[11] OpenMP. Simple, portable, scalable SMP programming.
[12] R. C. Singleton. An algorithm for computing the mixed radix fast Fourier transform. IEEE Trans. Audio Electroacoust., 17:93-103.
[13] P. N. Swarztrauber. FFT algorithms for vector computers. Parallel Computing, 1:45-63.
[14] D. Takahashi. A parallel 1-D FFT algorithm for the Hitachi SR8000. Parallel Computing, 29(6).
[15] C. Temperton. Self-sorting mixed-radix fast Fourier transforms. J. Comput. Phys., 52:1-23.
[16] C. Van Loan. Computational Frameworks for the Fast Fourier Transform. SIAM Press, Philadelphia, PA.