NUMA (Moore's law) 1Hz CPU
2. 1 (Register) (RAM) Level 1 (L1) L2 L3 L4 TLB (translation look-aside buffer) (OS) TLB TLB
3. NUMA
NUMA (Non-uniform memory access)
Copyright © by ORSJ. Unauthorized reproduction of this article is prohibited.
Intel Xeon X5460 Harpertown CPU 2 CPU 4 1 8 (= 2 × 4 × 1)
Figure 2: 2-way Intel Xeon X5460
NUMA UMA (Uniform memory access) 2 UMA 3 NUMA UMA 2 CPU Intel Xeon X CPU CPU (RAM) RAM NUMA NUMA NUMA CPU NUMA 3 CPU Intel Xeon E NUMA 4
4. STREAM
1 1 STREAM: Sustainable Memory Bandwidth in High Performance Computers stream/
Intel Xeon E SandyBridge-EP CPU 4 CPU (= 4 × 8 × 2)
Figure 3: 4-way Intel Xeon E
STREAM 4 1 Triad: $n$, $a, b, c \in \mathbb{R}^n$, $r$, $a \leftarrow b + rc$, 1 bytes
Figure 4: OpenMP Triad (C/C++)
5 4-way Intel Xeon E, $n = \{2^{10}, \ldots, 2^{30}\}$, Triad (GB/s)
$2^{20}$ STREAM 20 16, 32, 64: 95, 98, 92 GB/s
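The Triad kernel $a \leftarrow b + rc$ named above can be sketched in a few lines. This is only an illustrative Python version, not the OpenMP C/C++ code of Figure 4, and interpreter overhead means its numbers will be far below the GB/s figures quoted in the text. The function names are hypothetical, and the count of 24 bytes per iteration (two 8-byte reads and one 8-byte write) is an assumption taken from the usual STREAM counting convention, since the byte count in the source did not survive extraction.

```python
import time
from array import array


def triad(a, b, c, r):
    """STREAM Triad kernel: a[i] = b[i] + r * c[i] for every index i."""
    for i in range(len(a)):
        a[i] = b[i] + r * c[i]


def triad_bandwidth_gbs(n, r=3.0, repeats=3):
    """Return the best observed Triad bandwidth in GB/s for vectors of length n."""
    a = array("d", [0.0] * n)
    b = array("d", [1.0] * n)
    c = array("d", [2.0] * n)
    # Assumed STREAM convention: read b, read c, write a; 8 bytes per double.
    bytes_moved = 3 * 8 * n
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        triad(a, b, c, r)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a coarse timer.
    return bytes_moved / max(best, 1e-12) / 1e9


if __name__ == "__main__":
    for exponent in (10, 15, 20):
        n = 1 << exponent
        print(f"n = 2^{exponent}: {triad_bandwidth_gbs(n):.3f} GB/s")
```

Taking the best of several repeats mirrors how bandwidth benchmarks usually report a sustained rate rather than a single noisy sample.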
Figure 5: STREAM TRIAD
2 Hyper-threading
Linux numactl NUMA numactl NUMA node NUMA node 3 --physcpubind --membind NUMA ID NUMA ID Linux /proc/cpuinfo processor ID physical id NUMA ID Portable Hardware Locality (HWLOC) [1]
6 $n = \{2^{10}, 2^{11}, \ldots, 2^{30}\}$ NUMA NUMA NUMA Triad NUMA
Figure 6: NUMA 0 NUMA (GB/s)
12 GB/s NUMA 3 GB/s NUMA 1/4
4.2 numactl --localalloc 32 NUMA 0, KBytes NUMA NUMA numactl --interleave 32 NUMA 0,
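The text above pairs the `processor` and `physical id` fields of `/proc/cpuinfo` with the `--physcpubind` and `--membind` options of `numactl`. A minimal sketch of that lookup follows. The function names are hypothetical, and treating `physical id` as the NUMA node ID is an assumption that follows the text; it does not hold on every machine, which is one reason a tool such as HWLOC [1] is useful.

```python
from collections import defaultdict


def cpus_by_physical_id(cpuinfo_text):
    """Group logical CPU numbers ("processor") by socket ("physical id")."""
    groups = defaultdict(list)
    processor = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor entries
        key, value = key.strip(), value.strip()
        if key == "processor":
            processor = int(value)
        elif key == "physical id" and processor is not None:
            groups[int(value)].append(processor)
    return dict(groups)


def numactl_command(groups, node, program):
    """Build an argv list that binds CPUs and memory to one node.

    Assumption: the "physical id" value is also the NUMA node ID.
    """
    cpus = ",".join(str(cpu) for cpu in groups[node])
    return ["numactl", f"--physcpubind={cpus}", f"--membind={node}", *program]


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        groups = cpus_by_physical_id(f.read())
    for node in sorted(groups):
        print(" ".join(numactl_command(groups, node, ["./a.out"])))
```

The sketch only prints the command lines; `./a.out` is a placeholder program name.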
NUMA 0, 1 4 Local allocation
4.4 Figure 7(a), 7(b): STREAM TRIAD (GB/s), NUMA 1, 2, 4, $n = \{2^{10}, \ldots, 2^{30}\}$, NUMA (Local-allocation) (Interleaving)
$2^{20}$ NUMA
Local allocation: 13 GB/s, 21 GB/s, 24 GB/s
Interleaving: 13 GB/s, 6 GB/s, 8 GB/s
Interleaving TRIAD STREAM 4 Local allocation Interleaving 6
5. NUMA
NUMA numactl Linux sched_setaffinity() sched_getaffinity() mbind() sched_setaffinity() sched_setaffinity() mbind() NUMA NUMA
5.1 STREAM TRIAD
TRIAD a, b, c 1 NUMA
Figure 7: STREAM TRIAD: (GB/s)
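The affinity calls named above, `sched_setaffinity()` and `sched_getaffinity()`, are exposed on Linux through Python's `os` module, so pinning can be sketched without `numactl`. The helper name `pinned` is hypothetical. The Python standard library has no wrapper for `mbind()`, so this sketch covers CPU pinning only; memory placement is left to the kernel's allocation policy.

```python
import os
from contextlib import contextmanager


@contextmanager
def pinned(cpu):
    """Temporarily restrict the caller to one logical CPU (Linux only).

    On platforms without os.sched_setaffinity the block runs unpinned.
    """
    if not hasattr(os, "sched_setaffinity"):
        yield None
        return
    previous = os.sched_getaffinity(0)  # pid 0 means the caller
    os.sched_setaffinity(0, {cpu})
    try:
        yield previous
    finally:
        os.sched_setaffinity(0, previous)  # restore the original CPU set


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        target = min(os.sched_getaffinity(0))
        with pinned(target):
            # Buffers first touched here are typically placed on the node
            # that owns `target` when the default local policy is in effect.
            data = [0.0] * 1024
            print("running on CPUs:", os.sched_getaffinity(0))
```

Restoring the previous CPU set in `finally` keeps the pinning from leaking into later code if the body raises.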
Figure 8: TRIAD
Figure 9: NUMA
8 TRIAD NUMA 1, 2, 4: 24, 48, 96 GB/s
(Breadth-first search; BFS) BFS $G = (V, E)$, $n = |V|$, $m = |E|$, $O(n + m)$
HPC Graph500 1 Graph500 2 BFS SCALE edgefactor $= m/n$ 16 (a) (b) (c)
(a) $n = 2^{\mathrm{SCALE}}$, $m = n \cdot \mathrm{edgefactor}$, Kronecker graph
(b) (c) 64 BFS 1 (traversed edges per second; TEPS)
(c) 64 TEPS
Green Graph500 3 Graph500 TEPS TEPS/W
9 1 BFS (Level) Level-synchronized BFS Beamer [3] Top-down Bottom-up Small-world Top-down Bottom-up Beamer Kronecker graph 4-way Intel Xeon E GTEPS ($10^9$ TEPS) NUMA GTEPS [4] Bottom-up Small-world 2.68 [5] [4, 5] CSR (Compressed Sparse Row)
2 Graph500:
3 Green Graph500:
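The ingredients listed above (CSR storage, level-synchronized BFS, and the top-down and bottom-up directions of Beamer [3]) can be sketched together. This is a small sequential illustration with hypothetical function names, not the NUMA-optimized implementation of [4, 5], and it omits the heuristic that a direction-optimizing BFS uses to switch between the two steps.

```python
def build_csr(n, edges):
    """Build CSR arrays (offsets, neighbors) for an undirected graph."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    offsets = [0] * (n + 1)
    for v in range(n):
        offsets[v + 1] = offsets[v] + degree[v]
    neighbors = [0] * offsets[n]
    cursor = offsets[:n]  # next free slot per vertex (a copy of offsets)
    for u, v in edges:
        neighbors[cursor[u]] = v
        cursor[u] += 1
        neighbors[cursor[v]] = u
        cursor[v] += 1
    return offsets, neighbors


def top_down_step(offsets, neighbors, frontier, parent):
    """Each frontier vertex claims its unvisited neighbors."""
    next_frontier = []
    for v in frontier:
        for w in neighbors[offsets[v]:offsets[v + 1]]:
            if parent[w] == -1:
                parent[w] = v
                next_frontier.append(w)
    return next_frontier


def bottom_up_step(offsets, neighbors, frontier, parent):
    """Each unvisited vertex looks for any neighbor in the frontier."""
    in_frontier = set(frontier)
    next_frontier = []
    for w in range(len(parent)):
        if parent[w] != -1:
            continue
        for v in neighbors[offsets[w]:offsets[w + 1]]:
            if v in in_frontier:
                parent[w] = v
                next_frontier.append(w)
                break  # one parent is enough; stop scanning early
    return next_frontier


def bfs_levels(offsets, neighbors, source, step=top_down_step):
    """Level-synchronized BFS: process one whole level per iteration."""
    n = len(offsets) - 1
    parent = [-1] * n
    level = [-1] * n
    parent[source] = source
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        frontier = step(offsets, neighbors, frontier, parent)
        for w in frontier:
            level[w] = depth
    return parent, level


def teps(traversed_edges, seconds):
    """Traversed edges per second."""
    return traversed_edges / seconds
```

Both step functions produce the same levels; they differ in how many edges they inspect, which is why the bottom-up direction pays off when the frontier is large, as on small-world graphs.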
Table 1

Reference     Machine                  (n, m)                TEPS
Madduri       Cray MTA-2 (40 procs)    $(2^{21}, 2^{30})$    0.5 G
Agarwal [2]   Intel Xeon X             $(2^{20}, 2^{26})$    1.3 G
Beamer [3]    Intel Xeon E             $(2^{28}, 2^{32})$    5.1 G
Yasui [4]     Intel Xeon E             $(2^{26}, 2^{30})$    11.1 G
Yasui [5]     Intel Xeon E             $(2^{27}, 2^{31})$    29.0 G

$$V_k = \left\{ v_j \in V \;\middle|\; j \in \left[ \frac{kn}{l}, \frac{(k+1)n}{l} \right) \right\}$$

A: Top-down, $v \in V$: $A^F_k(v)$; Bottom-up, $w \in V_k$: $A^B_k(w)$; $l$ 1 $A^F_k(v)$ $A^B_k(w)$

$$A^F_k(v) = \{ w \mid w \in \{ V_k \cap A(v) \} \}, \quad v \in V,$$
$$A^B_k(w) = \{ v \mid v \in A(w) \}, \quad w \in V_k.$$

NUMA Graph NUMA BFS Graph500 Figure 10(a) NUMA Figure 10(b) NUMA $l$, $G$, $l$, $\{G_k\}$, $(k = \{0, 1, \ldots, l-1\})$, NUMA $k$: $V_k$, $A_k$, $V_k$
SGI UV Kronecker GTEPS Green Graph Big Data category 4-way Intel Xeon E , GTEPS 59.1 MTEPS/W 1 UV
SDPARA (SemiDefinite Programming Algorithm PARAllel version) [6] SDPA (Semidefinite Programming Algorithms) ZDD (Zero-suppressed decision diagram) [7] [8] NUMA ULIBC (Ubiquity Library for Intelligently Binding Cores)
4 jun isc.php
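The partition $V_k$ and the adjacency sets $A^F_k(v)$ and $A^B_k(w)$ defined above translate directly into code. The sketch below uses hypothetical function names, writes $l$ as `num_nodes`, and assumes floor division for the range bounds $kn/l$ and $(k+1)n/l$, which the formula leaves open when $n$ is not a multiple of $l$.

```python
def partition_range(k, n, num_nodes):
    """Index range of V_k: the half-open interval [k*n/l, (k+1)*n/l)."""
    # Assumption: bounds are rounded down when n is not divisible by l.
    return range(k * n // num_nodes, (k + 1) * n // num_nodes)


def forward_lists(adj, k, n, num_nodes):
    """A^F_k(v) = A(v) restricted to V_k, for every v in V (top-down)."""
    part = partition_range(k, n, num_nodes)
    return {v: [w for w in adj[v] if w in part] for v in range(n)}


def backward_lists(adj, k, n, num_nodes):
    """A^B_k(w) = A(w), for every w in V_k (bottom-up)."""
    return {w: list(adj[w]) for w in partition_range(k, n, num_nodes)}


if __name__ == "__main__":
    adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2, 4], 4: [3], 5: []}
    n, num_nodes = len(adj), 2
    for k in range(num_nodes):
        print(k, list(partition_range(k, n, num_nodes)))
        print("  forward :", forward_lists(adj, k, n, num_nodes))
        print("  backward:", backward_lists(adj, k, n, num_nodes))
```

In both directions the vertices that get written (the newly discovered $w$) belong to $V_k$, so partition $k$ only updates state for vertices it owns, while the union of $A^F_k(v)$ over all $k$ still recovers the full list $A(v)$.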
6. NUMA NUMA
(JST) CREST SGI Silicon Graphics International Corp.

[1] F. Broquedis, J. Clet-Ortega, S. Moreaud, N. Furmento, B. Goglin, G. Mercier, S. Thibault and R. Namyst, "hwloc: A generic framework for managing hardware affinities in HPC applications," Proc. IEEE Int. Conf. PDP2010.
[2] V. Agarwal, F. Petrini, D. Pasetto and D. A. Bader, "Scalable graph exploration on multicore processors," Proc. ACM/IEEE Int. Conf. SC10.
[3] S. Beamer, K. Asanović and D. A. Patterson, "Direction-optimizing breadth-first search," Proc. ACM/IEEE Int. Conf. SC12.
[4] Y. Yasui, K. Fujisawa and K. Goto, "NUMA-optimized parallel breadth-first search on multicore single-node system," Proc. IEEE Int. Conf. BigData 2013.
[5] Y. Yasui, K. Fujisawa and Y. Sato, "Fast and energy-efficient breadth-first search on a single NUMA system," Proc. IEEE Int. Conf. ISC'14.
[6] K. Fujisawa, T. Endo, Y. Yasui, H. Sato, N. Matsuzawa, S. Matsuoka and H. Waki, "Peta-scale general solver for semidefinite programming problems with over two million constraints," Proc. IEEE Int. Conf. IPDPS 2014.
[7] ULIBC 2014 (HPCS2014) HPCS
[8] Y. Yasui, K. Fujisawa, K. Goto, N. Kamiyama and M. Takamatsu, "NETAL: High-performance implementation of network analysis library considering computer memory hierarchy," J. Oper. Res. Soc. Jpn., 54.