IPSJ SIG Technical Report Vol.2016-HPC-153 No /3/1 FPGA 1,a) FPGA(Field Programmable Gate Array) FPGA HPC OpenCL FPGA HPC FPGA FEM CG Open


FPGA 1,a)

FPGA(Field Programmable Gate Array) FPGA HPC OpenCL FPGA HPC FPGA FEM CG OpenCL FPGA

1. CPU(Central Processing Unit) GPU(Graphics Processing Unit) HPC FPGA (Field Programmable Gate Array) FPGA FPGA FPGA Catapult[1] HPC FPGA [3], [4] FPGA FPGA Verilog FPGA FPGA OpenCL[2] FPGA Verilog OpenCL HPC FPGA [5], [6] CPU GPU [7], [8] FPGA [9], [10] HPC FPGA OpenCL FPGA 2 FPGA 3 OpenCL FPGA 4

1 a) ohshima@cc.u-tokyo.ac.jp

© 2016 Information Processing Society of Japan

Table 1: FPGA
  FPGA                  Altera Stratix V GS D5 (5SGSMD5K2F40C2)
  #Logic units (ALMs)   172,600
  #RAM blocks (M20K)    2,014
  #DSP blocks           1,590 (27 × 27)
  Board                 Bittware S5-PCIe-HQ GSMD5
  DDR capacity          (4 + 4) MB
  DDR bandwidth         25.6 GB/sec
  PCIe I/F              Gen3 x8 (OpenCL Gen2 x8 )
  Software              Altera Quartus II

FPGA

2.1 FPGA OpenCL SDK FPGA Altera Stratix V GS D5 Stratix V Altera FPGA 1 FPGA 1 Adaptive Logic Module (ALM) 172, Look Up Table (LUT) 2 FPGA 2,014 20Kbit RAM (M20K) 640bit Memory Logic Array Block (MLAB) 8,630 Digital Signal Processor (DSP) 27 1,590 DSP Stratix V ALM RAM *1 [11][12] FPGA Bittware PCI Express S5-PCIe-HQ (s5phq_d5) ( 1) FPGA Verilog HDL VHDL C Fortran FPGA HPC FPGA

*1 Arria 10, Stratix 10 DSP

Fig. 1: Bittware S5-PCIe-HQ (Bittware QDR II+ )

OpenCL FPGA HPC Altera FPGA Stratix V OpenCL Verilog OpenCL FPGA Altera Stratix V CPU FPGA ARM IP FPGA (Xilinx Zynq, Altera Arria SoC ) OpenCL FPGA CPU PCI Express OpenCL FPGA Altera Stratix V FPGA PCI Express GPU I/O PCI Express *2 PCI Express FPGA OpenCL FPGA PCI Express FPGA (Partial reconfiguration) FPGA PCI Express

*2 Intel Altera QPI
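The on-chip RAM figures quoted above (2,014 M20K blocks of 20 Kbit and 8,630 MLABs of 640 bit) can be turned into a rough capacity estimate. The sketch below is only arithmetic on those numbers; it assumes 1 Kbit = 1024 bits, and the variable names are illustrative. The result is an upper bound, since a compiled design also spends on-chip RAM on its own buffering.

```python
# Block counts and sizes as quoted in the text.
# Assumption: 1 Kbit = 1024 bits.
M20K_BLOCKS = 2014
M20K_BITS = 20 * 1024
MLAB_BLOCKS = 8630
MLAB_BITS = 640


def on_chip_bytes(blocks, bits_per_block):
    """Total capacity of one block type, in bytes."""
    return blocks * bits_per_block // 8


m20k_bytes = on_chip_bytes(M20K_BLOCKS, M20K_BITS)
mlab_bytes = on_chip_bytes(MLAB_BLOCKS, MLAB_BITS)
total_bytes = m20k_bytes + mlab_bytes

# Number of single-precision (4-byte) values that would fit if all of it were usable.
float_capacity = total_bytes // 4

print("M20K bytes:", m20k_bytes)
print("MLAB bytes:", mlab_bytes)
print("total MiB :", round(total_bytes / 2**20, 2))
print("floats    :", float_capacity)
```

A few megabytes of on-chip RAM is small compared with the board DDR, which is why the choice of which vectors to keep in local memory matters.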

DDR PCI Express OpenCL OpenCL Verilog HDL IP 1 Intel Xeon E5 (Haswell ) 2 FPGA FPGA --report -c FPGA &

2.2 OpenCL FPGA OpenCL Khronos GPU FPGA DSP(Digital Signal Processor) HPC AMD GPU CPU Xeon Phi NVIDIA GPU OpenCL C/C++ (API ) GPU OpenCL CUDA[13] OpenCL 2.0 OpenCL CPU OpenMP 2 FPGA OpenCL GPU CUDA OpenCL FPGA FPGA Altera Altera OpenCL SDK[14] FPGA SDK Stratix V Altera OpenCL Altera 2013 SDK OpenCL ( ) FPGA ( ) GPU API FPGA (FPGA ) 2 OpenCL FPGA FPGA FPGA ( ) kernel global CUDA CUDA Driver API OpenCL CUDA FPGA FPGA OpenCL FPGA GPU OpenCL GPU

FPGA GPU OpenCL FPGA GPU

2.3 Altera FPGA Altera [15] [16] (SIMD ) FPGA OpenCL global DDR local RAM ( local ) (SIMD, ) FPGA GPU OpenCL clEnqueueNDRangeKernel FPGA FPGA ID API ID CUDA GPU GPU FPGA OpenCL FPGA attribute num_simd_work_items(4) SIMD 4 num_compute_units(4) OpenCL for while Altera OpenCL Compiler (AOC) 3 for FPGA 1 0 (single stream) for

========================================================================================================================
*** Optimization Report ***
========================================================================================================================
Kernel: cg                                                                                        File:Ln
========================================================================================================================
Loop for.body                                                                                     [1]:30
  Pipelined execution inferred
Loop for.body5                                                                                    [1]:37
  Pipelined execution inferred. Successive iterations launched every 2 cycles due to:
    Pipeline structure
Loop for.body18                                                                                   [1]:39
  Pipelined execution inferred. Successive iterations launched every 8 cycles due to:
    Data dependency on variable
    Largest Critical Path Contributor:
      96%: Fadd Operation                                                                         [1]:
Loop for.body37                                                                                   [1]:45
  Pipelined execution inferred. Successive iterations launched every 8 cycles due to:
    Data dependency on variable BNorm2                                                            [1]:46
    Largest Critical Path Contributor:
      96%: Fadd Operation                                                                         [1]:46

Fig. 3: AOC
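The report above attributes the 8-cycle launch interval of two loops to a data dependency on an accumulated variable, with the floating-point add (Fadd) as the main critical-path contributor. One common way to relax such a loop-carried dependency, which the source does not state was used here, is to rotate over several partial sums so that consecutive additions are independent. The Python sketch below models only the arithmetic, not hardware timing; the function names and the choice `lanes=8` (matching the reported interval) are assumptions for illustration.

```python
def dot_serial(x, y):
    """Single accumulator: every add must wait for the previous add to finish."""
    acc = 0.0
    for a, b in zip(x, y):
        acc += a * b
    return acc


def dot_interleaved(x, y, lanes=8):
    """Rotate over `lanes` partial sums so consecutive adds do not depend on each other."""
    partial = [0.0] * lanes
    for i, (a, b) in enumerate(zip(x, y)):
        partial[i % lanes] += a * b
    total = 0.0
    for p in partial:  # short final reduction over the partial sums
        total += p
    return total


x = [float(i) for i in range(1, 101)]
y = [2.0] * 100
print(dot_serial(x, y), dot_interleaved(x, y))
```

Because the summation order changes, the two variants can differ in the last bits for general floating-point data; the inputs used here are small integers, so both sums are exact.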

2.4.1 CPU FPGA OpenCL FPGA [5], [6] Rodinia ppOpen-HPC OpenCL FPGA OpenCL FPGA FPGA FPGA OpenCL 2 CG(Conjugate Gradient) C FEM(Finite Element Method) CG (float ) 4 (CG ) 3 7 OpenMP FPGA CPU-FPGA Intel Xeon E5 2 FPGA(Stratix V)

   1  {r0} = {b} - [A]{xini}
   2  loop
   3    solve {z} = [Minv]{r}
   4    RHO = {r}{z}
   5    if ITER=1 {p} = {z}
   6    else BETA = RHO / RHO1
   7    {q} = [A]{p}
   8    ALPHA = RHO / {p}{q}
   9    {x} = {x} + ALPHA * {p}
  10    {r} = {r} - ALPHA * {q}
  11  endloop

Fig. 4: CG

OpenCL FPGA

3.2 FPGA OpenCL CG kernel CPU-FPGA global OpenCL API (clEnqueueReadBuffer, clEnqueueWriteBuffer) CPU-FPGA global clEnqueueNDRangeKernel API 1 FPGA (FPGA ) -g -W -v --board s5phq_d5 -g -W warning -v --board s5phq_d5 FPGA CPU -O2 OpenCL CPU const restrict const restrict const restrict
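The CG iteration listed in steps 1 to 11 above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the kernel evaluated in the report: the preconditioner [Minv] is taken to be the inverse diagonal of [A], the update {p} = {z} + BETA * {p} is the standard preconditioned-CG step (the extracted listing shows only the BETA assignment on step 6), and the convergence test against the squared norm of {b} and all function names are choices made for this example.

```python
def matvec(A, v):
    """Dense matrix-vector product [A]{v}."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, x0, max_iter=100, tol=1e-8):
    """Preconditioned CG following steps 1-11; returns (x, iterations)."""
    n = len(b)
    x = list(x0)
    Ax = matvec(A, x)
    r = [b[i] - Ax[i] for i in range(n)]              # step 1
    minv = [1.0 / A[i][i] for i in range(n)]          # assumption: diagonal preconditioner
    bnorm2 = dot(b, b) or 1.0                         # avoid a zero reference norm
    iters = 0
    if dot(r, r) <= tol * tol * bnorm2:
        return x, iters                               # initial guess already good enough
    p = [0.0] * n
    rho1 = 1.0
    for iters in range(1, max_iter + 1):              # step 2
        z = [minv[i] * r[i] for i in range(n)]        # step 3
        rho = dot(r, z)                               # step 4
        if iters == 1:
            p = list(z)                               # step 5
        else:
            beta = rho / rho1                         # step 6
            p = [z[i] + beta * p[i] for i in range(n)]  # standard PCG direction update
        q = matvec(A, p)                              # step 7
        alpha = rho / dot(p, q)                       # step 8
        x = [x[i] + alpha * p[i] for i in range(n)]   # step 9
        r = [r[i] - alpha * q[i] for i in range(n)]   # step 10
        rho1 = rho
        if dot(r, r) <= tol * tol * bnorm2:           # convergence check (assumed form)
            break
    return x, iters                                   # step 11


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution, iterations = pcg(A, b, [0.0, 0.0])
print(solution, iterations)
```

The two dot products per iteration (steps 4 and 8) are reductions, and step 7 is the matrix-vector product; these are the parts whose loop structure the compiler report comments on.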

Fig. 5: /

Fig. 6: local

Table 2: local
  (MHz)
  Logic utilization            60%   68%   39%
  Dedicated logic registers    31%   34%   18%
  Memory blocks                61%   71%   34%
  DSP blocks                    2%    2%    2%

5 ( -1 ) 1000 E v2 CPU gcc -O2 CPU OpenCL FPGA restrict

  warning: declaring kernel argument with no restrict may lead to low kernel performance

2 local FPGA DDR global DDR global local global local local OpenCL FPGA local 2 DSP local

3.4 (SIMD ) SIMD FPGA attribute local SIMD SIMD num_simd_work_items reqd_work_group_size attribute CG SIMD

Table 3:
  (MHz)
  Logic utilization            68%   63%
  Dedicated logic registers    34%   31%
  Memory blocks                71%   68%
  DSP blocks                    2%    2%
  (msec)

SIMD ( ) SIMD

  Compiler Warning: Kernel Vectorization: branching is thread ID dependent... cannot vectorize.
  Compiler Warning: Kernel cg : limiting to 2 concurrent work-groups because threads might reach barrier out-of-order

( ) SIMD SIMD num_compute_units attribute 3 SIMD FPGA 2 FPGA SIMD

4. SIMD SIMD FPGA OpenCL CG

  Compiler Warning: Kernel cg : limiting to 2 concurrent OpenCL work-groups because threads might reach barrier out-of-order.

OpenCL FPGA GPU CRS(Compressed Row Storage) SIMD

  Compiler Warning: Kernel Vectorization: branching is thread ID dependent... cannot vectorize.

ID FPGA
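The text mentions CRS (Compressed Row Storage) alongside the vectorization warning. As a reminder of what a CRS matrix-vector product looks like, here is a minimal Python sketch; the array names `values`, `col_idx` and `row_ptr` are conventional names chosen for this example, not identifiers from the report. The inner loop bounds depend on the row being processed, so work-items assigned to different rows would run different trip counts; this may be the kind of ID-dependent control flow the warning refers to, though the source does not say so explicitly.

```python
def crs_spmv(values, col_idx, row_ptr, x):
    """y = A x for a matrix stored in CRS (Compressed Row Storage)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        s = 0.0
        # Non-zeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1,
        # so the trip count of this loop varies from row to row.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y[i] = s
    return y


# 3x3 example matrix:
#   [4 1 0]
#   [1 3 0]
#   [0 0 2]
values = [4.0, 1.0, 1.0, 3.0, 2.0]
col_idx = [0, 1, 0, 1, 2]
row_ptr = [0, 2, 4, 5]
y = crs_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(y)
```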

JST CREST: ppOpen-HPC JSPS 15K00166 Quartus II Altera University Program

[1] Putnam, A., Caulfield, A.M., Chung, E.S., Chiou, D., Constantinides, K., Demme, J., Esmaeilzadeh, H., Fowers, J., Gopal, G.P., Gray, J., Haselman, M., Hauck, S., Heil, S., Hormati, A., Kim, J.-Y., Lanka, S., Larus, J., Peterson, E., Pope, S., Smith, A., Thong, J., Xiao, P.Y. and Burger, D., A reconfigurable fabric for accelerating large-scale datacenter services, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), pp.13-24,
[2] OpenCL - The open standard for parallel programming of heterogeneous systems opencl/
[3] ,,, Alexander Vazhenin, Stanislav Sedukhin: FPGA, (2015-HPC-149),
[4] ,, :, (2015-HPC-151),
[5] , Hamid Reza Zohouri,, : OpenCL FPGA, (2015-HPC-150),
[6] Hamid Reza Zohouri, Naoya Maruyama, Aaron Smith, Motohiko Matsuda, and Satoshi Matsuoka, Optimizing the Rodinia Benchmark for FPGAs (Unrefereed Workshop Manuscript), (2015-HPC-152),
[7] K. Nakajima, M. Satoh, T. Furumura, H. Okuda, T. Iwashita, H. Sakaguchi, T. Katagiri, M. Matsumoto, S. Ohshima, H. Jitsumoto, T. Arakawa, F. Mori, T. Kitayama, A. Ida, M. Y. Matsuo, K. Fujisawa, et al., ppOpen-HPC: Open Source Infrastructure for Development and Execution of Large-Scale Scientific Applications on Post-Peta-Scale Supercomputers with Automatic Tuning (AT), Optimization in the Real World, pp.15-35, DOI / ,
[8] ppOpen-HPC Open Source Infrastructure for Development and Execution of Large-Scale Scientific Applications on Post-Peta-Scale Supercomputers with Automatic Tuning (AT) ac.jp/ppopenhpc/
[9] Tightly Coupled Accelerators GPU Vol.6, No.4, pp.14-25,
[10] Yuetsu Kodama, Toshihiro Hanawa, Taisuke Boku and Mitsuhisa Sato, PEACH2: FPGA based PCIe network device for Tightly Coupled Accelerators, International Symposium on Highly-Efficient Accelerators and Reconfigurable Technologies (HEART2014), pp.3-8, Jun
[11] Altera Corporation, Floating-Point IP Cores User Guide, UG-01058,
[12] Altera, Stratix V Device Handbook, https: // stratix-v/stx5_core.pdf
[13] CUDA Dynamic Parallelism, com/cuda/cuda-c-programming-guide/index.html#cuda-dynamic-parallelism
[14] Altera Corporation, SDK for OpenCL - design-software/embedded-software-developers/opencl/overview.html
[15] Altera Corporation, Altera SDK for OpenCL Programming Guide 15.1, UG-OCL002,
[16] Altera Corporation, Altera SDK for OpenCL Best Practice Guide 15.1, UG-OCL003,

IPSJ SIG Technical Report Vol.2016-HPC-155 No /8/10 FPGA 1,a) FPGA(Field Programmable Gate Array) FPGA OpenCL FPGA FPGA OpenCL FPGA 1. CP

IPSJ SIG Technical Report Vol.2016-HPC-155 No /8/10 FPGA 1,a) FPGA(Field Programmable Gate Array) FPGA OpenCL FPGA FPGA OpenCL FPGA 1. CP FPGA 1,a) 1 1 1 FPGA(Field Programmable Gate Array) FPGA OpenCL FPGA FPGA OpenCL FPGA 1. CPU GPGPU HPC FPGA (Field Programmable Gate Array) FPGA FPGA FPGA Catapult[1] HPC FPGA [3], [4] FPGA Verilog HDL

More information

1 OpenCL OpenCL 1 OpenCL GPU ( ) 1 OpenCL Compute Units Elements OpenCL OpenCL SPMD (Single-Program, Multiple-Data) SPMD OpenCL work-item work-group N

1 OpenCL OpenCL 1 OpenCL GPU ( ) 1 OpenCL Compute Units Elements OpenCL OpenCL SPMD (Single-Program, Multiple-Data) SPMD OpenCL work-item work-group N GPU 1 1 2 1, 3 2, 3 (Graphics Unit: GPU) GPU GPU GPU Evaluation of GPU Computing Based on An Automatic Program Generation Technology Makoto Sugawara, 1 Katsuto Sato, 1 Kazuhiko Komatsu, 2 Hiroyuki Takizawa

More information

IPSJ SIG Technical Report Vol.2013-ARC-203 No /2/1 SMYLE OpenCL (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 1

IPSJ SIG Technical Report Vol.2013-ARC-203 No /2/1 SMYLE OpenCL (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 1 SMYLE OpenCL 128 1 1 1 1 1 2 2 3 3 3 (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 128 SMYLEref SMYLE OpenCL SMYLE OpenCL Implementation and Evaluations on 128 Cores Takuji Hieda 1 Noriko Etani

More information

23 Fig. 2: hwmodulev2 3. Reconfigurable HPC 3.1 hw/sw hw/sw hw/sw FPGA PC FPGA PC FPGA HPC FPGA FPGA hw/sw hw/sw hw- Module FPGA hwmodule hw/sw FPGA h

23 Fig. 2: hwmodulev2 3. Reconfigurable HPC 3.1 hw/sw hw/sw hw/sw FPGA PC FPGA PC FPGA HPC FPGA FPGA hw/sw hw/sw hw- Module FPGA hwmodule hw/sw FPGA h 23 FPGA CUDA Performance Comparison of FPGA Array with CUDA on Poisson Equation (lijiang@sekine-lab.ei.tuat.ac.jp), (kazuki@sekine-lab.ei.tuat.ac.jp), (takahashi@sekine-lab.ei.tuat.ac.jp), (tamukoh@cc.tuat.ac.jp),

More information

1 GPU GPGPU GPU CPU 2 GPU 2007 NVIDIA GPGPU CUDA[3] GPGPU CUDA GPGPU CUDA GPGPU GPU GPU GPU Graphics Processing Unit LSI LSI CPU ( ) DRAM GPU LSI GPU

1 GPU GPGPU GPU CPU 2 GPU 2007 NVIDIA GPGPU CUDA[3] GPGPU CUDA GPGPU CUDA GPGPU GPU GPU GPU Graphics Processing Unit LSI LSI CPU ( ) DRAM GPU LSI GPU GPGPU (I) GPU GPGPU 1 GPU(Graphics Processing Unit) GPU GPGPU(General-Purpose computing on GPUs) GPU GPGPU GPU ( PC ) PC PC GPU PC PC GPU GPU 2008 TSUBAME NVIDIA GPU(Tesla S1070) TOP500 29 [1] 2009 AMD

More information

! 行行 CPUDSP PPESPECell/B.E. CPUGPU 行行 SIMD [SSE, AltiVec] 用 HPC CPUDSP PPESPE (Cell/B.E.) SPE CPUGPU GPU CPU DSP DSP PPE SPE SPE CPU DSP SPE 2

! 行行 CPUDSP PPESPECell/B.E. CPUGPU 行行 SIMD [SSE, AltiVec] 用 HPC CPUDSP PPESPE (Cell/B.E.) SPE CPUGPU GPU CPU DSP DSP PPE SPE SPE CPU DSP SPE 2 ! OpenCL [Open Computing Language] 言 [OpenCL C 言 ] CPU, GPU, Cell/B.E.,DSP 言 行行 [OpenCL Runtime] OpenCL C 言 API Khronos OpenCL Working Group AMD Broadcom Blizzard Apple ARM Codeplay Electronic Arts Freescale

More information

IPSJ SIG Technical Report Vol.2012-ARC-202 No.13 Vol.2012-HPC-137 No /12/13 Tightly Coupled Accelerators 1,a) 1,b) 1,c) 1,d) GPU HA-PACS

IPSJ SIG Technical Report Vol.2012-ARC-202 No.13 Vol.2012-HPC-137 No /12/13 Tightly Coupled Accelerators 1,a) 1,b) 1,c) 1,d) GPU HA-PACS Tightly Coupled Accelerators 1,a) 1,b) 1,c) 1,d) HA-PACS 2012 2 HA-PACS TCA (Tightly Coupled Accelerators) TCA PEACH2 1. (Graphics Processing Unit) HPC GP(General Purpose ) TOP500 [1] CPU PCI Express (PCIe)

More information

組込みシステムシンポジウム2011 Embedded Systems Symposium 2011 ESS /10/20 FPGA Android Android Java FPGA Java FPGA Dalvik VM Intel Atom FPGA PCI Express DM

組込みシステムシンポジウム2011 Embedded Systems Symposium 2011 ESS /10/20 FPGA Android Android Java FPGA Java FPGA Dalvik VM Intel Atom FPGA PCI Express DM Android Android Java Java Dalvik VM Intel Atom PCI Express DMA 1.25 Gbps Atom Android Java Acceleration with an Accelerator in an Android Mobile Terminal Keisuke Koike, Atsushi Ohta, Kohta Ohshima, Kaori

More information

strtok-count.eps

strtok-count.eps IoT FPGA 2016/12/1 IoT FPGA 200MHz 32 ASCII PCI Express FPGA OpenCL (Volvox) Volvox CPU 10 1 IoT (Internet of Things) 2020 208 [1] IoT IoT HTTP JSON ( Python Ruby) IoT IoT IoT (Hadoop [2] ) AI (Artificial

More information

IPSJ SIG Technical Report Vol.2013-ARC-206 No /8/1 Android Dominic Hillenbrand ODROID-X2 GPIO Android OSCAR WFI 500[us] GPIO GP

IPSJ SIG Technical Report Vol.2013-ARC-206 No /8/1 Android Dominic Hillenbrand ODROID-X2 GPIO Android OSCAR WFI 500[us] GPIO GP Android 1 1 1 1 1 Dominic Hillenbrand 1 1 1 ODROID-X2 GPIO Android OSCAR WFI 500[us] GPIO GPIO API GPIO API GPIO MPEG2 Optical Flow MPEG2 1PE 0.97[W] 0.63[W] 2PE 1.88[w] 0.46[W] 3PE 2.79[W] 0.37[W] Optical

More information

1 osana@eee.u-ryukyu.ac.jp : FPGA : HDL, Xilinx Vivado + Digilent Nexys4 (Artix-7 100T) LSI / PC clock accurate / Artix-7 XC7A100T Kintex-7 XC7K325T : CAD Hands-on: HDL (Verilog) CAD (Vivado HLx) : 28y4

More information

HP Workstation 総合カタログ

HP Workstation 総合カタログ HP Workstation Z HP 6 Z HP HP Z840 Workstation P.9 HP Z640 Workstation & CPU P.10 HP Z440 Workstation P.11 17.3in WIDE HP ZBook 17 G2 Mobile Workstation P.15 15.6in WIDE HP ZBook 15 G2 Mobile Workstation

More information

FINAL PROGRAM 25th Annual Workshop SWoPP / / 2012 Tottori Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2012

FINAL PROGRAM 25th Annual Workshop SWoPP / / 2012 Tottori Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2012 FINAL PROGRAM 25th Annual Workshop SWoPP 2012 2012 / / 2012 Tottori Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2012 8 1 ( ) 8 3 ( ) 680-0017 101-5 http://www.torikenmin.jp/kenbun/

More information

ネットリストおよびフィジカル・シンセシスの最適化

ネットリストおよびフィジカル・シンセシスの最適化 11. QII52007-7.1.0 Quartus II Quartus II atom atom Electronic Design Interchange Format (.edf) Verilog Quartus (.vqm) Quartus II Quartus II Quartus II Quartus II 1 Quartus II Quartus II 11 3 11 12 Altera

More information

XACCの概要

XACCの概要 2 global void kernel(int a[max], int llimit, int ulimit) {... } : int main(int argc, char *argv[]){ MPI_Int(&argc, &argc); MPI_Comm_rank(MPI_COMM_WORLD, &rank); MPI_Comm_size(MPI_COMM_WORLD, &size); dx

More information

PLDとFPGA

PLDとFPGA PLDFPGA 2002/12 PLDFPGA PLD:Programmable Logic Device FPGA:Field Programmable Gate Array Field: Gate Array: LSI MPGA:Mask Programmable Gate Array» FPGA:»» 2 FPGA FPGALSI FPGA FPGA Altera, Xilinx FPGA DVD

More information

07-二村幸孝・出口大輔.indd

07-二村幸孝・出口大輔.indd GPU Graphics Processing Units HPC High Performance Computing GPU GPGPU General-Purpose computation on GPU CPU GPU GPU *1 Intel Quad-Core Xeon E5472 3.0 GHz 2 6 MB L2 cache 1600 MHz FSB 80 GFlops 1 nvidia

More information

Vol.214-HPC-145 No /7/3 C #pragma acc directive-name [clause [[,] clause] ] new-line structured block Fortran!$acc directive-name [clause [[,] c

Vol.214-HPC-145 No /7/3 C #pragma acc directive-name [clause [[,] clause] ] new-line structured block Fortran!$acc directive-name [clause [[,] c Vol.214-HPC-145 No.45 214/7/3 OpenACC 1 3,1,2 1,2 GPU CUDA OpenCL OpenACC OpenACC High-level OpenACC CPU Intex Xeon Phi K2X GPU Intel Xeon Phi 27% K2X GPU 24% 1. TSUBAME2.5 CPU GPU CUDA OpenCL CPU OpenMP

More information

? FPGA FPGA FPGA : : : ? ( ) (FFT) ( ) (Localization) ? : 0. 1 2 3 0. 4 5 6 7 3 8 6 1 5 4 9 2 0. 0 5 6 0 8 8 ( ) ? : LU Ax = b LU : Ax = 211 410 221 x 1 x 2 x 3 = 1 0 0 21 1 2 1 0 0 1 2 x = LUx = b 1 31

More information

Microsoft PowerPoint - Lec pptx

Microsoft PowerPoint - Lec pptx Course number: CSC.T34 コンピュータ論理設計 Computer Logic Design 5. リコンフィギャラブルシステム Reconfigurable Systems 吉瀬謙二情報工学系 Kenji Kise, Department of Computer Science kise _at_ c.titech.ac.jp www.arch.cs.titech.ac.jp/lecture/cld/

More information

26 FPGA 11 05340 1 FPGA (Field Programmable Gate Array) ASIC (Application Specific Integrated Circuit) FPGA FPGA FPGA FPGA Linux FreeDOS skewed way L1

26 FPGA 11 05340 1 FPGA (Field Programmable Gate Array) ASIC (Application Specific Integrated Circuit) FPGA FPGA FPGA FPGA Linux FreeDOS skewed way L1 FPGA 272 11 05340 26 FPGA 11 05340 1 FPGA (Field Programmable Gate Array) ASIC (Application Specific Integrated Circuit) FPGA FPGA FPGA FPGA Linux FreeDOS skewed way L1 FPGA skewed L2 FPGA skewed Linux

More information

GPGPU

GPGPU GPGPU 2013 1008 2015 1 23 Abstract In recent years, with the advance of microscope technology, the alive cells have been able to observe. On the other hand, from the standpoint of image processing, the

More information

IPSJ SIG Technical Report Vol.2013-HPC-138 No /2/21 GPU CRS 1,a) 2,b) SpMV GPU CRS SpMV GPU NVIDIA Kepler CUDA5.0 Fermi GPU Kepler Kepler Tesla

IPSJ SIG Technical Report Vol.2013-HPC-138 No /2/21 GPU CRS 1,a) 2,b) SpMV GPU CRS SpMV GPU NVIDIA Kepler CUDA5.0 Fermi GPU Kepler Kepler Tesla GPU CRS 1,a),b) SpMV GPU CRS SpMV GPU NVIDIA Kepler CUDA5.0 Fermi GPU Kepler Kepler Tesla K0 CUDA5.0 cusparse CRS SpMV 00 1.86 177 1. SpMV SpMV CRS Compressed Row Storage *1 SpMV GPU GPU NVIDIA Kepler

More information

main.dvi

main.dvi PC 1 1 [1][2] [3][4] ( ) GPU(Graphics Processing Unit) GPU PC GPU PC ( 2 GPU ) GPU Harris Corner Detector[5] CPU ( ) ( ) CPU GPU 2 3 GPU 4 5 6 7 1 toyohiro@isc.kyutech.ac.jp 45 2 ( ) CPU ( ) ( ) () 2.1

More information

FPGAメモリおよび定数のインシステム・アップデート

FPGAメモリおよび定数のインシステム・アップデート QII53012-7.2.0 15. FPGA FPGA Quartus II Joint Test Action Group JTAG FPGA FPGA FPGA Quartus II In-System Memory Content Editor FPGA 15 2 15 3 15 3 15 4 In-System Memory Content Editor Quartus II In-System

More information

プロセッサ・アーキテクチャ

プロセッサ・アーキテクチャ 2. NII51002-8.0.0 Nios II Nios II Nios II 2-3 2-4 2-4 2-6 2-7 2-9 I/O 2-18 JTAG Nios II ISA ISA Nios II Nios II Nios II 2 1 Nios II Altera Corporation 2 1 2 1. Nios II Nios II Processor Core JTAG interface

More information

Core1 FabScalar VerilogHDL Cache Cache FabScalar 1 CoreConnect[2] Wishbone[3] AMBA[4] AMBA 1 AMBA ARM L2 AMBA2.0 AMBA2.0 FabScalar AHB APB AHB AMBA2.0

Core1 FabScalar VerilogHDL Cache Cache FabScalar 1 CoreConnect[2] Wishbone[3] AMBA[4] AMBA 1 AMBA ARM L2 AMBA2.0 AMBA2.0 FabScalar AHB APB AHB AMBA2.0 AMBA 1 1 1 1 FabScalar FabScalar AMBA AMBA FutureBus Improvement of AMBA Bus Frame-work for Heterogeneos Multi-processor Seto Yusuke 1 Takahiro Sasaki 1 Kazuhiko Ohno 1 Toshio Kondo 1 Abstract: The demand

More information

スライド 1

スライド 1 swk(at)ic.is.tohoku.ac.jp 2 Outline 3 ? 4 S/N CCD 5 Q Q V 6 CMOS 1 7 1 2 N 1 2 N 8 CCD: CMOS: 9 : / 10 A-D A D C A D C A D C A D C A D C A D C ADC 11 A-D ADC ADC ADC ADC ADC ADC ADC ADC ADC A-D 12 ADC

More information

untitled

untitled c NUMA 1. 18 (Moore s law) 1Hz CPU 2. 1 (Register) (RAM) Level 1 (L1) L2 L3 L4 TLB (translation look-aside buffer) (OS) TLB TLB 3. NUMA NUMA (Non-uniform memory access) 819 0395 744 1 2014 10 Copyright

More information

,4) 1 P% P%P=2.5 5%!%! (1) = (2) l l Figure 1 A compilation flow of the proposing sampling based architecture simulation

,4) 1 P% P%P=2.5 5%!%! (1) = (2) l l Figure 1 A compilation flow of the proposing sampling based architecture simulation 1 1 1 1 SPEC CPU 2000 EQUAKE 1.6 50 500 A Parallelizing Compiler Cooperative Multicore Architecture Simulator with Changeover Mechanism of Simulation Modes GAKUHO TAGUCHI 1 YOUICHI ABE 1 KEIJI KIMURA 1

More information

IPSJ SIG Technical Report Vol.2015-ARC-215 No.7 Vol.2015-OS-133 No /5/26 Just-In-Time PG 1,a) 1, Just-In-Time VM Geyser Dalvik VM Caffei

IPSJ SIG Technical Report Vol.2015-ARC-215 No.7 Vol.2015-OS-133 No /5/26 Just-In-Time PG 1,a) 1, Just-In-Time VM Geyser Dalvik VM Caffei Just-In-Time PG 1,a) 1, 1 2 1 1 Just-In-Time VM Geyser Dalvik VM CaffeineMark SPECJVM 17% 1. LSI [1][2][3][4][5] (PG) Geyser [6][7] PG ON/OFF OS PG PG [7][8][9][10] Java Just-In-Time (JIT PG [10] JIT 1

More information

デザインパフォーマンス向上のためのHDLコーディング法

デザインパフォーマンス向上のためのHDLコーディング法 WP231 (1.1) 2006 1 6 HDL FPGA TL TL 100MHz 400MHz HDL FPGA FPGA 2005 2006 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx,

More information

FINAL PROGRAM 22th Annual Workshop SWoPP / / 2009 Sendai Summer United Workshops on Parallel, Distributed, and Cooperative Processing

FINAL PROGRAM 22th Annual Workshop SWoPP / / 2009 Sendai Summer United Workshops on Parallel, Distributed, and Cooperative Processing FINAL PROGRAM 22th Annual Workshop SWoPP 2009 2009 / / 2009 Sendai Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2009 8 4 ( ) 8 6 ( ) 981-0933 1-2-45 http://www.forestsendai.jp

More information

Nios II ハードウェア・チュートリアル

Nios II ハードウェア・チュートリアル Nios II ver. 7.1 2007 8 1. Nios II FPGA Nios II Quaruts II 7.1 Nios II 7.1 Nios II Cyclone II count_binary 2. 2-1. http://www.altera.com/literature/lit-nio2.jsp 2-2. Nios II Quartus II FEATURE Nios II

More information

HP Workstation 総合カタログ

HP Workstation 総合カタログ HP Workstation E5 v2 Z Z SFF E5 v2 2 HP Windows Z 3 Performance Innovation Reliability 3 HPZ HP HP Z820 Workstation P.11 HP Z620 Workstation & CPU P.12 HP Z420 Workstation P.13 17.3in WIDE HP ZBook 17

More information

Cyclone IIIデバイスのI/O機能

Cyclone IIIデバイスのI/O機能 7. Cyclone III I/O CIII51003-1.0 2 Cyclone III I/O 1 I/O 1 I/O Cyclone III I/O FPGA I/O I/O On-Chip Termination OCT Quartus II I/O Cyclone III I/O Cyclone III LAB I/O IOE I/O I/O IOE I/O 5 Cyclone III

More information

SWoPP BOF BOF-1 8/3 19:10 BoF SWoPP : BOF-2 8/5 17:00 19:00 HW/SW 15 x5 SimMips/MieruPC M-Core/SimMc FPGA S

SWoPP BOF BOF-1 8/3 19:10 BoF SWoPP :   BOF-2 8/5 17:00 19:00 HW/SW 15 x5 SimMips/MieruPC M-Core/SimMc FPGA S FINAL PROGRAM 23rd Annual Workshop SWoPP 2010 2010 / / 2010 Kanazawa Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2010 8 3 ( ) 8 5 ( ) 920-0864 15 1 http://www.bunka-h.gr.jp/

More information

「FPGAを用いたプロセッサ検証システムの製作」

「FPGAを用いたプロセッサ検証システムの製作」 FPGA 2210010149-5 2005 2 21 RISC Verilog-HDL FPGA (celoxica RC100 ) LSI LSI HDL CAD HDL 3 HDL FPGA MPU i 1. 1 2. 3 2.1 HDL FPGA 3 2.2 5 2.3 6 2.3.1 FPGA 6 2.3.2 Flash Memory 6 2.3.3 Flash Memory 7 2.3.4

More information

User-defined Logic Application Memory Manager (Replacement) Application Specific Prefetcher (ASP) Application Kernel On-chip RAM (BRAM) On-chip RAM I/

User-defined Logic Application Memory Manager (Replacement) Application Specific Prefetcher (ASP) Application Kernel On-chip RAM (BRAM) On-chip RAM I/ RTL 1,2,a) 1,b) CPU Verilog HDL RTL 1. CPU GPU Verilog HDL VHDL RTL HDL Vivado HLS Impulse C CPU 1 2 a) takamaeda@arch.cs.titech.ac.jp b) kise@cs.titech.ac.jp RTL RTL RTL Verilog HDL RTL 2. 1 HDL 1 User-defined

More information

HPEハイパフォーマンスコンピューティング ソリューション

HPEハイパフォーマンスコンピューティング ソリューション HPE HPC / AI Page 2 No.1 * 24.8% No.1 * HPE HPC / AI HPC AI SGIHPE HPC / AI GPU TOP500 50th edition Nov. 2017 HPE No.1 124 www.top500.org HPE HPC / AI TSUBAME 3.0 2017 7 AI TSUBAME 3.0 HPE SGI 8600 System

More information

DO 時間積分 START 反変速度の計算 contravariant_velocity 移流項の計算 advection_adams_bashforth_2nd DO implicit loop( 陰解法 ) 速度勾配, 温度勾配の計算 gradient_cell_center_surface 速

DO 時間積分 START 反変速度の計算 contravariant_velocity 移流項の計算 advection_adams_bashforth_2nd DO implicit loop( 陰解法 ) 速度勾配, 温度勾配の計算 gradient_cell_center_surface 速 1 1, 2 1, 2 3 2, 3 4 GP LES ASUCA LES NVIDIA CUDA LES 1. Graphics Processing Unit GP General-Purpose SIMT Single Instruction Multiple Threads 1 2 3 4 1),2) LES Large Eddy Simulation 3) ASUCA 4) LES LES

More information

ACE Associated Computer Experts bv

ACE Associated Computer Experts bv CoSy Application CoSy Marcel Beemster/Yoichi Sugiyama ACE Associated Compiler Experts & Japan Novel Corporation contact: yo_sugi@jnovel.co.jp Parallel Architecture 2 VLIW SIMD MIMD 3 MIMD HW DSP VLIW/ILP

More information

_ _2013_中島_YR

_ _2013_中島_YR ポストペタ - スケール高性能計算に資するシステムソフトウェア技術の創出 平成 22 年度採択研究代表者 H25 年度 実績報告 中島研吾 東京大学情報基盤センター 教授 自動チューニング機構を有するアプリケーション開発 実行環境 1. 研究実施体制 (1) 中島グループ 1 研究代表者 : 中島研吾 ( 東京大学情報基盤センター 教授 ) 2 研究項目 : 自動チューニング機構を有するポストペタスケールアプリケーション開発

More information

Cloud[2] (48 ) Xeon Phi (50+ ) IBM Cyclops[9] (64 ) Cavium Octeon II (32 ) Tilera Tile-GX (100 ) PE [11][7] 2 Nsim[10] 8080[1] SH-2[5] SH [8

Cloud[2] (48 ) Xeon Phi (50+ ) IBM Cyclops[9] (64 ) Cavium Octeon II (32 ) Tilera Tile-GX (100 ) PE [11][7] 2 Nsim[10] 8080[1] SH-2[5] SH [8 1600 1,a) 1,b) 8080 SH-2 8080 SH-2 Simulation of a Many-Core Architecture with 16 Million Processing Cores Hisanobu Tomari 1,a) Kei Hiraki 1,b) Abstract: 8080 and SH-2 processors are evaluated as building

More information

( CUDA CUDA CUDA CUDA ( NVIDIA CUDA I

(    CUDA CUDA CUDA CUDA (  NVIDIA CUDA I GPGPU (II) GPGPU CUDA 1 GPGPU CUDA(CUDA Unified Device Architecture) CUDA NVIDIA GPU *1 C/C++ (nvcc) CUDA NVIDIA GPU GPU CUDA CUDA 1 CUDA CUDA 2 CUDA NVIDIA GPU PC Windows Linux MaxOSX CUDA GPU CUDA NVIDIA

More information

5 2 5 Stratix IV PLL 2 CMU PLL 1 ALTGX MegaWizard Plug-In Manager Reconfig Alt PLL CMU PLL Channel and TX PLL select/reconfig CMU PLL reconfiguration

5 2 5 Stratix IV PLL 2 CMU PLL 1 ALTGX MegaWizard Plug-In Manager Reconfig Alt PLL CMU PLL Channel and TX PLL select/reconfig CMU PLL reconfiguration 5. Stratix IV SIV52005-2.0 Stratix IV GX PMA BER FPGA PMA CMU PLL Pphased-Locked Loop CDR 5 1 5 3 5 5 Quartus II MegaWizard Plug-In Manager 5 42 5 47 rx_tx_duplex_sel[1:0] 5 49 logical_channel_address

More information

スライド 1

スライド 1 GPU クラスタによる格子 QCD 計算 広大理尾崎裕介 石川健一 1.1 Introduction Graphic Processing Units 1 チップに数百個の演算器 多数の演算器による並列計算 ~TFLOPS ( 単精度 ) CPU 数十 GFLOPS バンド幅 ~100GB/s コストパフォーマンス ~$400 GPU の開発環境 NVIDIA CUDA http://www.nvidia.co.jp/object/cuda_home_new_jp.html

More information

マルチコアPCクラスタ環境におけるBDD法のハイブリッド並列実装

マルチコアPCクラスタ環境におけるBDD法のハイブリッド並列実装 2010 GPGPU 2010 9 29 MPI/Pthread (DDM) DDM CPU CPU CPU CPU FEM GPU FEM CPU Mult - NUMA Multprocessng Cell GPU Accelerator, GPU CPU Heterogeneous computng L3 cache L3 cache CPU CPU + GPU GPU L3 cache 4

More information

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2 FFT 1 Fourier fast Fourier transform FFT FFT FFT 1 FFT FFT 2 Fourier 2.1 Fourier FFT Fourier discrete Fourier transform DFT DFT n 1 y k = j=0 x j ω jk n, 0 k n 1 (1) x j y k ω n = e 2πi/n i = 1 (1) n DFT

More information

IPSJ SIG Technical Report Vol.2015-HPC-149 No /6/26 FPGA 1,a) 2, 2, Alexander Vazhenin 2, Stanislav Sedukhin 2 MOST(Method Of Splitting Tsunami)

IPSJ SIG Technical Report Vol.2015-HPC-149 No /6/26 FPGA 1,a) 2, 2, Alexander Vazhenin 2, Stanislav Sedukhin 2 MOST(Method Of Splitting Tsunami) FPGA 1,a) 2, 2, Alexander Vazhenin 2, Stanislav Sedukhin 2 MOST(Method Of Splitting Tsunami) FPGA 28nm Statix V FPGA 4.0GB/s 80GFlop/s MOST FPGA,, Performance Evaluation of FPGA-based Stream Computing

More information

Shonan Institute of Technology MEMOIRS OF SHONAN INSTITUTE OF TECHNOLOGY Vol. 41, No. 1, 2007 Ships1 * ** ** ** Development of a Small-Mid Range Paral

Shonan Institute of Technology MEMOIRS OF SHONAN INSTITUTE OF TECHNOLOGY Vol. 41, No. 1, 2007 Ships1 * ** ** ** Development of a Small-Mid Range Paral MEMOIRS OF SHONAN INSTITUTE OF TECHNOLOGY Vol. 41, No. 1, 2007 Ships1 * ** ** ** Development of a Small-Mid Range Parallel Computer Ships1 Makoto OYA*, Hiroto MATSUBARA**, Kazuyoshi SAKURAI** and Yu KATO**

More information

IPSJ SIG Technical Report Vol.2016-ARC-221 No /8/9 GC 1 1 GC GC GC GC DalvikVM GC 12.4% 5.7% 1. Garbage Collection: GC GC Java GC GC GC GC Dalv

IPSJ SIG Technical Report Vol.2016-ARC-221 No /8/9 GC 1 1 GC GC GC GC DalvikVM GC 12.4% 5.7% 1. Garbage Collection: GC GC Java GC GC GC GC Dalv GC 1 1 GC GC GC GC DalvikVM GC 12.4% 5.7% 1. Garbage Collection: GC GC Java GC GC GC GC DalvikVM[1] GC 1 Nagoya Institute of Technology GC GC 2. GC GC 2.1 GC 1 c 2016 Information Processing Society of

More information

Microsoft PowerPoint - GPU_computing_2013_01.pptx

Microsoft PowerPoint - GPU_computing_2013_01.pptx GPU コンピューティン No.1 導入 東京工業大学 学術国際情報センター 青木尊之 1 GPU とは 2 GPGPU (General-purpose computing on graphics processing units) GPU を画像処理以外の一般的計算に使う GPU の魅力 高性能 : ハイエンド GPU はピーク 4 TFLOPS 超 手軽さ : 普通の PC にも装着できる 低価格

More information

DDR3 SDRAMメモリ・インタフェースのレベリング手法の活用

DDR3 SDRAMメモリ・インタフェースのレベリング手法の活用 WP-01034-1.0/JP DLL (PVT compensation) 90 PLL PVT compensated FPGA fabric 90 Stratix III I/O block Read Dynamic OC T FPGA Write Memory Run Time Configurable Run Time Configurable Set at Compile dq0 dq1

More information

26102 (1/2) LSISoC: (1) (*) (*) GPU SIMD MIMD FPGA DES, AES (2/2) (2) FPGA(8bit) (ISS: Instruction Set Simulator) (3) (4) LSI ECU110100ECU1 ECU ECU ECU ECU FPGA ECU main() { int i, j, k for { } 1 GP-GPU

More information

IBM PureData

IBM PureData IBM Software Information Management IBM PureData System for Analytics 2 IBM PureData System for Analytics 2 2 3 2 5 - S-Blade 6 S-Blade - IBM FAST 7 7 8 10 11 11 IBM PureData System for Analytics 11 IBM

More information

[2] OCR [3], [4] [5] [6] [4], [7] [8], [9] 1 [10] Fig. 1 Current arrangement and size of ruby. 2 Fig. 2 Typography combined with printing

[2] OCR [3], [4] [5] [6] [4], [7] [8], [9] 1 [10] Fig. 1 Current arrangement and size of ruby. 2 Fig. 2 Typography combined with printing 1,a) 1,b) 1,c) 2012 11 8 2012 12 18, 2013 1 27 WEB Ruby Removal Filters Using Genetic Programming for Early-modern Japanese Printed Books Taeka Awazu 1,a) Masami Takata 1,b) Kazuki Joe 1,c) Received: November

More information
