211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 a a 1 a 2 a 3 a a GPU Graphics Processing Unit GPU CPU GPU GPGPU G

Size: px
Start display at page:

Download "211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 a a 1 a 2 a 3 a a GPU Graphics Processing Unit GPU CPU GPU GPGPU G"

Transcription

1 211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 GPU 4 8 BLAS 4 8 BLAS Basic Linear Algebra Subprograms GPU Graphics Processing Unit 4 8 double 2 4 double-double DD quad-double QD 8 VIDIA Tesla C25 Intel Core i AXPY AXPY 19 4 GEMM CPU 29 8 GEMM 24 Tesla C25 4 AXPY 2.1 GEMV GEMM CPU PCI-Express PCIe GEMM PCIe 4 8 GEMM 4 8 BLAS GPU CPU Implementation and Evaluation of Quadruple and Octuple Precision BLAS on GPUs Daichi Mukunoki and Daisuke Takahashi We implemented quadruple and octuple precision Basic Linear Algebra Subprograms (BLAS) functions on graphics processing units (GPUs), and evaluated their performances. We used DD-type quadruple precision operation, which combines two double precision values to represent a quadruple precision value, and QD-type octuple precision operation, which combines four double precision value, to represent a octuple precision value. On VIDIA Tesla C25, quadruple precision AXPY is approximately 9.5 times faster, and octuple precision AXPY is approximately 19 times faster than that on Intel Core i7 92. Additionally, quadruple precision GEMM is approximately 29 times faster, and octuple precision GEMM is approximately 24 times faster than that on the CPU. Moreover, the execution time of quadruple precision AXPY takes only approximately 2.1 times longer than that of double precision AXPY on the GPU. Also on quadruple and octuple precision GEMV and GEMM on the GPU, the increase of the execution time relative to double precision operation is decreased compared to the CPU. On the other hand, taking the PCI-Express (PCIe) data transfer time into consideration, the performance of double precision GEMM is limited by PCIe data transfer time, but that of quadruple and octuple precision GEMM is almost not limited by them. In this research, we show that quadruple and octuple precision BLAS operations are suitable for GPUs. 1. CG Graduate School of Systems and Information Engineering, University of Tsukuba 64bit 148 c 211 Information Processing Society of Japan

2 211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 a a 1 a 2 a 3 a a GPU Graphics Processing Unit GPU CPU GPU GPGPU General Purpose computing on GPU GPU VIDIA Tesla C25 13GFlops 515GFlops CPU GPU CPU CPU GPU PCI-Express PCIe PCIe 2. x16 8GB/s GPU GPU 1 Byte/Flop GPU 4 8 BLAS Basic Linear Algebra Subprograms VIDIA GPU BLAS GPU 4 8 Byte/Flop 4 8 BLAS GPU BLAS GPU GMP 1) MPFR 2) ARPREC 3) 4 8 QD 4) QD double 2 4 double-double DD 4 double 4 8 quad-double QD QD DD 4 QD 8 BLAS DD XBLAS 5) XBLAS DD MBLAS 6) BLAS GMP MPFR QD 4 8 CPU GPU Göddeke 7) GPU FEM double-float Thall 8) double-float quad-float GPU 9) DD 4 AMD GPU GRAPE-DR Zhao 1) GPU GMP GPUMP Lu 11) QD ARPREC GPU GQD GARPREC 149 c 211 Information Processing Society of Japan

3 211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 GPU BLAS GPU BLAS GPU BLAS GPU GPU BLAS DD 4 QD 8 QD DD 4 QD 8 1 DD 4 4 a 2 a a 1 a = a + a 1 a > a 1 QD 8 4 a = a + a 1 + a 2 + a 3 a > a 1 > a 2 > a 3 IEEE binary DD IEEE binary128 QD IEEE DD 4 4 a = a + a 1 b = b + b 1 a a 1 b b 1 QD 8 Hida 12) QD sloppy 2 4 a 1 b 1 16 DD 4 QD 8 a b + c 16 Fused-Multiply Add FMA DD 4 QD 8 1 DD 2 QD 4 GPU 1 DD 4 QD 8 DD 4 QD 8 2 Flop 9 Flop FMA 1 Flop 193 Flop FMA 24 Flop 333 Flop BLAS QD DD 4 QD 8 BLAS GPU DD 4 QD 8 VIDIA GPGPU CUDA Compute Unified Device Architecture GT2 GPU 3.1 BLAS Level 1 3 BLAS Level 1 BLAS AXPY (y = αx + y) Level 2 BLAS GEMV (y = αax + βy) Level 3 BLAS GEMM (C = αab + βc) BLAS CUDA CPU GPU GPU CPU GPU PCIe BLAS BLAS CUDA 1 AXPY GEMV ID GEMM = = 256 Tesla C16 16KB Tesla C25 64KB 16KB L1 48KB 48KB L1 16KB 2 GPU 16KB 15 c 211 Information Processing Society of Japan

4 211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/ QD CUDA QD CUDA Lu GQD GQD 16 sloppy GPU FMA QD DD CUDA 13) QD 4 8 CPU QD BLAS GPU GT2 GPU 16 FMA FMA CUDA FMA FMA FMA FMA FMA fma rn FMA FMA dmul rn dadd rn DD 4 QD 8 AXPY GEMV GEMM VIDIA Tesla C25 Fermi VIDIA Tesla C16 GT2 2 GPU Tesla C16 78GFlops Tesla C25 515GFlops Tesla C16 4GB GDDR3 12GB/s Tesla C25 3GB GDDR5 144GB/s Tesla C25 ECC ECC ECC GPU 4 8 BLAS CUDDBLAS DD 4 CUQD- BLAS QD 8 GotoBLAS CPU CUBLAS 3.1 GPU CPU CUDDBLAS CUQDBLAS DD 4 QD 8 BLAS CPU BLAS DDBLAS QDBLAS CPU DD 4 QD 8 BLAS MBLAS MBLAS DDBLAS QDBLAS QD ) OpenMP CPU Intel Core i GHz Quad- Core Hyper-Threading GotoBLAS DDBLAS QDBLAS CPU 4 OS CentOS 5.5 x86-64 kernel el5 CUDA Version 3.1 CPU g O3 GPU nvcc 3.1 O3 DDBLAS QDBLAS QD Intel C++ icpc 11.1 fast 1 DD 4 QD 8 Flops DDFlops QDFlops 1 GPU BLAS CPU GPU PCIe 4 8 QD dd rand qd rand AXPY α GEMV GEMM α 1. β. 4.2 AXPY 4 8 AXPY 2 4 AXPY 2 3 2:3 Byte/Flop = 8, 192, Tesla C25 1.6GFlops 2.1%4 Tesla C25 5.GDDFlops 1 FMA 4 2DDFlop 3Flop 5.GDDFlops 75.5GFlops 14.7% Tesla C GQDFlops 17.8GFlops 151 c 211 Information Processing Society of Japan

5 211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 12 DAXPY (Double).8 QDAXPY (Octuple) GFlops GQDFlops e e e e e e e e+6 GotoBLAS (Double, Core i7 92) CUBLAS (Double, Tesla C16) CUBLAS (Double, Tesla C25) QDBLAS (Octuple, Core i7 92) CUQDBLAS (Octuple, Tesla C16) CUQDBLAS (Octuple, Tesla C25) 2 AXPY 4 8 AXPY GDDFlops DDAXPY (Quadruple) 2.48e e e e+6 DDBLAS (Quadruple, Core i7 92) CUDDBLAS (Quadruple, Tesla C16) CUDDBLAS (Quadruple, Tesla C25) 3 4 AXPY 2.9%4 8 Byte/Flop GPU CPU Tesla C AXPY AXPY 15 CPU AXPY 2.5 Tesla C25 AXPY 2.1 AXPY DD 2 8 AXPY CPU 34 Tesla C GEMV 4 8 GEMV 5 7 = 8, 192 CPU Tesla C CPU GEMV 4 GEMV GEMV 89 Tesla C25 GEMV GEMV :1 AXPY 2:3 Byte/Flop AXPY CPU GEMV AXPY GEMV 4 8 AXPY Tesla C16 GEMV AXPY CPU GEMV AXPY Tesla C25 GEMV AXPY 3. 4 GEMV 4 AXPY 2. 8 GEMV 8 AXPY CPU Tesla C AXPY Tesla C25 4 AXPY 8 AXPY GEMM 152 c 211 Information Processing Society of Japan

6 211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 35 DGEMV (Double) 1 QDGEMV (Octuple) GFlops 2 15 GQDFlops GotoBLAS (Double, Core i7 92) CUBLAS (Double, Tesla C16) CUBLAS (Double, Tesla C25) QDBLAS (Octuple, Core i7 92) CUQDBLAS (Octuple, Tesla C16) CUQDBLAS (Octuple, Tesla C25) 5 GEMV 7 8 GEMV 12 DDGEMV (Quadruple) 2 FMA AXPY: =8,192,, GEMV: =8,192, GEMM: =4,96 GDDFlops FMA FMA 4 AXPY 5.3 GDDFlops 4.96 GDDFlops 8 AXPY.76 GQDFlops.7 GQDFlops 4 GEMV 1.18 GDDFlops 7.1 GDDFlops 8 GEMV.8 GQDFlops.62 GQDFlops 4 GEMM GDDFlops 1.6 GDDFlops 8 GEMM.97 GQDFlops.74 GQDFlops DDBLAS (Quadruple, Core i7 92) CUDDBLAS (Quadruple, Tesla C16) CUDDBLAS (Quadruple, Tesla C25) 6 4 GEMV 4.4 GEMM 4 8 GEMV 8 1 = 4, 96 CPU Tesla C CPU Tesla C GEMM Tesla C GFlops 96.4% Tesla C GFlops 33.8%GEMM : 3 GEMM AXPY GEMV Byte/Flop Tesla C25 GEMM 9 4 Tesla C16 2.6GDDFlops Tesla C GDDFlops 39GFlops 212.8GFlops 5.1% 41.3% 1 8 Tesla C16.18GQDFlops Tesla C25.97GQDFlops 25.8GFlops 137.4GFlops 25.8% 33.1% DD QD FMA FMA DD 3.4% QD 3.7% 4.5 FMA CUDDBLAS CUQD- BLAS FMA FMA FMA 153 c 211 Information Processing Society of Japan

7 211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 25 DGEMM (Double) 1.2 QDGEMM (Octuple) 2 1 GFlops 15 1 GQDFlops GotoBLAS (Double, Core i7 92) CUBLAS (Double, Tesla C16) CUBLAS (Double, Tesla C25) QDBLAS (Octuple, Core i7 92) CUQDBLAS (Octuple, Tesla C16) CUQDBLAS (Octuple, Tesla C25) 8 GEMM 1 8 GEMM GDDFlops DDGEMM (Quadruple) DDBLAS (Quadruple, Core i7 92) CUDDBLAS (Quadruple, Tesla C16) CUDDBLAS (Quadruple, Tesla C25) 9 4 GEMM 2 FMA FMA 4 GEMM FMA GEMM 1.3 FMA FMA FMA 4.6 PCIe GPU BLAS CPU GPU BLAS CPU GPU PCIe PCIe 8GB/s GPU Tesla C16 12GB/s Tesla C25 144GB/sGPU PCIe PCIe PCIe BLAS 11 AXPY Tesla C % PCIe % % Tesla C25 Tesla C PCIe PCIe 12 GEMV 13 GEMM Tesla C16 3.5% Tesla C25 7.6% PCIe PCIe % PCIe 4 8 PCIe Byte/Flop GPU 5. DD 4 QD 8 BLAS GPU VIDIA Tesla C25 Intel Core i GEMM 29 8 GEMM c 211 Information Processing Society of Japan

8 211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 1 AXPY (=8,192,) 1 GEMV (=8,192) 8 8 % of total 6 4 % of total Octuple (C16) Quadruple (C16) Double (C16) Octuple (C25) Quadruple (C25) Double (C25) Octuple (C16) Quadruple (C16) Double (C16) Octuple (C25) Quadruple (C25) Double (C25) Computation Computation PCIe Data Transfer PCIe Data Transfer 11 PCIe AXPY 12 PCIe GEMV CPU 4 GEMM 84 8 GEMM 116 Tesla C GPU GPU DD QD FMA PCI-Express PCIe GEMM PCIe GPU BLAS GPU 4 8 BLAS CPU AXPY DOT SpMV Byte/Flop Tesla C25 4 AXPY ) Granlund, T.: GMP: GU Multiple Precision Arithmetic Library, 2) Fousse, L., Hanrot, G., Lefevre, V., Pelissier, % of total Double (C16) GEMM (=4,96) Quadruple (C16) Octuple (C16) Computation PCIe Data Transfer Double (C25) Quadruple (C25) Octuple (C25) PCIe GEMM P. and Zimmermann, P.: MPFR : GU MPFR Library, 3) Bailey, D. H.: ARPREC (C++/Fortran-9 arbitrary precision package), dhbailey/mpdist/. 4) Bailey, D. H.: QD (C++ / Fortran-9 double double and quad-double package), dhbailey/mpdist/. 5) Li, X. S., Demmel, J. W., Bailey, D. H., Hida, Y., Iskandar, J., Kapur, A., Martin, M. C., Thompson, B., Tung, T. and Yoo, D. J.: 155 c 211 Information Processing Society of Japan

9 211 年ハイパフォーマンスコンピューティングと計算科学シンポジウム Computing Symposium 211 HPCS /1/18 XBLAS Extra Precise Basic Linear Algebra Subroutines. 6) : The MPACK; Multiple precision arithmetic BLAS (MBLAS) and LAPACK (MLAPACK), 7) Göddeke, D., Strzodka, R. and Turek, S.: Performance and accuracy of hardware-oriented native-, emulated- and mixed-precision solvers in FEM simulations, International Journal of Parallel, Emergent and Distributed Systems 22 (27). 8) Thall, A.: Extended-Precision Floating-Point umbers for GPU Computation, ACM SIG- GRAPH 26 Research Posters (26). 9),,, :,, Vol. 29 HPC 121, o. 39 (29). 1) Zhao, K. and Chu, X.: GPUMP: a Multiple- Precision Integer Library for GPUs, Proc. IEEE International Conference on Computer and Information Technology (CIT 21) (21). 11) Lu, M., He, B. and Luo, Q.: Supporting Extended Precision on Graphics Processors, Proc. Sixth International Workshop on Data Management on ew Hardware (DaMo 21) (21). 12) Hida, Y., Li, X. S. and Bailey, D. H.: Algorithms for Quad-Double Precision Floating Point Arithmetic, Proc. 15th Symposium on Computer Arithmetic, pp (21). 13), : GPU 4 BLAS,, Vol. 29 HPC 122, o. 13 (29). 156 c 211 Information Processing Society of Japan

倍々精度RgemmのnVidia C2050上への実装と応用

倍々精度RgemmのnVidia C2050上への実装と応用 .. maho@riken.jp http://accc.riken.jp/maho/,,, 2011/2/16 1 - : GPU : SDPA-DD 10 1 - Rgemm : 4 (32 ) nvidia C2050, GPU CPU 150, 24GFlops 25 20 GFLOPS 15 10 QuadAdd Cray, QuadMul Sloppy Kernel QuadAdd Cray,

More information

untitled

untitled A = QΛQ T A n n Λ Q A = XΛX 1 A n n Λ X GPGPU A 3 T Q T AQ = T (Q: ) T u i = λ i u i T {λ i } {u i } QR MR 3 v i = Q u i A {v i } A n = 9000 Quad Core Xeon 2 LAPACK (4/3) n 3 O(n 2 ) O(n 3 ) A {v i }

More information

untitled

untitled A = QΛQ T A n n Λ Q A = XΛX 1 A n n Λ X GPGPU A 3 T Q T AQ = T (Q: ) T u i = λ i u i T {λ i } {u i } QR MR 3 v i = Q u i A {v i } A n = 9000 Quad Core Xeon 2 LAPACK (4/3) n 3 O(n 2 ) O(n 3 ) A {v i }

More information

1 OpenCL OpenCL 1 OpenCL GPU ( ) 1 OpenCL Compute Units Elements OpenCL OpenCL SPMD (Single-Program, Multiple-Data) SPMD OpenCL work-item work-group N

1 OpenCL OpenCL 1 OpenCL GPU ( ) 1 OpenCL Compute Units Elements OpenCL OpenCL SPMD (Single-Program, Multiple-Data) SPMD OpenCL work-item work-group N GPU 1 1 2 1, 3 2, 3 (Graphics Unit: GPU) GPU GPU GPU Evaluation of GPU Computing Based on An Automatic Program Generation Technology Makoto Sugawara, 1 Katsuto Sato, 1 Kazuhiko Komatsu, 2 Hiroyuki Takizawa

More information

MBLAS¤ÈMLAPACK; ¿ÇÜĹÀºÅÙÈǤÎBLAS/LAPACK¤ÎºîÀ®

MBLAS¤ÈMLAPACK; ¿ÇÜĹÀºÅÙÈǤÎBLAS/LAPACK¤ÎºîÀ® MBLAS MLAPACK; BLAS/LAPACK maho@riken.jp February 23, 2009 MPACK(MBLAS/MLAPACK) ( ) (2007 ) ( ) http://accc.riken.jp/maho/ BLAS/LAPACK http://mplapack.sourceforge.net/ BLAS (Basic Linear Algebra Subprograms)

More information

1 GPU GPGPU GPU CPU 2 GPU 2007 NVIDIA GPGPU CUDA[3] GPGPU CUDA GPGPU CUDA GPGPU GPU GPU GPU Graphics Processing Unit LSI LSI CPU ( ) DRAM GPU LSI GPU

1 GPU GPGPU GPU CPU 2 GPU 2007 NVIDIA GPGPU CUDA[3] GPGPU CUDA GPGPU CUDA GPGPU GPU GPU GPU Graphics Processing Unit LSI LSI CPU ( ) DRAM GPU LSI GPU GPGPU (I) GPU GPGPU 1 GPU(Graphics Processing Unit) GPU GPGPU(General-Purpose computing on GPUs) GPU GPGPU GPU ( PC ) PC PC GPU PC PC GPU GPU 2008 TSUBAME NVIDIA GPU(Tesla S1070) TOP500 29 [1] 2009 AMD

More information

GPGPU

GPGPU GPGPU 2013 1008 2015 1 23 Abstract In recent years, with the advance of microscope technology, the alive cells have been able to observe. On the other hand, from the standpoint of image processing, the

More information

QD library! Feature! Easy to use high precision! Easy to understand the structure of arithmetic! 2 type high precision arithmetic! Double-Double precision (pseudo quadruple precision)! Quad-Double precision

More information

07-二村幸孝・出口大輔.indd

07-二村幸孝・出口大輔.indd GPU Graphics Processing Units HPC High Performance Computing GPU GPGPU General-Purpose computation on GPU CPU GPU GPU *1 Intel Quad-Core Xeon E5472 3.0 GHz 2 6 MB L2 cache 1600 MHz FSB 80 GFlops 1 nvidia

More information

23 Fig. 2: hwmodulev2 3. Reconfigurable HPC 3.1 hw/sw hw/sw hw/sw FPGA PC FPGA PC FPGA HPC FPGA FPGA hw/sw hw/sw hw- Module FPGA hwmodule hw/sw FPGA h

23 Fig. 2: hwmodulev2 3. Reconfigurable HPC 3.1 hw/sw hw/sw hw/sw FPGA PC FPGA PC FPGA HPC FPGA FPGA hw/sw hw/sw hw- Module FPGA hwmodule hw/sw FPGA h 23 FPGA CUDA Performance Comparison of FPGA Array with CUDA on Poisson Equation (lijiang@sekine-lab.ei.tuat.ac.jp), (kazuki@sekine-lab.ei.tuat.ac.jp), (takahashi@sekine-lab.ei.tuat.ac.jp), (tamukoh@cc.tuat.ac.jp),

More information

main.dvi

main.dvi PC 1 1 [1][2] [3][4] ( ) GPU(Graphics Processing Unit) GPU PC GPU PC ( 2 GPU ) GPU Harris Corner Detector[5] CPU ( ) ( ) CPU GPU 2 3 GPU 4 5 6 7 1 toyohiro@isc.kyutech.ac.jp 45 2 ( ) CPU ( ) ( ) () 2.1

More information

tabaicho3mukunoki.pptx

tabaicho3mukunoki.pptx 1 2 はじめに n 目的 4倍精度演算より高速な3倍精度演算を実現する l 倍精度では足りないが4倍精度は必要ないケースに欲しい l 4倍精度に比べてデータサイズが小さい Ø 少なくともメモリ律速な計算では4倍精度よりデータ 転送時間を減らすことが可能 Ø PCIeやノード間通信がボトルネックとなりやすい GPUクラスタ環境に有効か n 研究概要 l DD型4倍精度演算 DD演算 に基づく3倍精度演算

More information

IPSJ SIG Technical Report Vol.2013-ARC-203 No /2/1 SMYLE OpenCL (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 1

IPSJ SIG Technical Report Vol.2013-ARC-203 No /2/1 SMYLE OpenCL (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 1 SMYLE OpenCL 128 1 1 1 1 1 2 2 3 3 3 (NEDO) IT FPGA SMYLEref SMYLE OpenCL SMYLE OpenCL FPGA 128 SMYLEref SMYLE OpenCL SMYLE OpenCL Implementation and Evaluations on 128 Cores Takuji Hieda 1 Noriko Etani

More information

IPSJ SIG Technical Report Vol.2013-HPC-138 No /2/21 GPU CRS 1,a) 2,b) SpMV GPU CRS SpMV GPU NVIDIA Kepler CUDA5.0 Fermi GPU Kepler Kepler Tesla

IPSJ SIG Technical Report Vol.2013-HPC-138 No /2/21 GPU CRS 1,a) 2,b) SpMV GPU CRS SpMV GPU NVIDIA Kepler CUDA5.0 Fermi GPU Kepler Kepler Tesla GPU CRS 1,a),b) SpMV GPU CRS SpMV GPU NVIDIA Kepler CUDA5.0 Fermi GPU Kepler Kepler Tesla K0 CUDA5.0 cusparse CRS SpMV 00 1.86 177 1. SpMV SpMV CRS Compressed Row Storage *1 SpMV GPU GPU NVIDIA Kepler

More information

4 倍精度基本線形代数ルーチン群 QPBLAS の紹介 [index] 1. Introduction 2. Double-double algorithm 3. QPBLAS 4. QPBLAS-GPU 5. Summary 佐々成正 1, 山田進 1, 町田昌彦 1, 今村俊幸 2, 奥田洋司

4 倍精度基本線形代数ルーチン群 QPBLAS の紹介 [index] 1. Introduction 2. Double-double algorithm 3. QPBLAS 4. QPBLAS-GPU 5. Summary 佐々成正 1, 山田進 1, 町田昌彦 1, 今村俊幸 2, 奥田洋司 4 倍精度基本線形代数ルーチン群 QPBLAS の紹介 [index] 1. Introduction 2. Double-double algorithm 3. QPBLAS 4. QPBLAS-GPU 5. Summary 佐々成正 1, 山田進 1, 町田昌彦 1, 今村俊幸 2, 奥田洋司 3 1 1 日本原子力研究開発機構システム計算科学センター 2 理科学研究所計算科学研究機構 3 東京大学新領域創成科学研究科

More information

HPC pdf

HPC pdf GPU 1 1 2 2 1 1024 3 GPUGraphics Unit1024 3 GPU GPU GPU GPU 1024 3 Tesla S1070-400 1 GPU 2.6 Accelerating Out-of-core Cone Beam Reconstruction Using GPU Yusuke Okitsu, 1 Fumihiko Ino, 1 Taketo Kishi, 2

More information

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2

CPU Levels in the memory hierarchy Level 1 Level 2... Increasing distance from the CPU in access time Level n Size of the memory at each level 1: 2.2 FFT 1 Fourier fast Fourier transform FFT FFT FFT 1 FFT FFT 2 Fourier 2.1 Fourier FFT Fourier discrete Fourier transform DFT DFT n 1 y k = j=0 x j ω jk n, 0 k n 1 (1) x j y k ω n = e 2πi/n i = 1 (1) n DFT

More information

IPSJ SIG Technical Report Vol.2009-DPS-141 No.20 Vol.2009-GN-73 No.20 Vol.2009-EIP-46 No /11/27 1. MIERUKEN 1 2 MIERUKEN MIERUKEN MIERUKEN: Spe

IPSJ SIG Technical Report Vol.2009-DPS-141 No.20 Vol.2009-GN-73 No.20 Vol.2009-EIP-46 No /11/27 1. MIERUKEN 1 2 MIERUKEN MIERUKEN MIERUKEN: Spe 1. MIERUKEN 1 2 MIERUKEN MIERUKEN MIERUKEN: Speech Visualization System Based on Augmented Reality Yuichiro Nagano 1 and Takashi Yoshino 2 As the spread of the Augmented Reality(AR) technology and service,

More information

KBLAS[7] *1., CUBLAS.,,, Byte/flop., [13] 1 2. (AT). GPU AT,, GPU SYMV., SYMV CUDABLAS., (double, float) (cu- FloatComplex, cudoublecomplex).,, DD(dou

KBLAS[7] *1., CUBLAS.,,, Byte/flop., [13] 1 2. (AT). GPU AT,, GPU SYMV., SYMV CUDABLAS., (double, float) (cu- FloatComplex, cudoublecomplex).,, DD(dou Vol.214-HPC-146 No.14 214/1/3 CUDA-xSYMV 1,3,a) 1 2,3 2,3 (SYMV)., (GEMV) 2.,, mutex., CUBLAS., 1 2,. (AT). 2, SYMV GPU., SSYMV( SYMV), GeForce GTXTitan Black 211GFLOPS( 62.8%)., ( ) (, ) DD(double-double),

More information

IPSJ SIG Technical Report Vol.2012-CG-148 No /8/29 3DCG 1,a) On rigid body animation taking into account the 3D computer graphics came

IPSJ SIG Technical Report Vol.2012-CG-148 No /8/29 3DCG 1,a) On rigid body animation taking into account the 3D computer graphics came 3DCG 1,a) 2 2 2 2 3 On rigid body animation taking into account the 3D computer graphics camera viewpoint Abstract: In using computer graphics for making games or motion pictures, physics simulation is

More information

Shonan Institute of Technology MEMOIRS OF SHONAN INSTITUTE OF TECHNOLOGY Vol. 41, No. 1, 2007 Ships1 * ** ** ** Development of a Small-Mid Range Paral

Shonan Institute of Technology MEMOIRS OF SHONAN INSTITUTE OF TECHNOLOGY Vol. 41, No. 1, 2007 Ships1 * ** ** ** Development of a Small-Mid Range Paral MEMOIRS OF SHONAN INSTITUTE OF TECHNOLOGY Vol. 41, No. 1, 2007 Ships1 * ** ** ** Development of a Small-Mid Range Parallel Computer Ships1 Makoto OYA*, Hiroto MATSUBARA**, Kazuyoshi SAKURAI** and Yu KATO**

More information

http://na-inet.jp/ 4 @ 2015 1 19 ( ) MPFR/GMP BNCpack (cf., Vol, 21, pp.197-206, 2011) Runge-Kutta (cf. arxiv preprint arxiv:1306.2392, Vol.19, No.3, pp.313-328, 2009) Strassen (cf. JSIAM Letters, Vol.6,

More information

IPSJ SIG Technical Report iphone iphone,,., OpenGl ES 2.0 GLSL(OpenGL Shading Language), iphone GPGPU(General-Purpose Computing on Graphics Proc

IPSJ SIG Technical Report iphone iphone,,., OpenGl ES 2.0 GLSL(OpenGL Shading Language), iphone GPGPU(General-Purpose Computing on Graphics Proc iphone 1 1 1 iphone,,., OpenGl ES 2.0 GLSL(OpenGL Shading Language), iphone GPGPU(General-Purpose Computing on Graphics Processing Unit)., AR Realtime Natural Feature Tracking Library for iphone Makoto

More information

6 2. AUTOSAR 2.1 AUTOSAR AUTOSAR ECU OSEK/VDX 3) OSEK/VDX OS AUTOSAR AUTOSAR ECU AUTOSAR 1 AUTOSAR BSW (Basic Software) (Runtime Environment) Applicat

6 2. AUTOSAR 2.1 AUTOSAR AUTOSAR ECU OSEK/VDX 3) OSEK/VDX OS AUTOSAR AUTOSAR ECU AUTOSAR 1 AUTOSAR BSW (Basic Software) (Runtime Environment) Applicat AUTOSAR 1 1, 2 2 2 AUTOSAR AUTOSAR 3 2 2 41% 29% An Extension of AUTOSAR Communication Layers for Multicore Systems Toshiyuki Ichiba, 1 Hiroaki Takada, 1, 2 Shinya Honda 2 and Ryo Kurachi 2 AUTOSAR, a

More information

! 行行 CPUDSP PPESPECell/B.E. CPUGPU 行行 SIMD [SSE, AltiVec] 用 HPC CPUDSP PPESPE (Cell/B.E.) SPE CPUGPU GPU CPU DSP DSP PPE SPE SPE CPU DSP SPE 2

! 行行 CPUDSP PPESPECell/B.E. CPUGPU 行行 SIMD [SSE, AltiVec] 用 HPC CPUDSP PPESPE (Cell/B.E.) SPE CPUGPU GPU CPU DSP DSP PPE SPE SPE CPU DSP SPE 2 ! OpenCL [Open Computing Language] 言 [OpenCL C 言 ] CPU, GPU, Cell/B.E.,DSP 言 行行 [OpenCL Runtime] OpenCL C 言 API Khronos OpenCL Working Group AMD Broadcom Blizzard Apple ARM Codeplay Electronic Arts Freescale

More information

untitled

untitled AMD HPC GP-GPU Opteron HPC 2 1 AMD Opteron 85 FLOPS 10,480 TOP500 16 T2K 95 FLOPS 10,800 140 FLOPS 15,200 61 FLOPS 7,200 3 Barcelona 4 2 AMD Opteron CPU!! ( ) L1 5 2003 2004 2005 2006 2007 2008 2009 2010

More information

GPU GPU CPU CPU CPU GPU GPU N N CPU ( ) 1 GPU CPU GPU 2D 3D CPU GPU GPU GPGPU GPGPU 2 nvidia GPU CUDA 3 GPU 3.1 GPU Core 1

GPU GPU CPU CPU CPU GPU GPU N N CPU ( ) 1 GPU CPU GPU 2D 3D CPU GPU GPU GPGPU GPGPU 2 nvidia GPU CUDA 3 GPU 3.1 GPU Core 1 GPU 4 2010 8 28 1 GPU CPU CPU CPU GPU GPU N N CPU ( ) 1 GPU CPU GPU 2D 3D CPU GPU GPU GPGPU GPGPU 2 nvidia GPU CUDA 3 GPU 3.1 GPU Core 1 Register & Shared Memory ( ) CPU CPU(Intel Core i7 965) GPU(Tesla

More information

GPU n Graphics Processing Unit CG CAD

GPU n Graphics Processing Unit CG CAD GPU 2016/06/27 第 20 回 GPU コンピューティング講習会 ( 東京工業大学 ) 1 GPU n Graphics Processing Unit CG CAD www.nvidia.co.jp www.autodesk.co.jp www.pixar.com GPU n GPU ü n NVIDIA CUDA ü NVIDIA GPU ü OS Linux, Windows, Mac

More information

6_27.dvi

6_27.dvi Vol. 49 No. 6 1932 1941 (June 2008) RFID 1 2 RFID RFID RFID 13.56 MHz RFID A Experimental Study for Measuring Human Activities in A Bathroom Using RFID Ryo Onishi 1 and Shigeyuki Hirai 2 A bathroom is

More information

1 3DCG [2] 3DCG CG 3DCG [3] 3DCG 3 3 API 2 3DCG 3 (1) Saito [4] (a) 1920x1080 (b) 1280x720 (c) 640x360 (d) 320x G-Buffer Decaudin[5] G-Buffer D

1 3DCG [2] 3DCG CG 3DCG [3] 3DCG 3 3 API 2 3DCG 3 (1) Saito [4] (a) 1920x1080 (b) 1280x720 (c) 640x360 (d) 320x G-Buffer Decaudin[5] G-Buffer D 3DCG 1) ( ) 2) 2) 1) 2) Real-Time Line Drawing Using Image Processing and Deforming Process Together in 3DCG Takeshi Okuya 1) Katsuaki Tanaka 2) Shigekazu Sakai 2) 1) Department of Intermedia Art and Science,

More information

2). 3) 4) 1.2 NICTNICT DCRA Dihedral Corner Reflector micro-arraysdcra DCRA DCRA DCRA 3D DCRA PC USB PC PC ON / OFF Velleman K8055 K8055 K8055

2). 3) 4) 1.2 NICTNICT DCRA Dihedral Corner Reflector micro-arraysdcra DCRA DCRA DCRA 3D DCRA PC USB PC PC ON / OFF Velleman K8055 K8055 K8055 1 1 1 2 DCRA 1. 1.1 1) 1 Tactile Interface with Air Jets for Floating Images Aya Higuchi, 1 Nomin, 1 Sandor Markon 1 and Satoshi Maekawa 2 The new optical device DCRA can display floating images in free

More information

1 Web [2] Web [3] [4] [5], [6] [7] [8] S.W. [9] 3. MeetingShelf Web MeetingShelf MeetingShelf (1) (2) (3) (4) (5) Web MeetingShelf

1 Web [2] Web [3] [4] [5], [6] [7] [8] S.W. [9] 3. MeetingShelf Web MeetingShelf MeetingShelf (1) (2) (3) (4) (5) Web MeetingShelf 1,a) 2,b) 4,c) 3,d) 4,e) Web A Review Supporting System for Whiteboard Logging Movies Based on Notes Timeline Taniguchi Yoshihide 1,a) Horiguchi Satoshi 2,b) Inoue Akifumi 4,c) Igaki Hiroshi 3,d) Hoshi

More information

HBase Phoenix API Mars GPU MapReduce GPU Hadoop Hadoop Hadoop MapReduce : (1) MapReduce (2)JobTracker 1 Hadoop CPU GPU Fig. 1 The overview of CPU-GPU

HBase Phoenix API Mars GPU MapReduce GPU Hadoop Hadoop Hadoop MapReduce : (1) MapReduce (2)JobTracker 1 Hadoop CPU GPU Fig. 1 The overview of CPU-GPU GPU MapReduce 1 1 1, 2, 3 MapReduce GPGPU GPU GPU MapReduce CPU GPU GPU CPU GPU CPU GPU Map K-Means CPU 2GPU CPU 1.02-1.93 Improving MapReduce Task Scheduling for CPU-GPU Heterogeneous Environments Koichi

More information

B 2 Thin Q=3 0 0 P= N ( )P Q = 2 3 ( )6 N N TSUB- Hub PCI-Express (PCIe) Gen 2 x8 AME1 5) 3 GPU Socket 0 High-performance Linpack 1

B 2 Thin Q=3 0 0 P= N ( )P Q = 2 3 ( )6 N N TSUB- Hub PCI-Express (PCIe) Gen 2 x8 AME1 5) 3 GPU Socket 0 High-performance Linpack 1 TSUBAME 2.0 Linpack 1,,,, Intel NVIDIA GPU 2010 11 TSUBAME 2.0 Linpack 2CPU 3GPU 1400 Dual-Rail QDR InfiniBand TSUBAME 1.0 30 2.4PFlops TSUBAME 1.0 Linpack GPU 1.192PFlops PFlops Top500 4 Achievement of

More information

1 (bit ) ( ) PC WS CPU IEEE754 standard ( 24bit) ( 53bit)

1 (bit ) ( ) PC WS CPU IEEE754 standard ( 24bit) ( 53bit) GNU MP BNCpack tkouya@cs.sist.ac.jp 2002 9 20 ( ) Linux Conference 2002 1 1 (bit ) ( ) PC WS CPU IEEE754 standard ( 24bit) ( 53bit) 10 2 2 3 4 5768:9:; = %? @BADCEGFH-I:JLKNMNOQP R )TSVU!" # %$ & " #

More information

Bulletin of JSSAC(2014) Vol. 20, No. 2, pp (Received 2013/11/27 Revised 2014/3/27 Accepted 2014/5/26) It is known that some of number puzzles ca

Bulletin of JSSAC(2014) Vol. 20, No. 2, pp (Received 2013/11/27 Revised 2014/3/27 Accepted 2014/5/26) It is known that some of number puzzles ca Bulletin of JSSAC(2014) Vol. 20, No. 2, pp. 3-22 (Received 2013/11/27 Revised 2014/3/27 Accepted 2014/5/26) It is known that some of number puzzles can be solved by using Gröbner bases. In this paper,

More information

IPSJ SIG Technical Report Vol.2017-ARC-225 No.12 Vol.2017-SLDM-179 No.12 Vol.2017-EMB-44 No /3/9 1 1 RTOS DefensiveZone DefensiveZone MPU RTOS

IPSJ SIG Technical Report Vol.2017-ARC-225 No.12 Vol.2017-SLDM-179 No.12 Vol.2017-EMB-44 No /3/9 1 1 RTOS DefensiveZone DefensiveZone MPU RTOS 1 1 RTOS DefensiveZone DefensiveZone MPU RTOS RTOS OS Lightweight partitioning architecture for automotive systems Suzuki Takehito 1 Honda Shinya 1 Abstract: Partitioning using protection RTOS has high

More information

supercomputer2010.ppt

supercomputer2010.ppt nanri@cc.kyushu-u.ac.jp 1 !! : 11 12! : nanri@cc.kyushu-u.ac.jp! : Word 2 ! PC GPU) 1997 7 http://wiredvision.jp/news/200806/2008062322.html 3 !! (Cell, GPU )! 4 ! etc...! 5 !! etc. 6 !! 20km 40 km ) 340km

More information

rank ”«‘‚“™z‡Ì GPU ‡É‡æ‡éŁÀŠñ›»

rank ”«‘‚“™z‡Ì GPU ‡É‡æ‡éŁÀŠñ›» rank GPU ERATO 2011 11 1 1 / 26 GPU rank/select wavelet tree balanced parenthesis GPU rank 2 / 26 GPU rank/select wavelet tree balanced parenthesis GPU rank 2 / 26 GPU rank/select wavelet tree balanced

More information

マルチコアPCクラスタ環境におけるBDD法のハイブリッド並列実装

マルチコアPCクラスタ環境におけるBDD法のハイブリッド並列実装 2010 GPGPU 2010 9 29 MPI/Pthread (DDM) DDM CPU CPU CPU CPU FEM GPU FEM CPU Mult - NUMA Multprocessng Cell GPU Accelerator, GPU CPU Heterogeneous computng L3 cache L3 cache CPU CPU + GPU GPU L3 cache 4

More information

The copyright of this material is retained by the Information Processing Society of Japan (IPSJ). The material has been made available on the website

The copyright of this material is retained by the Information Processing Society of Japan (IPSJ). The material has been made available on the website The copyright of this material is retained by the Information Processing Society of Japan (IPSJ). The material has been made available on the website by the author(s) under the agreement with the IPSJ.

More information

Microsoft PowerPoint - GPU_computing_2013_01.pptx

Microsoft PowerPoint - GPU_computing_2013_01.pptx GPU コンピューティン No.1 導入 東京工業大学 学術国際情報センター 青木尊之 1 GPU とは 2 GPGPU (General-purpose computing on graphics processing units) GPU を画像処理以外の一般的計算に使う GPU の魅力 高性能 : ハイエンド GPU はピーク 4 TFLOPS 超 手軽さ : 普通の PC にも装着できる 低価格

More information

10D16.dvi

10D16.dvi D IEEJ Transactions on Industry Applications Vol.136 No.10 pp.686 691 DOI: 10.1541/ieejias.136.686 NW Accelerating Techniques for Sequence Alignment based on an Extended NW Algorithm Jin Okaze, Non-member,

More information

HP Workstation 総合カタログ

HP Workstation 総合カタログ HP Workstation Z HP 6 Z HP HP Z840 Workstation P.9 HP Z640 Workstation & CPU P.10 HP Z440 Workstation P.11 17.3in WIDE HP ZBook 17 G2 Mobile Workstation P.15 15.6in WIDE HP ZBook 15 G2 Mobile Workstation

More information

HP cafe HP of A A B of C C Map on N th Floor coupon A cafe coupon B Poster A Poster A Poster B Poster B Case 1 Show HP of each company on a user scree

HP cafe HP of A A B of C C Map on N th Floor coupon A cafe coupon B Poster A Poster A Poster B Poster B Case 1 Show HP of each company on a user scree LAN 1 2 3 2 LAN WiFiTag WiFiTag LAN LAN 100% WiFi Tag An Improved Determination Method with Multiple Access Points for Relative Position Estimation Using Wireless LAN Abstract: We have proposed a WiFiTag

More information

[2] 2. [3 5] 3D [6 8] Morishima [9] N n 24 24FPS k k = 1, 2,..., N i i = 1, 2,..., n Algorithm 1 N io user-specified number of inbetween omis

[2] 2. [3 5] 3D [6 8] Morishima [9] N n 24 24FPS k k = 1, 2,..., N i i = 1, 2,..., n Algorithm 1 N io user-specified number of inbetween omis 1,a) 2 2 2 1 2 3 24 Motion Frame Omission for Cartoon-like Effects Abstract: Limited animation is a hand-drawn animation style that holds each drawing for two or three successive frames to make up 24 frames

More information

スライド 1

スライド 1 GPU クラスタによる格子 QCD 計算 広大理尾崎裕介 石川健一 1.1 Introduction Graphic Processing Units 1 チップに数百個の演算器 多数の演算器による並列計算 ~TFLOPS ( 単精度 ) CPU 数十 GFLOPS バンド幅 ~100GB/s コストパフォーマンス ~$400 GPU の開発環境 NVIDIA CUDA http://www.nvidia.co.jp/object/cuda_home_new_jp.html

More information

Input image Initialize variables Loop for period of oscillation Update height map Make shade image Change property of image Output image Change time L

Input image Initialize variables Loop for period of oscillation Update height map Make shade image Change property of image Output image Change time L 1,a) 1,b) 1/f β Generation Method of Animation from Pictures with Natural Flicker Abstract: Some methods to create animation automatically from one picture have been proposed. There is a method that gives

More information

,4) 1 P% P%P=2.5 5%!%! (1) = (2) l l Figure 1 A compilation flow of the proposing sampling based architecture simulation

,4) 1 P% P%P=2.5 5%!%! (1) = (2) l l Figure 1 A compilation flow of the proposing sampling based architecture simulation 1 1 1 1 SPEC CPU 2000 EQUAKE 1.6 50 500 A Parallelizing Compiler Cooperative Multicore Architecture Simulator with Changeover Mechanism of Simulation Modes GAKUHO TAGUCHI 1 YOUICHI ABE 1 KEIJI KIMURA 1

More information

1 2 4 5 9 10 12 3 6 11 13 14 0 8 7 15 Iteration 0 Iteration 1 1 Iteration 2 Iteration 3 N N N! N 1 MOPT(Merge Optimization) 3) MOPT 8192 2 16384 5 MOP

1 2 4 5 9 10 12 3 6 11 13 14 0 8 7 15 Iteration 0 Iteration 1 1 Iteration 2 Iteration 3 N N N! N 1 MOPT(Merge Optimization) 3) MOPT 8192 2 16384 5 MOP 10000 SFMOPT / / MOPT(Merge OPTimization) MOPT FMOPT(Fast MOPT) FMOPT SFMOPT(Subgrouping FMOPT) SFMOPT 2 8192 31 The Proposal and Evaluation of SFMOPT, a Task Mapping Method for 10000 Tasks Haruka Asano

More information

インテル(R) Visual Fortran Composer XE

インテル(R) Visual Fortran Composer XE Visual Fortran Composer XE 1. 2. 3. 4. 5. Visual Studio 6. Visual Studio 7. 8. Compaq Visual Fortran 9. Visual Studio 10. 2 https://registrationcenter.intel.com/regcenter/ w_fcompxe_all_jp_2013_sp1.1.139.exe

More information

xx/xx Vol. Jxx A No. xx 1 Fig. 1 PAL(Panoramic Annular Lens) PAL(Panoramic Annular Lens) PAL (2) PAL PAL 2 PAL 3 2 PAL 1 PAL 3 PAL PAL 2. 1 PAL

xx/xx Vol. Jxx A No. xx 1 Fig. 1 PAL(Panoramic Annular Lens) PAL(Panoramic Annular Lens) PAL (2) PAL PAL 2 PAL 3 2 PAL 1 PAL 3 PAL PAL 2. 1 PAL PAL On the Precision of 3D Measurement by Stereo PAL Images Hiroyuki HASE,HirofumiKAWAI,FrankEKPAR, Masaaki YONEDA,andJien KATO PAL 3 PAL Panoramic Annular Lens 1985 Greguss PAL 1 PAL PAL 2 3 2 PAL DP

More information

Core1 FabScalar VerilogHDL Cache Cache FabScalar 1 CoreConnect[2] Wishbone[3] AMBA[4] AMBA 1 AMBA ARM L2 AMBA2.0 AMBA2.0 FabScalar AHB APB AHB AMBA2.0

Core1 FabScalar VerilogHDL Cache Cache FabScalar 1 CoreConnect[2] Wishbone[3] AMBA[4] AMBA 1 AMBA ARM L2 AMBA2.0 AMBA2.0 FabScalar AHB APB AHB AMBA2.0 AMBA 1 1 1 1 FabScalar FabScalar AMBA AMBA FutureBus Improvement of AMBA Bus Frame-work for Heterogeneos Multi-processor Seto Yusuke 1 Takahiro Sasaki 1 Kazuhiko Ohno 1 Toshio Kondo 1 Abstract: The demand

More information

iphone GPGPU GPU OpenCL Mac OS X Snow LeopardOpenCL iphone OpenCL OpenCL NVIDIA GPU CUDA GPU GPU GPU 15 GPU GPU CPU GPU iii OpenMP MPI CPU OpenCL CUDA OpenCL CPU OpenCL GPU NVIDIA Fermi GPU Fermi GPU GPU

More information

IPSJ SIG Technical Report An Evaluation Method for the Degree of Strain of an Action Scene Mao Kuroda, 1 Takeshi Takai 1 and Takashi Matsuyama 1

IPSJ SIG Technical Report An Evaluation Method for the Degree of Strain of an Action Scene Mao Kuroda, 1 Takeshi Takai 1 and Takashi Matsuyama 1 1 1 1 An Evaluation Method for the Degree of of an Action Scene Mao Kuroda, 1 Takeshi Takai 1 and Takashi Matsuyama 1 The purpose of our research is to investigate structure of an action scene scientifically.

More information

The 15th Game Programming Workshop 2010 Magic Bitboard Magic Bitboard Bitboard Magic Bitboard Bitboard Magic Bitboard Magic Bitboard Magic Bitbo

The 15th Game Programming Workshop 2010 Magic Bitboard Magic Bitboard Bitboard Magic Bitboard Bitboard Magic Bitboard Magic Bitboard Magic Bitbo Magic Bitboard Magic Bitboard Bitboard Magic Bitboard Bitboard Magic Bitboard 64 81 Magic Bitboard Magic Bitboard Bonanza Proposal and Implementation of Magic Bitboards in Shogi Issei Yamamoto, Shogo Takeuchi,

More information

TSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日

TSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日 TSUBAME2.0 における GPU の 活用方法 東京工業大学学術国際情報センター丸山直也第 10 回 GPU コンピューティング講習会 2011 年 9 月 28 日 目次 1. TSUBAMEのGPU 環境 2. プログラム作成 3. プログラム実行 4. 性能解析 デバッグ サンプルコードは /work0/gsic/seminars/gpu- 2011-09- 28 からコピー可能です 1.

More information

258 5) GPS 1 GPS 6) GPS DP 7) 8) 10) GPS GPS 2 3 4 5 2. 2.1 3 1) GPS Global Positioning System

258 5) GPS 1 GPS 6) GPS DP 7) 8) 10) GPS GPS 2 3 4 5 2. 2.1 3 1) GPS Global Positioning System Vol. 52 No. 1 257 268 (Jan. 2011) 1 2, 1 1 measurement. In this paper, a dynamic road map making system is proposed. The proposition system uses probe-cars which has an in-vehicle camera and a GPS receiver.

More information

Vol.55 No (Jan. 2014) saccess 6 saccess 7 saccess 2. [3] p.33 * B (A) (B) (C) (D) (E) (F) *1 [3], [4] Web PDF a m

Vol.55 No (Jan. 2014) saccess 6 saccess 7 saccess 2. [3] p.33 * B (A) (B) (C) (D) (E) (F) *1 [3], [4] Web PDF   a m Vol.55 No.1 2 15 (Jan. 2014) 1,a) 2,3,b) 4,3,c) 3,d) 2013 3 18, 2013 10 9 saccess 1 1 saccess saccess Design and Implementation of an Online Tool for Database Education Hiroyuki Nagataki 1,a) Yoshiaki

More information

2. CABAC CABAC CABAC 1 1 CABAC Figure 1 Overview of CABAC 2 DCT 2 0/ /1 CABAC [3] 3. 2 値化部 コンテキスト計算部 2 値算術符号化部 CABAC CABAC

2. CABAC CABAC CABAC 1 1 CABAC Figure 1 Overview of CABAC 2 DCT 2 0/ /1 CABAC [3] 3. 2 値化部 コンテキスト計算部 2 値算術符号化部 CABAC CABAC H.264 CABAC 1 1 1 1 1 2, CABAC(Context-based Adaptive Binary Arithmetic Coding) H.264, CABAC, A Parallelization Technology of H.264 CABAC For Real Time Encoder of Moving Picture YUSUKE YATABE 1 HIRONORI

More information

理研スーパーコンピュータ・システム

理研スーパーコンピュータ・システム 線形代数演算ライブラリ BLAS と LAPACK の基礎と実践 2 理化学研究所情報基盤センター 2013/5/30 13:00- 大阪大学基礎工学部 中田真秀 この授業の目的 対象者 - 研究用プログラムを高速化したい人 - LAPACK についてよく知らない人 この講習会の目的 - コンピュータの簡単な仕組みについて - 今後 どうやってプログラムを高速化するか - BLAS, LAPACK

More information

1 Table 1: Identification by color of voxel Voxel Mode of expression Nothing Other 1 Orange 2 Blue 3 Yellow 4 SSL Humanoid SSL-Vision 3 3 [, 21] 8 325

1 Table 1: Identification by color of voxel Voxel Mode of expression Nothing Other 1 Orange 2 Blue 3 Yellow 4 SSL Humanoid SSL-Vision 3 3 [, 21] 8 325 社団法人人工知能学会 Japanese Society for Artificial Intelligence 人工知能学会研究会資料 JSAI Technical Report SIG-Challenge-B3 (5/5) RoboCup SSL Humanoid A Proposal and its Application of Color Voxel Server for RoboCup SSL

More information

IPSJ SIG Technical Report Vol.2014-HCI-158 No /5/22 1,a) 2 2 3,b) Development of visualization technique expressing rainfall changing conditions

IPSJ SIG Technical Report Vol.2014-HCI-158 No /5/22 1,a) 2 2 3,b) Development of visualization technique expressing rainfall changing conditions 1,a) 2 2 3,b) Development of visualization technique expressing rainfall changing conditions with a still picture Yuuki Hyougo 1,a) Hiroko Suzuki 2 Tadanobu Furukawa 2 Kazuo Misue 3,b) Abstract: In order

More information

1 Fig. 1 Extraction of motion,.,,, 4,,, 3., 1, 2. 2.,. CHLAC,. 2.1,. (256 ).,., CHLAC. CHLAC, HLAC. 2.3 (HLAC ) r,.,. HLAC. N. 2 HLAC Fig. 2

1 Fig. 1 Extraction of motion,.,,, 4,,, 3., 1, 2. 2.,. CHLAC,. 2.1,. (256 ).,., CHLAC. CHLAC, HLAC. 2.3 (HLAC ) r,.,. HLAC. N. 2 HLAC Fig. 2 CHLAC 1 2 3 3,. (CHLAC), 1).,.,, CHLAC,.,. Suspicious Behavior Detection based on CHLAC Method Hideaki Imanishi, 1 Toyohiro Hayashi, 2 Shuichi Enokida 3 and Toshiaki Ejima 3 We have proposed a method for

More information

IPSJ SIG Technical Report Vol.2009-HCI-134 No /7/17 1. RDB Wiki Wiki RDB SQL Wiki Wiki RDB Wiki RDB Wiki A Wiki System Enhanced by Visibl

IPSJ SIG Technical Report Vol.2009-HCI-134 No /7/17 1. RDB Wiki Wiki RDB SQL Wiki Wiki RDB Wiki RDB Wiki A Wiki System Enhanced by Visibl 1. RDB Wiki 1 1 2 Wiki RDB SQL Wiki Wiki RDB Wiki RDB Wiki A Wiki System Enhanced by Visible RDB Operations Toshiya Okumura, 1 Minoru Terada 1 and Kazutaka Maruyama 2 Although Wiki systems can easily be

More information

ipod touch 1 2 Apple ipod touch ipod touch 3 ( ) ipod touch ( 1 ) Apple ( 2 ) Web 1),2) 3. ipod touch 1 2 ipod touch x y z i

ipod touch 1 2 Apple ipod touch ipod touch 3 ( ) ipod touch ( 1 ) Apple ( 2 ) Web 1),2) 3. ipod touch 1 2 ipod touch x y z i ipod touch 1 1 ipod touch. 1) 6 2) 3) A library for detecting movements of an ipod touch by 3D acceleration Akira Kotaki 1 and Mariko Sasakura 1 The aim of this study is to develop a library for detecting

More information

3_23.dvi

3_23.dvi Vol. 52 No. 3 1234 1244 (Mar. 2011) 1 1 mixi 1 Casual Scheduling Management and Shared System Using Avatar Takashi Yoshino 1 and Takayuki Yamano 1 Conventional scheduling management and shared systems

More information

2017 (413812)

2017 (413812) 2017 (413812) Deep Learning ( NN) 2012 Google ASIC(Application Specific Integrated Circuit: IC) 10 ASIC Deep Learning TPU(Tensor Processing Unit) NN 12 20 30 Abstract Multi-layered neural network(nn) has

More information

1 4 1.1........................................... 4 1.2.................................. 4 1.3................................... 4 2 5 2.1 GPU.....

1 4 1.1........................................... 4 1.2.................................. 4 1.3................................... 4 2 5 2.1 GPU..... CPU GPU N Q07-065 2011 2 17 1 1 4 1.1........................................... 4 1.2.................................. 4 1.3................................... 4 2 5 2.1 GPU...........................................

More information

HPEハイパフォーマンスコンピューティング ソリューション

HPEハイパフォーマンスコンピューティング ソリューション HPE HPC / AI Page 2 No.1 * 24.8% No.1 * HPE HPC / AI HPC AI SGIHPE HPC / AI GPU TOP500 50th edition Nov. 2017 HPE No.1 124 www.top500.org HPE HPC / AI TSUBAME 3.0 2017 7 AI TSUBAME 3.0 HPE SGI 8600 System

More information

ストリーミング SIMD 拡張命令2 (SSE2) を使用した SAXPY/DAXPY

ストリーミング SIMD 拡張命令2 (SSE2) を使用した SAXPY/DAXPY SIMD 2(SSE2) SAXPY/DAXPY 2.0 2000 7 : 248600J-001 01/12/06 1 305-8603 115 Fax: 0120-47-8832 * Copyright Intel Corporation 1999, 2000 01/12/06 2 1...5 2 SAXPY DAXPY...5 2.1 SAXPY DAXPY...6 2.1.1 SIMD C++...6

More information

AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK GFlops/Watt GFlops/Watt Abstract GPU Computing has lately attracted

AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK GFlops/Watt GFlops/Watt Abstract GPU Computing has lately attracted DEGIMA LINPACK Energy Performance for LINPACK Benchmark on DEGIMA 1 AMD/ATI Radeon HD 5870 GPU DEGIMA LINPACK HD 5870 GPU DEGIMA LINPACK 1.4698 GFlops/Watt 1.9658 GFlops/Watt Abstract GPU Computing has

More information

Slides: TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments

Slides: TimeGraph: GPU Scheduling for Real-Time Multi-Tasking Environments 計算機アーキテクチャ第 11 回 マルチプロセッサ 本資料は授業用です 無断で転載することを禁じます 名古屋大学 大学院情報科学研究科 准教授加藤真平 デスクトップ ジョブレベル並列性 スーパーコンピュータ 並列処理プログラム プログラムの並列化 for (i = 0; i < N; i++) { x[i] = a[i] + b[i]; } プログラムの並列化 x[0] = a[0] + b[0];

More information

PeerPool IP NAT IP UPnP 2) Bonjour 3) PeerPool CPU 4) 2 UPnP Bonjour PeerPool CPU PeerPool PeerPool PPv2 PPv2 2. PeerPool 2.1 PeerPool PeerPool PoolGW

PeerPool IP NAT IP UPnP 2) Bonjour 3) PeerPool CPU 4) 2 UPnP Bonjour PeerPool CPU PeerPool PeerPool PPv2 PPv2 2. PeerPool 2.1 PeerPool PeerPool PoolGW PPv2: 1 2 3 4 PeerPool PeerPool 3 PPv2(PeerPool version 2) PPv2: A Transparent Network Architecture for Naive Inter-Smart Environment Communication Michinobu Shimatani, 4 Yu Enokibori, 2 ismail Arai 3

More information

IPSJ SIG Technical Report Vol.2014-CG-155 No /6/28 1,a) 1,2,3 1 3,4 CG An Interpolation Method of Different Flow Fields using Polar Inter

IPSJ SIG Technical Report Vol.2014-CG-155 No /6/28 1,a) 1,2,3 1 3,4 CG An Interpolation Method of Different Flow Fields using Polar Inter ,a),2,3 3,4 CG 2 2 2 An Interpolation Method of Different Flow Fields using Polar Interpolation Syuhei Sato,a) Yoshinori Dobashi,2,3 Tsuyoshi Yamamoto Tomoyuki Nishita 3,4 Abstract: Recently, realistic

More information

PC Development of Distributed PC Grid System,,,, Junji Umemoto, Hiroyuki Ebara, Katsumi Onishi, Hiroaki Morikawa, and Bunryu U PC WAN PC PC WAN PC 1 P

PC Development of Distributed PC Grid System,,,, Junji Umemoto, Hiroyuki Ebara, Katsumi Onishi, Hiroaki Morikawa, and Bunryu U PC WAN PC PC WAN PC 1 P PC Development of Distributed PC Grid System,,,, Junji Umemoto, Hiroyuki Ebara, Katsumi Onishi, Hiroaki Morikawa, and Bunryu U PC WAN PC PC WAN PC 1 PC PC PC PC PC Key Words:Grid, PC Cluster, Distributed

More information

DO 時間積分 START 反変速度の計算 contravariant_velocity 移流項の計算 advection_adams_bashforth_2nd DO implicit loop( 陰解法 ) 速度勾配, 温度勾配の計算 gradient_cell_center_surface 速

DO 時間積分 START 反変速度の計算 contravariant_velocity 移流項の計算 advection_adams_bashforth_2nd DO implicit loop( 陰解法 ) 速度勾配, 温度勾配の計算 gradient_cell_center_surface 速 1 1, 2 1, 2 3 2, 3 4 GP LES ASUCA LES NVIDIA CUDA LES 1. Graphics Processing Unit GP General-Purpose SIMT Single Instruction Multiple Threads 1 2 3 4 1),2) LES Large Eddy Simulation 3) ASUCA 4) LES LES

More information

& Vol.5 No (Oct. 2015) TV 1,2,a) , Augmented TV TV AR Augmented Reality 3DCG TV Estimation of TV Screen Position and Ro

& Vol.5 No (Oct. 2015) TV 1,2,a) , Augmented TV TV AR Augmented Reality 3DCG TV Estimation of TV Screen Position and Ro TV 1,2,a) 1 2 2015 1 26, 2015 5 21 Augmented TV TV AR Augmented Reality 3DCG TV Estimation of TV Screen Position and Rotation Using Mobile Device Hiroyuki Kawakita 1,2,a) Toshio Nakagawa 1 Makoto Sato

More information

IPSJ SIG Technical Report NetMAS NetMAS NetMAS One-dimensional Pedestrian Model for Fast Evacuation Simulator Shunsuke Soeda, 1 Tomohisa Yam

IPSJ SIG Technical Report NetMAS NetMAS NetMAS One-dimensional Pedestrian Model for Fast Evacuation Simulator Shunsuke Soeda, 1 Tomohisa Yam 1 1 1 1 1 NetMAS NetMAS NetMAS One-dimensional Model for Fast Evacuation Simulator Shunsuke Soeda, 1 Tomohisa Yamashita, 1 Masaki Onishi, 1 Ikushi Yoda 1 and Itsuki Noda 1 We propose the one-dimentional

More information

2reN-A14.dvi

2reN-A14.dvi 340 30 1 SP2-N 2015 Onomatoperori : Ranking Cooking Recipes by using Onomatopoeias which Express their Tastes and Textures Chiemi Watanabe Satoshi Nakamura Graduate School of Systems and Information Engineering,

More information

<95DB8C9288E397C389C88A E696E6462>

<95DB8C9288E397C389C88A E696E6462> 2011 Vol.60 No.2 p.138 147 Performance of the Japanese long-term care benefit: An International comparison based on OECD health data Mie MORIKAWA[1] Takako TSUTSUI[2] [1]National Institute of Public Health,

More information

チューニング講習会 初級編

チューニング講習会 初級編 GPU のしくみ RICC での使い方 およびベンチマーク 理化学研究所情報基盤センター 2013/6/27 17:00 17:30 中田真秀 RICC の GPU が高速に! ( 旧 C1060 比約 6.6 倍高速 ) RICCのGPUがC2075になりました! C1060 比 6.6 倍高速 倍精度 515GFlops UPCに100 枚導入 : 合計 51.5TFlops うまく行くと5 倍程度高速化

More information

Vol. 48 No. 4 Apr LAN TCP/IP LAN TCP/IP 1 PC TCP/IP 1 PC User-mode Linux 12 Development of a System to Visualize Computer Network Behavior for L

Vol. 48 No. 4 Apr LAN TCP/IP LAN TCP/IP 1 PC TCP/IP 1 PC User-mode Linux 12 Development of a System to Visualize Computer Network Behavior for L Vol. 48 No. 4 Apr. 2007 LAN TCP/IP LAN TCP/IP 1 PC TCP/IP 1 PC User-mode Linux 12 Development of a System to Visualize Computer Network Behavior for Learning to Associate LAN Construction Skills with TCP/IP

More information

IPSJ SIG Technical Report Vol.2015-ARC-215 No.7 Vol.2015-OS-133 No /5/26 Just-In-Time PG 1,a) 1, Just-In-Time VM Geyser Dalvik VM Caffei

IPSJ SIG Technical Report Vol.2015-ARC-215 No.7 Vol.2015-OS-133 No /5/26 Just-In-Time PG 1,a) 1, Just-In-Time VM Geyser Dalvik VM Caffei Just-In-Time PG 1,a) 1, 1 2 1 1 Just-In-Time VM Geyser Dalvik VM CaffeineMark SPECJVM 17% 1. LSI [1][2][3][4][5] (PG) Geyser [6][7] PG ON/OFF OS PG PG [7][8][9][10] Java Just-In-Time (JIT PG [10] JIT 1

More information

[4] ACP (Advanced Communication Primitives) [1] ACP ACP [2] ACP Tofu UDP [3] HPC InfiniBand InfiniBand ACP 2 ACP, 3 InfiniBand ACP 4 5 ACP 2. ACP ACP

[4] ACP (Advanced Communication Primitives) [1] ACP ACP [2] ACP Tofu UDP [3] HPC InfiniBand InfiniBand ACP 2 ACP, 3 InfiniBand ACP 4 5 ACP 2. ACP ACP InfiniBand ACP 1,5,a) 1,5,b) 2,5 1,5 4,5 3,5 2,5 ACE (Advanced Communication for Exa) ACP (Advanced Communication Primitives) HPC InfiniBand ACP InfiniBand ACP ACP InfiniBand Open MPI 20% InfiniBand Implementation

More information

Q [4] 2. [3] [5] ϵ- Q Q CO CO [4] Q Q [1] i = X ln n i + C (1) n i i n n i i i n i = n X i i C exploration exploitation [4] Q Q Q ϵ 1 ϵ 3. [3] [5] [4]

Q [4] 2. [3] [5] ϵ- Q Q CO CO [4] Q Q [1] i = X ln n i + C (1) n i i n n i i i n i = n X i i C exploration exploitation [4] Q Q Q ϵ 1 ϵ 3. [3] [5] [4] 1,a) 2,3,b) Q ϵ- 3 4 Q greedy 3 ϵ- 4 ϵ- Comparation of Methods for Choosing Actions in Werewolf Game Agents Tianhe Wang 1,a) Tomoyuki Kaneko 2,3,b) Abstract: Werewolf, also known as Mafia, is a kind of

More information

1_26.dvi

1_26.dvi C3PV 1,a) 2,b) 2,c) 3,d) 1,e) 2012 4 20, 2012 10 10 C3PV C3PV C3PV 1 Java C3PV 45 38 84% Programming Process Visualization for Supporting Students in Programming Exercise Hiroshi Igaki 1,a) Shun Saito

More information

A Feasibility Study of Direct-Mapping-Type Parallel Processing Method to Solve Linear Equations in Load Flow Calculations Hiroaki Inayoshi, Non-member

A Feasibility Study of Direct-Mapping-Type Parallel Processing Method to Solve Linear Equations in Load Flow Calculations Hiroaki Inayoshi, Non-member A Feasibility Study of Direct-Mapping-Type Parallel Processing Method to Solve Linear Equations in Load Flow Calculations Hiroaki Inayoshi, Non-member (University of Tsukuba), Yasuharu Ohsawa, Member (Kobe

More information

情報処理学会研究報告 IPSJ SIG Technical Report Vol.2013-HPC-139 No /5/29 Gfarm/Pwrake NICT NICT 10TB 100TB CPU I/O HPC I/O NICT Gf

情報処理学会研究報告 IPSJ SIG Technical Report Vol.2013-HPC-139 No /5/29 Gfarm/Pwrake NICT NICT 10TB 100TB CPU I/O HPC I/O NICT Gf Gfarm/Pwrake NICT 1 1 1 1 2 2 3 4 5 5 5 6 NICT 10TB 100TB CPU I/O HPC I/O NICT Gfarm Gfarm Pwrake A Parallel Processing Technique on the NICT Science Cloud via Gfarm/Pwrake KEN T. MURATA 1 HIDENOBU WATANABE

More information

IPSJ SIG Technical Report Vol.2011-MUS-91 No /7/ , 3 1 Design and Implementation on a System for Learning Songs by Presenting Musical St

IPSJ SIG Technical Report Vol.2011-MUS-91 No /7/ , 3 1 Design and Implementation on a System for Learning Songs by Presenting Musical St 1 2 1, 3 1 Design and Implementation on a System for Learning Songs by Presenting Musical Structures based on Phrase Similarity Yuma Ito, 1 Yoshinari Takegawa, 2 Tsutomu Terada 1, 3 and Masahiko Tsukamoto

More information

IPSJ SIG Technical Report Vol.2010-GN-74 No /1/ , 3 Disaster Training Supporting System Based on Electronic Triage HIROAKI KOJIMA, 1 KU

IPSJ SIG Technical Report Vol.2010-GN-74 No /1/ , 3 Disaster Training Supporting System Based on Electronic Triage HIROAKI KOJIMA, 1 KU 1 2 2 1, 3 Disaster Training Supporting System Based on Electronic Triage HIROAKI KOJIMA, 1 KUNIAKI SUSEKI, 2 KENTARO NAGAHASHI 2 and KEN-ICHI OKADA 1, 3 When there are a lot of injured people at a large-scale

More information

indd

indd Windows Vista 2 Service pack 1 SP1 Windows Vista Windows Xp Windows Vista Windows Vista CPU Windows OS Windows Xp Windows Vista Windows 7 15 20 Windows Vista Windows Vista Windows Xp Windows Vista Windows

More information

IPSJ SIG Technical Report Vol.2014-ARC-213 No.24 Vol.2014-HPC-147 No /12/10 GPU 1,a) 1,b) 1,c) 1,d) GPU GPU Structure Of Array Array Of

IPSJ SIG Technical Report Vol.2014-ARC-213 No.24 Vol.2014-HPC-147 No /12/10 GPU 1,a) 1,b) 1,c) 1,d) GPU GPU Structure Of Array Array Of GPU 1,a) 1,b) 1,c) 1,d) GPU 1 GPU Structure Of Array Array Of Structure 1. MPS(Moving Particle Semi-Implicit) [1] SPH(Smoothed Particle Hydrodynamics) [] DEM(Distinct Element Method)[] [] 1 Tokyo Institute

More information

FabHetero FabHetero FabHetero FabCache FabCache SPEC2000INT IPC FabCache 0.076%

FabHetero FabHetero FabHetero FabCache FabCache SPEC2000INT IPC FabCache 0.076% 2013 (409812) FabHetero FabHetero FabHetero FabCache FabCache SPEC2000INT 6 1000 IPC FabCache 0.076% Abstract Single-ISA heterogeneous multi-core processors are increasing importance in the processor architecture.

More information

FINAL PROGRAM 22th Annual Workshop SWoPP / / 2009 Sendai Summer United Workshops on Parallel, Distributed, and Cooperative Processing

FINAL PROGRAM 22th Annual Workshop SWoPP / / 2009 Sendai Summer United Workshops on Parallel, Distributed, and Cooperative Processing FINAL PROGRAM 22th Annual Workshop SWoPP 2009 2009 / / 2009 Sendai Summer United Workshops on Parallel, Distributed, and Cooperative Processing 2009 8 4 ( ) 8 6 ( ) 981-0933 1-2-45 http://www.forestsendai.jp

More information

EGunGPU

EGunGPU Super Computing in Accelerator simulations - Electron Gun simulation using GPGPU - K. Ohmi, KEK-Accel Accelerator Physics seminar 2009.11.19 Super computers in KEK HITACHI SR11000 POWER5 16 24GB 16 134GFlops,

More information

Vol.53 No (Mar. 2012) 1, 1,a) 1, 2 1 1, , Musical Interaction System Based on Stage Metaphor Seiko Myojin 1, 1,a

Vol.53 No (Mar. 2012) 1, 1,a) 1, 2 1 1, , Musical Interaction System Based on Stage Metaphor Seiko Myojin 1, 1,a 1, 1,a) 1, 2 1 1, 3 2 1 2011 6 17, 2011 12 16 Musical Interaction System Based on Stage Metaphor Seiko Myojin 1, 1,a) Kazuki Kanamori 1, 2 Mie Nakatani 1 Hirokazu Kato 1, 3 Sanae H. Wake 2 Shogo Nishida

More information

IPSJ SIG Technical Report Vol.2014-GN-90 No.16 Vol.2014-CDS-9 No.16 Vol.2014-DCC-6 No /1/24 1,a) 2,b) 2,c) 1,d) QUMARION QUMARION Kinect Kinect

IPSJ SIG Technical Report Vol.2014-GN-90 No.16 Vol.2014-CDS-9 No.16 Vol.2014-DCC-6 No /1/24 1,a) 2,b) 2,c) 1,d) QUMARION QUMARION Kinect Kinect 1,a) 2,b) 2,c) 1,d) QUMARION QUMARION Kinect Kinect Using a Human-Shaped Input Device for Remote Pose Instruction Yuki Tayama 1,a) Yoshiaki Ando 2,b) Misaki Hagino 2,c) Ken-ichi Okada 1,d) Abstract: There

More information

2 ( ) i

2 ( ) i 25 Study on Rating System in Multi-player Games with Imperfect Information 1165069 2014 2 28 2 ( ) i ii Abstract Study on Rating System in Multi-player Games with Imperfect Information Shigehiko MORITA

More information