Ryōhei NISHIMURA, Hidetsugu IRIE and Kei HIRAKI (Graduate School of Information Science and Technology, the University of Tokyo)

Abstract: GPU performance has recently exceeded one TFLOPS, while demand for high-performance GPUs for graphics alone has peaked. As a result, GPGPU has become a hot topic. GPGPU uses GPUs, which keep gaining capability and range, for general-purpose computing. A competitor to GPGPU is the Cell Broadband Engine (Cell/B.E.), a processor that integrates many SIMD processors on one chip. It also offers hundreds of GFLOPS and wide memory bandwidth, and it has attracted attention in the field of HPC. We compared a GPU of the latest architecture with the Cell/B.E. using six applications: matrix multiplication, FFT, sorting, password cracking of ZIP files, RandomAccess from the HPC Challenge, and calculation of the Levenshtein distance. The results show that the Cell/B.E. performed better, except in some of the applications.

1. Introduction
Following Moore's law 15), the performance of Graphics Processing Units (GPUs) has kept increasing, and GPUs with a peak above 1 TFLOPS appeared in June 2008. Using GPUs for general-purpose computation is known as General Purpose GPU (GPGPU) 13), and it has drawn wide attention. The Cell/B.E. instead integrates nine cores (one PPE and eight SPEs) with 4-way SIMD on a single chip 17). Roadrunner, which is built around Cell processors, ranked first in the November 2008 TOP500 list. This paper compares the two approaches.

2. Architectures

2.1 GPU
In June 2008 NVIDIA released the GeForce GTX 280, a GPU that can be programmed for GPGPU with CUDA 7). It consists of 30 Streaming Multiprocessors (SMs), each containing 8 Streaming Processors (SPs).

© 2009 Information Processing Society of Japan

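Table 1: Evaluated processors
Item | GeForce GTX 280 | Cell/B.E. (PS3 Linux) | Core 2 Quad Q9400
Clock (GHz) | 1.296 | 3.2 | 2.66
Cores | 30 SMs | 7 SPEs (1 used by the OS) | 4
Threads | 512 | 1 | 1
Single-precision FLOP per clock per core | 24 (per SM) | 8 | 8
Peak single-precision performance (GFLOPS) | 933.12 | 153.6 | 85.12
Registers per core | 16384 × 32 bit | 128 × 128 bit | 16 × 128 bit (SSE)
Memory bandwidth (GiB/s) | 141.696 | 25.6 | 12.8
Memory (MiB) | 1024 | 256 | 4096
On-chip memory per core (KiB) | 16 (Shared Memory) | 256 (Local Store) | 32 (L1 cache)
Power (W) | 236 | 64 (estimated) | 95
OS / driver | NVIDIA driver 185.18.08 | Linux 2.6.27.21 | Linux 2.6.27.21
Compiler | CUDA 2.2 | GCC 4.1.1 | GCC 4.3.2

Each SM has 16 KiB of Shared Memory, and Constant Memory and Texture Memory are also available. An SM hides memory latency by interleaving the execution of many threads 11). In CUDA, 32 threads form a warp, which an SM executes over 4 clock cycles on its 8 SPs. Memory accesses are processed per half-warp of 16 threads. Accesses by a half-warp to 32-, 64- or 128-bit words can be coalesced, and Shared Memory is accessed in 32-bit units. The host runs Fedora 10, and the GPU is connected through PCI Express 2.0 x16 (8 GB/s).

2.2 Cell/B.E.
The Cell/B.E. consists of the PPE, a general-purpose CPU core, and the SPEs, which are SIMD processors. Each SPE has a 256 KiB Local Store. An SPE accesses main memory only through DMA transfers to and from its Local Store. An SPE executes 128-bit SIMD instructions and can issue up to two instructions per cycle. On the PLAYSTATION 3, 7 SPEs are enabled. We used a PLAYSTATION 3 running Fedora 10 with IBM SDK 3.1. The power consumption in Table 1 is estimated from the figure of 80 W at 4 GHz reported in 21).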
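The peak single-precision figures in Table 1 can be checked as units × FLOP per clock per unit × clock rate. The minimal C sketch below does that arithmetic. It makes two assumptions that reproduce the printed numbers but are not stated in the text. First, the Cell/B.E. entry counts the 6 SPEs left to applications (7 enabled, 1 used by the OS). Second, the 64 W entry is the 80 W at 4 GHz of 21) scaled linearly to 3.2 GHz. The function names are hypothetical.

```c
#include <assert.h>

/* Peak single-precision GFLOPS = units x FLOP per clock per unit x clock (GHz). */
static double peak_gflops(int units, int flops_per_clock, double clock_ghz)
{
    assert(units > 0 && flops_per_clock > 0 && clock_ghz > 0.0);
    return (double)units * flops_per_clock * clock_ghz;
}

/* Assumption: power scales linearly with clock (voltage held constant). */
static double scale_power_linear(double watts_ref, double ghz_ref, double ghz)
{
    assert(ghz_ref > 0.0);
    return watts_ref * ghz / ghz_ref;
}

/* True when a and b differ by at most tol. */
static int close_to(double a, double b, double tol)
{
    double d = a - b;
    return (d < 0.0 ? -d : d) <= tol;
}
```

For example, `peak_gflops(30, 24, 1.296)` gives the GTX 280's 933.12 GFLOPS, and `scale_power_linear(80.0, 4.0, 3.2)` gives 64 W. Linear scaling ignores any change in supply voltage, so the 64 W value should be read as a rough estimate.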

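3. Applications

3.1 Matrix multiplication
Matrix multiplication is the core operation of the Linpack benchmark used for the TOP500 list. We multiply 2048 × 2048 single-precision matrices using blocking 4). On the GPU we use Volkov's implementation 20), in which each Streaming Multiprocessor computes a 64 × 16 block of the result. On the Cell/B.E., the matrices are processed in 64 × 64 blocks. Each block occupies 16 KiB and is transferred between main memory and the Local Store by DMA.

3.2 FFT
We compute a 2^19-point complex fast Fourier transform (FFT) 9) 1000 times. We use the Stockham algorithm 19), a self-sorting formulation of the Cooley-Tukey FFT that avoids the bit-reversal permutation, built from radix-4 and radix-2 steps. The 2^19-point transform is decomposed into 2^6-, 2^6- and 2^7-point sub-transforms. On the GPU, these sub-transforms are computed in the Shared Memory of each Streaming Multiprocessor. On the Cell/B.E., they are computed in the Local Store of each SPE, and 16 MiB pages are used to reduce TLB misses.

3.3 Sorting
We sort 2^20 elements with bitonic sort 6). Bitonic sort needs O(N (log N)^2) comparisons, which is more than the O(N log N) of comparison-optimal algorithms. However, its sequence of comparisons is fixed in advance and does not depend on the data, which suits SIMD units and large numbers of threads.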
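Bitonic sort's data-independent comparison pattern is what makes it attractive on both processors. Below is a minimal sequential C sketch of the iterative bitonic network. It is not the authors' GPU or SPE code, and the names `bitonic_sort`, `compare_exchange` and `bitonic_compares` are hypothetical. The sketch also counts compare-exchange operations. For N = 2^m keys the network performs (N/2)·m(m+1)/2 of them, so N = 2^20 gives 2^19 × 210 ≈ 1.10 × 10^8.

```c
#include <assert.h>
#include <stddef.h>

/* Number of compare-exchange operations performed (for checking the formula). */
static unsigned long long bitonic_compares;

/* Order a[i] and a[j] ascending or descending. */
static void compare_exchange(float *a, size_t i, size_t j, int ascending)
{
    bitonic_compares++;
    if (ascending ? a[i] > a[j] : a[i] < a[j]) {
        float t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}

/* Iterative bitonic sort of n = 2^m keys into ascending order. */
static void bitonic_sort(float *a, size_t n)
{
    assert(n > 0 && (n & (n - 1)) == 0);            /* n must be a power of two */
    for (size_t k = 2; k <= n; k <<= 1)             /* length of sequences being merged */
        for (size_t j = k >> 1; j > 0; j >>= 1)     /* distance between compared keys */
            for (size_t i = 0; i < n; i++) {
                size_t partner = i ^ j;
                if (partner > i)                    /* each pair is handled once */
                    compare_exchange(a, i, partner, (i & k) == 0);
            }
}
```

Each (k, j) pass compares n/2 independent pairs. On a GPU, one pass could therefore map to one synchronized step with one thread per pair. That mapping is an illustration, not a description of the paper's implementation.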

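On the Cell/B.E., each SPE sorts data in 64 KiB units in its Local Store.

3.4 ZIP password cracking
We recover the password of an encrypted ZIP file by exhaustive search. ZIP encryption updates its keys with CRC computations, each performed as a lookup in a table indexed by an 8-bit value. Candidate passwords are 1 to 4 characters long and drawn from 95 characters. On the GPU, each Streaming Multiprocessor runs 192 threads, and the CRC table can be placed in Shared Memory, Constant Memory or Texture Memory.

3.5 RandomAccess
RandomAccess from the HPC Challenge applies XOR updates to a table of N 64-bit entries at pseudo-random positions (Figure 1). We use N = 2^20. On the GPU, each 64-bit XOR is carried out as two 32-bit XORs. On the Cell/B.E., each SPE fetches and writes back table entries by DMA.

3.6 Levenshtein distance
Computing the Levenshtein distance 12) between two strings was the problem set at the GPU Challenge 2) and the Cell Challenge 1) held with SACSIS 2009 3). The distance is computed by dynamic programming over a two-dimensional table in which each cell depends on its left, upper and upper-left neighbors. On the Cell/B.E., each SPE processes 16 cells at a time with 8-bit × 16-way SIMD.

4. Evaluation

4.1 Matrix multiplication
Table 2 shows the results.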

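```c
#define N (1 << 20)  /* table size N = 2^20 (Section 3.5) */

/* Fill the table with its own indices. */
void init(unsigned long long t[])
{
    int i;
    for (i = 0; i < N; i++)
        t[i] = i;
}

/* 4N pseudo-random XOR updates driven by a shift register with polynomial 7. */
void update(unsigned long long t[])
{
    int i;
    unsigned long long ran = 1;  /* start value assumed (HPCC starts from 1); the figure left it unset */
    for (i = 0; i < N * 4; i++) {
        ran = (ran << 1) ^ (((signed long long) ran < 0) ? 7ULL : 0);
        t[ran & (N - 1)] ^= ran;
    }
}

int main(void)
{
    static unsigned long long t[N];  /* static: 8 MiB is too large for the stack */
    init(t);
    update(t);
    return 0;
}
```
Figure 1: RandomAccess

Table 2: Matrix multiplication (values in parentheses: GPU excluding transfer time)
Item | GPU | Cell/B.E.
Time (ms) | 59.5 (46.8) | 140
Performance (GFLOPS) | 289 (367) | 123
Performance per watt (GFLOPS/W) | 1.22 (1.56) | 1.92

Table 3: FFT (values in parentheses: GPU excluding transfer time)
Item | GPU | Cell/B.E.
Time (s) | 1.97 (0.483) | 1.68
Performance (GFLOPS) | 25.3 (103) | 29.6
Bandwidth (GiB/s) | 1.98 (40.4) | 6.98
Performance per watt (MFLOPS/W) | 107 (436) | 462

4.2 FFT
Table 3 shows the results. One 2^19-point FFT amounts to 47.5 × 2^20 FLOP.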
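The derived rows of Tables 2 and 3 follow from FLOP counts divided by elapsed time. The C sketch below assumes the usual conventions: 2n^3 FLOP for an n × n matrix product, and 5N log2 N FLOP per complex N-point FFT. It also assumes that Table 2 times are in milliseconds and Table 3 times in seconds. These assumptions reproduce the printed GFLOPS values to within rounding. The helper names are hypothetical.

```c
#include <assert.h>

/* FLOP of an n x n matrix product, conventionally 2 n^3 (one multiply and one add per term). */
static double matmul_flop(double n)
{
    return 2.0 * n * n * n;
}

/* Conventional FLOP count of one complex FFT of N = 2^log2n points: 5 N log2 N. */
static double fft_flop(int log2n)
{
    return 5.0 * (double)(1UL << log2n) * log2n;
}

/* GFLOPS achieved when `flop` operations take `seconds`. */
static double gflops(double flop, double seconds)
{
    assert(seconds > 0.0);
    return flop / seconds * 1e-9;
}

/* True when a is within relative tolerance rel of b. */
static int close_rel(double a, double b, double rel)
{
    double d = a - b;
    double m = b < 0.0 ? -b : b;
    return (d < 0.0 ? -d : d) <= rel * m;
}
```

With these conventions, 2 × 2048^3 FLOP in 59.5 ms is about 289 GFLOPS, and 1000 × 47.5 × 2^20 FLOP in 1.97 s is about 25.3 GFLOPS. Dividing by the 236 W of Table 1 then gives the GPU's per-watt rows.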

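Table 4: Sorting (values in parentheses: GPU excluding transfer time)
Item | GPU | Cell/B.E.
Time (ms) | 7.20 (5.42) | 24.1
Throughput (G comparisons/s) | 15.3 (20.3) | 4.57
Bandwidth (GiB/s) | 1.09 (33.2) | 2.92
Performance per watt (M comparisons/s/W) | 64.8 (86.0) | 71.4

Table 5: ZIP password cracking
Item | GPU | Cell/B.E.
Time (s) | 0.237 | 0.770
Throughput (Mword/s) | 362 | 111
Performance per watt (Mword/s/W) | 1.53 | 1.73

4.3 Sorting
Table 4 shows the sorting results.

4.4 ZIP password cracking
Table 5 shows the ZIP password-cracking results.

4.5 RandomAccess
Table 6 shows the RandomAccess results. On the Cell/B.E., every update requires DMA transfers between main memory and the Local Store.

Table 6: RandomAccess
Item | GPU | Cell/B.E.
Time (ms) | 27.0 | 207
Bandwidth (GiB/s) | 2.31 | 0.302
Performance per watt (MiB/s/W) | 10.0 | 4.83

Table 7: Levenshtein distance
Item | GPU | Cell/B.E.
Time | 477 | 218
Performance per watt (×10^3 /s/W) | 8.88 | 71.7

4.6 Levenshtein distance
Table 7 shows the results. The SPEs process 8-bit values, whereas on the GTX 280 8-bit values are handled within 32-bit words.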
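Each cell of the Levenshtein table depends only on its left, upper and upper-left neighbors, so all cells on one anti-diagonal are independent of each other. The C sketch below computes the distance in that wavefront order. It is a sequential illustration, not the authors' CUDA or SPE code, and the name `levenshtein_wavefront` and the bound `LEV_MAX` are hypothetical. On an SPE, the cells of one diagonal would be natural candidates for the 16 lanes of 8-bit SIMD mentioned in Section 3.6.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LEV_MAX 256  /* hypothetical upper bound on string length for this sketch */

/* Edit distance filled in anti-diagonal order d = i + j. Cells on diagonal d
   read only diagonals d-1 and d-2, so one diagonal could be updated in
   parallel; here its cells are updated one after another. */
static int levenshtein_wavefront(const char *s, const char *t)
{
    static int dp[LEV_MAX + 1][LEV_MAX + 1];
    size_t n = strlen(s), m = strlen(t);
    assert(n <= LEV_MAX && m <= LEV_MAX);

    for (size_t d = 0; d <= n + m; d++) {
        size_t i_lo = d > m ? d - m : 0;   /* keep j = d - i within [0, m] */
        size_t i_hi = d < n ? d : n;       /* keep i within [0, n] */
        for (size_t i = i_lo; i <= i_hi; i++) {
            size_t j = d - i;
            if (i == 0) { dp[i][j] = (int)j; continue; }  /* insert j characters */
            if (j == 0) { dp[i][j] = (int)i; continue; }  /* delete i characters */
            int sub = dp[i - 1][j - 1] + (s[i - 1] != t[j - 1]);
            int del = dp[i - 1][j] + 1;
            int ins = dp[i][j - 1] + 1;
            int best = sub < del ? sub : del;
            dp[i][j] = best < ins ? best : ins;
        }
    }
    return dp[n][m];
}
```

For example, the distance between "kitten" and "sitting" is 3.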

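The CRC table lookups of the ZIP password cracking (Sections 3.4 and 4.4) come from the key update of ZIP's traditional PKWARE encryption. This assumes that is the scheme attacked, which the 8-bit CRC lookups suggest. In the C sketch below, each password byte updates three 32-bit keys. Two of the three updates are a single table lookup indexed by 8 bits, which is the access the GPU implementation can serve from Shared, Constant or Texture Memory. Function and type names are hypothetical. A real cracker would call `zip_init_keys` once per candidate and then decrypt and check the 12-byte encryption header; that step is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* Build the reflected CRC-32 table (polynomial 0xEDB88320). */
static void crc32_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[n] = c;
    }
}

/* One CRC-32 step on one byte: a single lookup indexed by 8 bits. */
static uint32_t crc32_byte(uint32_t crc, uint8_t b)
{
    return crc_table[(crc ^ b) & 0xFFu] ^ (crc >> 8);
}

/* The three keys of the traditional PKWARE stream cipher. */
typedef struct { uint32_t k0, k1, k2; } zip_keys;

/* Initialize the keys and feed the password through the key update. */
static void zip_init_keys(zip_keys *z, const char *password)
{
    assert(crc_table[1] != 0);  /* crc32_init() must have been called */
    z->k0 = 0x12345678u;
    z->k1 = 0x23456789u;
    z->k2 = 0x34567890u;
    for (const char *p = password; *p != '\0'; p++) {
        z->k0 = crc32_byte(z->k0, (uint8_t)*p);
        z->k1 = (z->k1 + (z->k0 & 0xFFu)) * 134775813u + 1u;
        z->k2 = crc32_byte(z->k2, (uint8_t)(z->k1 >> 24));
    }
}
```

Because every candidate password needs its own key update and these lookups are data-dependent, where the 1 KiB table lives largely determines the GPU throughput.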
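Figure 2 (series: including transfer; excluding transfer; performance per watt, including transfer; performance per watt, excluding transfer)

Figure 3: GPU Challenge and Cell Challenge

5. Related work
OpenCL 16) is a programming standard backed by NVIDIA, Intel, AMD and IBM that targets both GPUs and the Cell/B.E. For AMD GPUs, GPGPU programming has used Brook 8) and its extension Brook+ 10). Scherl et al. 18) (CT reconstruction with CUDA) and Agarwal et al. 5) (financial modeling on the Cell/B.E.) report results involving the GeForce 8800 GTX. The IBM SDK and CUDA each target a single platform. RapidMind 14), by contrast, provides one development platform for both the Cell/B.E. and GPUs.

6. Conclusion
We implemented six applications on a GPU and on the Cell/B.E. and compared their performance and performance per watt. The comparison covered the GPU's Shared Memory versus the Cell/B.E.'s Local Store, the SIMD units of the SPEs versus the GPU's SPs, and the GPU's power consumption of 236 W.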

NVIDIA also offers the Tesla*1 series for GPGPU.

References
1) Cell Challenge 2009, 2009. http://www.hpcc.jp/sacsis/2009/cell/.
2) GPU Challenge 2009, 2009. http://www.hpcc.jp/sacsis/2009/gpu/.
3) SACSIS2009, 2009. http://www.hpcc.jp/sacsis/2009/.
4) W. Abu-Sufah, D. J. Kuck, and D. H. Lawrie. Automatic program transformations for virtual memory computers. In Proceedings of the 1979 National Computer Conference, pp. 969-974, June 1979.
5) V. Agarwal, Lurng-Kuo Liu, and D. A. Bader. Financial modeling on the Cell Broadband Engine. In IEEE International Symposium on Parallel and Distributed Processing (IPDPS 2008), pp. 1-12, April 2008.
6) K. E. Batcher. Sorting networks and their applications. In Proceedings of the AFIPS Spring Joint Computer Conference, 1968.
7) I. Buck. GeForce 8800 & NVIDIA CUDA: A new architecture for computing on the GPU. Supercomputing 06 Workshop "General-Purpose GPU Computing: Practice and Experience", 2006.
8) Ian Buck, Tim Foley, Daniel Horn, Jeremy Sugerman, Kayvon Fatahalian, Mike Houston, and Pat Hanrahan. Brook for GPUs: stream computing on graphics hardware. In ACM SIGGRAPH 2004 Papers, pp. 777-786, New York, NY, USA, 2004. ACM.
9) James W. Cooley and John W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Comput. 19, pp. 297-301, 1965.
10) Advanced Micro Devices Inc. Brook+ SC07 BoF session. Supercomputing 2007 Conference, November 2007.
11) James Laudon, Anoop Gupta, and Mark Horowitz. Interleaving: a multithreading technique targeting multiprocessors and workstations. SIGPLAN Not., Vol. 29, No. 11, pp. 308-318, 1994.
12) Vladimir I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Technical Report 8, 1966.
13) David Luebke, Mark Harris, Jens Krüger, Tim Purcell, Naga Govindaraju, Ian Buck, Cliff Woolley, and Aaron Lefohn. GPGPU: general purpose computation on graphics hardware. In ACM SIGGRAPH 2004 Course Notes, p. 33, New York, NY, USA, 2004. ACM.
14) Michael D. McCool. Data-parallel programming on the Cell BE and the GPU using the RapidMind development platform. GSPx Multicore Applications Conference, 2006.
15) G. E. Moore. Cramming more components onto integrated circuits. Proceedings of the IEEE, Vol. 86, No. 1, pp. 82-85, 1998.
16) Aaftab Munshi. OpenCL. http://s08.idav.ucdavis.edu/munshi-opencl.pdf.
17) D. Pham, S. Asano, M. Bolliger, M. N. Day, H. P. Hofstee, C. Johns, J. Kahle, A. Kameyama, J. Keaty, Y. Masubuchi, M. Riley, D. Shippy, D. Stasiak, M. Suzuoki, M. Wang, J. Warnock, S. Weitzel, D. Wendel, T. Yamazaki, and K. Yazawa. The design and implementation of a first-generation CELL processor. In 2005 IEEE International Solid-State Circuits Conference (ISSCC 2005), Digest of Technical Papers, Vol. 1, pp. 184-592, 2005.
18) H. Scherl, B. Keck, M. Kowarschik, and J. Hornegger. Fast GPU-based CT reconstruction using the Common Unified Device Architecture (CUDA). In IEEE Nuclear Science Symposium Conference Record (NSS '07), Vol. 6, pp. 4464-4466, Oct. 26-Nov. 3, 2007.
19) D. Takahashi. High-performance parallel FFT algorithms for the Hitachi SR8000. In Proceedings of the Fourth International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region, Vol. 1, pp. 192-199, 2000.
20) Vasily Volkov. Homepage for Vasily Volkov. http://www.cs.berkeley.edu/volkov/.
21) D. Wang. ISSCC 2005: The Cell microprocessor. Real World Technologies, February 2005. http://www.realworldtech.com/page.cfm?articleid=rwt021005084318&p=2.

*1 Tesla C1060: 187.8 W.