FFTSS Library Version 3.0 User's Guide


Date: October 31, 2007

FFTSS Library Version 3.0
Copyright (C) 2002-2007 The Scalable Software Infrastructure Project, CREST, JST.
http://www.ssisc.org/

Contents

1 Introduction
2 Definition of the discrete Fourier transform (DFT)
3 Installation
  3.1 UNIX
  3.2 Microsoft Windows
    3.2.1 Visual Studio
    3.2.2 Intel C/C++ Compiler
    3.2.3 MinGW
4 Linking
  4.1 UNIX
  4.2 Visual Studio
5 Implementation notes
6 Functions
  6.1 Memory allocation
    6.1.1 fftss_malloc
    6.1.2 fftss_free
  6.2 Plan creation
    6.2.1 fftss_plan_dft_1d
    6.2.2 fftss_plan_dft_2d
    6.2.3 fftss_plan_dft_3d
    6.2.4 Flags
  6.3 Execution
    6.3.1 fftss_execute
    6.3.2 fftss_execute_dft
  6.4 Plan destruction
    6.4.1 fftss_destroy_plan
  6.5 Timer
    6.5.1 fftss_get_wtime
  6.6 Multithreading
    6.6.1 fftss_init_threads
    6.6.2 fftss_plan_with_nthreads
    6.6.3 fftss_cleanup_threads
  6.7 MPI
    6.7.1 pfftss_plan_dft_2d
    6.7.2 pfftss_execute
    6.7.3 pfftss_execute_dft
    6.7.4 pfftss_destroy_plan
7 Multithreading with OpenMP
8 FFTW compatibility
  8.1 Functions
  8.2 Flags
9 List of FFT kernels
10 Tested platforms

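1 Introduction

FFTSS is a library for computing the fast Fourier transform (FFT). It is a product of the Scalable Software Infrastructure Project, supported by the Japan Science and Technology Agency (JST) under the CREST program. Its interface is similar to that of FFTW version 3.

2 Definition of the discrete Fourier transform (DFT)

For a one-dimensional complex array X of length n, FFTSS computes the forward DFT

$$Y_k = \sum_{j=0}^{n-1} X_j \, e^{-2\pi jk\sqrt{-1}/n}$$

and the backward DFT

$$Y_k = \sum_{j=0}^{n-1} X_j \, e^{2\pi jk\sqrt{-1}/n}$$

where X and Y are complex arrays. As in FFTW, the result is not scaled by 1/n. A multidimensional DFT is computed by applying the one-dimensional DFT along each dimension.

3 Installation

3.1 UNIX

On UNIX systems, installation takes three steps:

1. Run configure.
2. Run make.
3. Run make install.

Optionally, the following options can be passed to configure.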

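  --without-simd       Do not use SIMD instructions.
  --without-asm        Do not use assembly code.
  --with-bg            Build for IBM Blue Gene.
  --with-bg-compat     Use the FFT kernels for Blue Gene [...].
  --with-recommended   Use the recommended CC and CFLAGS.
  --enable-openmp      Enable OpenMP.
  --enable-mpi         Enable MPI.

The compiler and its options can be set with CC and CFLAGS:

  $ ./configure CC=gcc CFLAGS="-O3 -msse2"

3.2 Microsoft Windows

On Microsoft Windows, the library can be built in three ways.

3.2.1 Visual Studio

Open fftss.sln in the win32 directory. The solution file is for Visual Studio .NET 2003 and can also be used with Visual Studio 2005. With Visual Studio 2003, the Microsoft SDK (Platform SDK) may also be required. pfftss.sln in the win32 directory builds the MPI version, which uses the MPI implementation from the Microsoft Compute Cluster Pack SDK.

3.2.2 Intel C/C++ Compiler

The win32 directory contains batch files for building with the Intel C/C++ Compiler. Run the one that matches the target architecture from the win32 directory:

  icl-x86.bat   IA-32 (32-bit)
  icl-amd64     EM64T (Intel 64) and AMD64 (64-bit)
  icl-ia64      IA-64 (64-bit)

3.2.3 MinGW

Under MinGW, use configure in the same way as on UNIX.

4 Linking

4.1 UNIX

Add the following to the compiler and linker options:

  CFLAGS   -I/path/to/header/file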

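  LDFLAGS  -L/path/to/library/file -lfftss
  LDFLAGS  -lpfftss (MPI version)

Here /path/to/header/file and /path/to/library/file are the directories that contain fftss.h and libfftss.a, respectively.

4.2 Visual Studio

With Visual Studio, link fftss.lib, which is generated in win32/release (or win32/debug). The MPI version uses pfftss.lib. Include fftss.h or pfftss.h in any source file that uses FFTSS.

5 Implementation notes

FFTSS supports transform sizes that are powers of 2. The FFT kernels use the Stockham algorithm, which is an out-of-place transform. An in-place transform is carried out by means of the out-of-place transform.

6 Functions

6.1 Memory allocation

6.1.1 fftss_malloc

  void *fftss_malloc(long size);

fftss_malloc() allocates size bytes of memory and returns a pointer to it. The memory is aligned on a 16-byte boundary.

6.1.2 fftss_free

  void fftss_free(void *ptr);

fftss_free() releases the memory pointed to by ptr, which must have been allocated with fftss_malloc().

6.2 Plan creation

6.2.1 fftss_plan_dft_1d

  fftss_plan fftss_plan_dft_1d(long n, double *in, double *out, long sign, long flags);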

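fftss_plan_dft_1d() creates a plan for a one-dimensional FFT of length n. The real and imaginary parts of element i are stored in in[i*2] and in[i*2+1].

6.2.2 fftss_plan_dft_2d

  fftss_plan fftss_plan_dft_2d(long nx, long ny, long py, double *in, double *out, long sign, long flags);

fftss_plan_dft_2d() creates a plan for a two-dimensional FFT of size nx × ny. The real and imaginary parts of element (x,y) are stored in in[x*2+y*py*2] and in[x*2+y*py*2+1].

6.2.3 fftss_plan_dft_3d

  fftss_plan fftss_plan_dft_3d(long nx, long ny, long nz, long py, long pz, double *in, double *out, long sign, long flags);

fftss_plan_dft_3d() creates a plan for a three-dimensional FFT of size nx × ny × nz. The real and imaginary parts of element (x,y,z) are stored in in[x*2+y*py*2+z*pz*2] and in[x*2+y*py*2+z*pz*2+1].

6.2.4 Flags

The following flags are common to the one-, two-, and three-dimensional FFT plan functions.

  FFTSS_VERBOSE      Print detailed information, including the FFT kernel that is used.
  FFTSS_MEASURE      Run the FFT kernels and select one based on the measured time. (default)
  FFTSS_ESTIMATE     Select an FFT kernel without running it. Plan creation is faster, but the selected kernel may not be the fastest.
  FFTSS_PATIENT      Same as FFTSS_MEASURE.
  FFTSS_EXHAUSTIVE   Same as FFTSS_MEASURE.
  FFTSS_NO_SIMD      Do not use FFT kernels that rely on SIMD (SIMOMD) instructions.
  FFTSS_UNALIGNED    Do not assume that the arrays are aligned on 16-byte boundaries.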

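The alignment requirement also applies to arrays passed to fftss_execute_dft(); arrays allocated with fftss_malloc() satisfy it.

  FFTSS_DESTROY_INPUT    The contents of the input array may be destroyed. (default)
  FFTSS_PRESERVE_INPUT   The contents of the input array are preserved.
  FFTSS_INOUT            The result is stored in in; out is used as working storage.

6.3 Execution

6.3.1 fftss_execute

  void fftss_execute(fftss_plan p);

fftss_execute() executes the plan p.

6.3.2 fftss_execute_dft

  void fftss_execute_dft(fftss_plan p, double *in, double *out);

fftss_execute_dft() executes the plan p, using in and out as the input and output arrays in place of those given when p was created.

6.4 Plan destruction

6.4.1 fftss_destroy_plan

  void fftss_destroy_plan(fftss_plan p);

fftss_destroy_plan() destroys the plan p and releases the resources it holds.

6.5 Timer

6.5.1 fftss_get_wtime

  double fftss_get_wtime(void);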

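fftss_get_wtime() returns the current time in seconds.

6.6 Multithreading

6.6.1 fftss_init_threads

  int fftss_init_threads(void);

fftss_init_threads() is provided for compatibility with FFTW.

6.6.2 fftss_plan_with_nthreads

  void fftss_plan_with_nthreads(int nthreads);

fftss_plan_with_nthreads() sets the number of threads. FFTSS uses OpenMP for multithreading, so the number of threads can also be set with the OpenMP function omp_set_num_threads().

6.6.3 fftss_cleanup_threads

  void fftss_cleanup_threads(void);

fftss_cleanup_threads() is provided for compatibility with FFTW.

6.7 MPI

FFTSS version 3 includes pfftss, an MPI version of the library.

6.7.1 pfftss_plan_dft_2d

  pfftss_plan pfftss_plan_dft_2d(long nx, long ny, long py, long oy, long ly, double *inout, long sign, long flags, MPI_Comm comm);

pfftss_plan_dft_2d() creates a plan for a two-dimensional FFT of size nx × ny using MPI. The array inout is used for both input and output. Each process holds the ly rows that start at row oy, stored with pitch py, so the local array holds py × ly complex elements (for one process, double inout[2*py*ly]). comm is the MPI communicator. nx [...]. ly [...] 0 [...].

6.7.2 pfftss_execute

  void pfftss_execute(pfftss_plan p);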

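pfftss_execute() executes the plan p.

6.7.3 pfftss_execute_dft

  void pfftss_execute_dft(pfftss_plan p, double *inout);

pfftss_execute_dft() executes the plan p on the array inout.

6.7.4 pfftss_destroy_plan

  void pfftss_destroy_plan(pfftss_plan p);

pfftss_destroy_plan() destroys the plan p.

7 Multithreading with OpenMP

FFTSS supports multithreading through OpenMP. To use it, build the library with a compiler that supports OpenMP and enable OpenMP in the compiler options. With the Intel C/C++ Compiler, for example, -openmp enables OpenMP and -xP enables SSE3:

  $ ./configure CC=icc CFLAGS="-O3 -openmp -xP"

The number of threads is given by the environment variable OMP_NUM_THREADS. If it is not set, one thread per CPU is used. The number of threads can also be set with omp_set_num_threads(). fftss_plan_with_nthreads() is provided for compatibility with FFTW and calls omp_set_num_threads() internally.

The following example creates a plan and then measures the execution time for each number of threads.

    max_threads = omp_get_num_procs();
    fftss_plan_with_nthreads(max_threads);
    plan = fftss_plan_dft_2d(nx, ny, py, vin, vout,
                             FFTSS_FORWARD, FFTSS_MEASURE);
    {
      /* Initialize the input data here. */
    }
    for (nthreads = 1; nthreads <= max_threads; nthreads++) {
      fftss_plan_with_nthreads(nthreads);
      t = fftss_get_wtime();
      fftss_execute(plan);
      t = fftss_get_wtime() - t;
      printf("%d threads: %lf sec.\n", nthreads, t);
    }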
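The example above passes vin together with a row pitch py. As a minimal sketch of the interleaved layout from section 6.2.2 (real part at in[x*2+y*py*2], imaginary part in the next double), the helpers below compute positions in such an array without calling FFTSS. The names idx2d, doubles_needed_2d, and fill_pattern_2d are hypothetical, and both the condition py >= nx and the total size of 2*py*ny doubles are assumptions made by analogy with the MPI case (double inout[2*py*ly]); they are not stated in this guide.

```c
#include <assert.h>
#include <stddef.h>

/* Position of the real part of element (x, y); the imaginary part is the
 * next double. Mirrors in[x*2 + y*py*2] from section 6.2.2. */
static size_t idx2d(long x, long y, long py)
{
    assert(x >= 0 && y >= 0 && x < py);
    return (size_t)x * 2 + (size_t)y * (size_t)py * 2;
}

/* Doubles needed for ny rows of pitch py.
 * Assumption: py >= nx and the array holds 2 * py * ny doubles. */
static size_t doubles_needed_2d(long nx, long ny, long py)
{
    assert(nx > 0 && ny > 0 && py >= nx);
    return (size_t)2 * (size_t)py * (size_t)ny;
}

/* Store re = x and im = y in every element; zero the padding columns
 * (those with nx <= x < py). */
static void fill_pattern_2d(double *a, long nx, long ny, long py)
{
    long x, y;
    for (y = 0; y < ny; y++) {
        for (x = 0; x < py; x++) {
            size_t i = idx2d(x, y, py);
            a[i]     = (x < nx) ? (double)x : 0.0;
            a[i + 1] = (x < nx) ? (double)y : 0.0;
        }
    }
}
```

With nx = 8, ny = 4, and py = 10, element (3, 2) has its real part at index 46 and the array holds 80 doubles. In a real program the buffer would normally come from fftss_malloc() so that it is 16-byte aligned.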

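MPI and OpenMP can be combined. Passing both --enable-mpi and --enable-openmp to configure builds a hybrid MPI/OpenMP library.

8 FFTW compatibility

FFTSS provides an interface that is compatible with FFTW. A program written for FFTW can use FFTSS by including fftw3compat.h in place of the FFTW header fftw3.h. fftw3compat.h maps the FFTW names to their FFTSS counterparts and includes fftss.h. The MPI interface is not part of the compatibility layer listed here.

8.1 Functions

  fftw_malloc()
  fftw_free()
  fftw_plan_dft_1d()
  fftw_plan_dft_2d()
  fftw_plan_dft_3d()
  fftw_execute()
  fftw_execute_dft()
  fftw_destroy_plan()
  fftw_init_threads()
  fftw_plan_with_nthreads()
  fftw_cleanup_threads()

8.2 Flags

  FFTW_MEASURE
  FFTW_ESTIMATE
  FFTW_PATIENT
  FFTW_EXHAUSTIVE
  FFTW_NO_SIMD
  FFTW_PRESERVE_INPUT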

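  FFTW_DESTROY_INPUT
  FFTW_FORWARD
  FFTW_BACKWARD

9 List of FFT kernels

FFTSS contains the following FFT kernels. The name of the kernel in use is shown when FFTSS_VERBOSE is specified.

  normal           Standard kernel.
  FMA              FFT kernel for FMA (fused multiply-add) instructions.
  SSE2 (1)         Uses Intel SSE2 instructions.
  SSE2 (2)         Uses Intel SSE2 instructions (UNPCKHPD/UNPCKLPD).
  SSE3             Uses Intel SSE2 instructions together with the SSE3 instruction ADDSUBPD.
  SSE3 (H)         Uses Intel SSE2 instructions together with the SSE3 instructions HADDPD/HSUBPD.
  C99 Complex      Uses the C99 complex type.
  Blue Gene        For IBM Blue Gene.
  Blue Gene (PL)   For IBM Blue Gene ([...]).
  Blue Gene asm    For IBM Blue Gene (assembly).
  IA-64 asm        For Intel IA-64 (assembly).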

10 Tested platforms

FFTSS has been tested on the following platforms.

  Processor        OS                    Compiler
  ---------------  --------------------  ---------------------------------------
  UltraSPARC III   Sun Solaris 9         Sun ONE Studio 11
  Itanium 2        Linux                 Intel C/C++ Compiler 9.1, gcc 4.0.1
  PowerPC G5       Mac OS X 10.4         IBM XL C Compiler 6.0, gcc 4.0
  POWER5           Linux                 IBM XL C Compiler 7.0, gcc 4.0.1
  POWER4           AIX                   IBM XL C Compiler 6.0
  PA-RISC          HP-UX 11              Bundled C Compiler
  PPC440FP2        Blue Gene CNK         IBM XL C Compiler 7.0/8.0
  Opteron          Linux                 gcc 3.3.3, gcc 4.0.1
  Pentium 4        Solaris 9 (IA-32)     Sun ONE Studio 11, gcc 4.0.1
  Xeon             Linux                 Intel C/C++ Compiler 8.1/9.0/9.1, gcc
  IA-32            Windows XP SP2        Visual Studio .NET 2003
  IA-32            Windows XP SP2        Visual Studio 2005
  IA-32            Windows XP SP2        Intel C/C++ Compiler 9.1
  x64              Windows XP, 2003      Visual Studio .NET 2003
  x64              Windows XP, 2003      Intel C/C++ Compiler for EM64T 9.1