FFTSS 3.0
Revised: October 31, 2007
Copyright (C) 2002-2007 The Scalable Software Infrastructure Project (CREST)
http://www.ssisc.org/
Contents
1 Introduction
2 Discrete Fourier Transform (DFT)
3 Installation
  3.1 UNIX
  3.2 Microsoft Windows
    3.2.1 Visual Studio
    3.2.2 Intel C/C++ Compiler
    3.2.3 MinGW
4 Compiling and Linking
  4.1 UNIX
  4.2 Visual Studio
5 Algorithm
6 API Reference
  6.1 Memory Allocation
    6.1.1 fftss_malloc
    6.1.2 fftss_free
  6.2 Plan Creation
    6.2.1 fftss_plan_dft_1d
    6.2.2 fftss_plan_dft_2d
    6.2.3 fftss_plan_dft_3d
    6.2.4 Flags
  6.3 Execution
    6.3.1 fftss_execute
    6.3.2 fftss_execute_dft
  6.4 Plan Destruction
    6.4.1 fftss_destroy_plan
  6.5 Timer
    6.5.1 fftss_get_wtime
  6.6 Threads
    6.6.1 fftss_init_threads
    6.6.2 fftss_plan_with_nthreads
    6.6.3 fftss_cleanup_threads
  6.7 MPI
    6.7.1 pfftss_plan_dft_2d
    6.7.2 pfftss_execute
    6.7.3 pfftss_execute_dft
    6.7.4 pfftss_destroy_plan
7 Parallel Execution with OpenMP
8 FFTW Compatibility
  8.1 Supported Functions
  8.2 Supported Flags
9 List of FFT Kernels
10 Tested Environments
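1 Introduction

FFTSS is a library of fast Fourier transform (FFT) routines developed by the Scalable Software Infrastructure Project, supported by the Japan Science and Technology Agency (JST) under the CREST program. Its interface is modeled on that of FFTW 3, so programs written for FFTW can be adapted to FFTSS with few changes (see Section 8).

2 Discrete Fourier Transform (DFT)

FFTSS computes the one-dimensional DFT of length n. The forward and backward transforms are

$$Y_k = \sum_{j=0}^{n-1} X_j \, e^{-2\pi jk\sqrt{-1}/n}, \qquad Y_k = \sum_{j=0}^{n-1} X_j \, e^{2\pi jk\sqrt{-1}/n},$$

where X is the input and Y is the output. As in FFTW, the result is not normalized by 1/n, so a forward transform followed by a backward transform multiplies the data by n. A multidimensional DFT is computed by applying the one-dimensional DFT along each dimension.

3 Installation

3.1 UNIX

On UNIX systems, FFTSS is installed in three steps:
  1. Run configure.
  2. Run make.
  3. Run make install.

configure accepts the following options, all of which are optional: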
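  --without-simd       Do not use SIMD instructions.
  --without-asm        Do not use assembly-language kernels.
  --with-bg            Build for IBM Blue Gene.
  --with-bg-compat     Build the Blue Gene FFT kernels in compatibility mode.
  --with-recommended   Use the recommended CC and CFLAGS settings.
  --enable-openmp      Enable OpenMP.
  --enable-mpi         Enable MPI.

The compiler and compiler flags are set with the CC and CFLAGS variables, for example:

  $ ./configure CC=gcc CFLAGS="-O3 -msse2"

3.2 Microsoft Windows

On Microsoft Windows, FFTSS can be built in three ways.

3.2.1 Visual Studio

The solution file win32/fftss.sln is provided for Visual Studio .NET 2003 and can also be opened with Visual Studio 2005. With Visual Studio .NET 2003, the Microsoft Platform SDK may also be required. The solution file win32/pfftss.sln builds the MPI version and requires the MPI library from the Microsoft Compute Cluster Pack SDK.

3.2.2 Intel C/C++ Compiler

The win32 directory contains batch files for building with the Intel C/C++ Compiler. Run them from the win32 directory:
  icl-x86.bat   IA-32 (32-bit)
  icl-amd64     EM64T (Intel 64) and AMD64 (64-bit)
  icl-ia64      IA-64 (64-bit)
Visual Studio must also be installed.

3.2.3 MinGW

With MinGW, FFTSS is built with configure in the same way as on UNIX.

4 Compiling and Linking

4.1 UNIX

Compile and link programs that use FFTSS with the following flags:
  CFLAGS    -I/path/to/header/file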
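  LDFLAGS   -L/path/to/library/file -lfftss
  LDFLAGS   -lpfftss (MPI programs)
Here /path/to/header/file is the directory that contains fftss.h, and /path/to/library/file is the directory that contains libfftss.a.

4.2 Visual Studio

In Visual Studio, link with fftss.lib in win32/release (or win32/debug). MPI programs also need pfftss.lib. The headers fftss.h and pfftss.h are in the include directory; add it to the include path. With these settings, FFTSS is ready to use.

5 Algorithm

FFTSS supports transform lengths that are powers of 2. The FFT is computed with the Stockham algorithm, which is an out-of-place transform. In-place transforms are therefore also carried out internally with the out-of-place transform.

6 API Reference

6.1 Memory Allocation

6.1.1 fftss_malloc

void *fftss_malloc(long size);

fftss_malloc() allocates size bytes of memory and returns a pointer to it. The memory is aligned on a 16-byte boundary.

6.1.2 fftss_free

void fftss_free(void *ptr);

fftss_free() frees the memory pointed to by ptr. ptr must have been allocated by fftss_malloc().

6.2 Plan Creation

6.2.1 fftss_plan_dft_1d

fftss_plan fftss_plan_dft_1d(long n, double *in, double *out, long sign, long flags);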
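fftss_plan_dft_1d() creates a plan for a one-dimensional FFT of length n. The real and imaginary parts of element i are stored in in[i*2] and in[i*2+1].

6.2.2 fftss_plan_dft_2d

fftss_plan fftss_plan_dft_2d(long nx, long ny, long py, double *in, double *out, long sign, long flags);

fftss_plan_dft_2d() creates a plan for a two-dimensional FFT of size nx × ny. The real and imaginary parts of element (x,y) are stored in in[x*2+y*py*2] and in[x*2+y*py*2+1], so py is the row stride in complex elements.

6.2.3 fftss_plan_dft_3d

fftss_plan fftss_plan_dft_3d(long nx, long ny, long nz, long py, long pz, double *in, double *out, long sign, long flags);

fftss_plan_dft_3d() creates a plan for a three-dimensional FFT of size nx × ny × nz. The real and imaginary parts of element (x,y,z) are stored in in[x*2+y*py*2+z*pz*2] and in[x*2+y*py*2+z*pz*2+1].

6.2.4 Flags

The following flags can be passed to the 1-, 2- and 3-dimensional plan creation functions:
  FFTSS_VERBOSE        Print information about the selected FFT kernel.
  FFTSS_MEASURE        Measure the execution time of the candidate FFT kernels and select the fastest (default).
  FFTSS_ESTIMATE       Select an FFT kernel without timing measurements. Plan creation is faster, but the selected kernel may not be the fastest.
  FFTSS_PATIENT        Same as FFTSS_MEASURE.
  FFTSS_EXHAUSTIVE     Same as FFTSS_MEASURE.
  FFTSS_NO_SIMD        Do not use FFT kernels that use SIMD instructions.
  FFTSS_UNALIGNED      The arrays may not be aligned on 16-byte boundaries.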
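                       Specify this flag if the arrays given when creating the plan, or later passed to fftss_execute_dft(), might not be aligned. Arrays allocated with fftss_malloc() are always aligned.
  FFTSS_DESTROY_INPUT  The input array may be overwritten (default).
  FFTSS_PRESERVE_INPUT The input array is preserved.
  FFTSS_INOUT          The result is stored in in, and out is used as a work area.

6.3 Execution

6.3.1 fftss_execute

void fftss_execute(fftss_plan p);

fftss_execute() executes the plan p.

6.3.2 fftss_execute_dft

void fftss_execute_dft(fftss_plan p, double *in, double *out);

fftss_execute_dft() executes the plan p with in and out as the input and output arrays, in place of the arrays given when p was created.

6.4 Plan Destruction

6.4.1 fftss_destroy_plan

void fftss_destroy_plan(fftss_plan p);

fftss_destroy_plan() destroys the plan p and frees the resources associated with it.

6.5 Timer

6.5.1 fftss_get_wtime

double fftss_get_wtime(void);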
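fftss_get_wtime() returns the current wall-clock time. The difference between two calls gives the elapsed time.

6.6 Threads

6.6.1 fftss_init_threads

int fftss_init_threads(void);

fftss_init_threads() is provided for compatibility with FFTW.

6.6.2 fftss_plan_with_nthreads

void fftss_plan_with_nthreads(int nthreads);

fftss_plan_with_nthreads() sets the number of threads to nthreads. Because FFTSS is parallelized with OpenMP, the OpenMP function omp_set_num_threads() can be used instead.

6.6.3 fftss_cleanup_threads

void fftss_cleanup_threads(void);

fftss_cleanup_threads() is provided for compatibility with FFTW.

6.7 MPI

FFTSS 3 includes pfftss, an MPI-parallel FFT library.

6.7.1 pfftss_plan_dft_2d

pfftss_plan pfftss_plan_dft_2d(long nx, long ny, long py, long oy, long ly, double *inout, long sign, long flags, MPI_Comm comm);

pfftss_plan_dft_2d() creates a plan for a two-dimensional FFT of size nx × ny distributed over MPI processes. The array inout is used for both input and output. Each process holds the ly rows starting at row oy, stored as a two-dimensional array of py × ly complex elements (that is, a one-dimensional array double inout[2*py*ly]). comm is the MPI communicator.

6.7.2 pfftss_execute

void pfftss_execute(pfftss_plan p);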
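pfftss_execute() executes the plan p.

6.7.3 pfftss_execute_dft

void pfftss_execute_dft(pfftss_plan p, double *inout);

pfftss_execute_dft() executes the plan p on the array inout.

6.7.4 pfftss_destroy_plan

void pfftss_destroy_plan(pfftss_plan p);

pfftss_destroy_plan() destroys the plan p.

7 Parallel Execution with OpenMP

FFTSS supports thread parallelism with OpenMP. To use it, compile FFTSS with a compiler that supports OpenMP and enable OpenMP in the compiler flags. For example, with the Intel C/C++ Compiler, -openmp enables OpenMP and -xP enables SSE3:

  $ ./configure CC=icc CFLAGS="-O3 -openmp -xP"

The number of threads is set with the environment variable OMP_NUM_THREADS. If it is not set, one thread per CPU is used. The number of threads can also be changed with omp_set_num_threads(). fftss_plan_with_nthreads() is provided for FFTW compatibility and has the same effect as omp_set_num_threads(), so the number of threads can be changed even after a plan has been created. The following example creates a plan once and then measures its execution time for each number of threads:

```c
/* Requires fftss.h, omp.h and stdio.h.
   vin, vout: arrays allocated with fftss_malloc(); nx, ny, py: problem size. */
int max_threads, nthreads;
double t;
fftss_plan plan;

/* Create the plan using the maximum number of threads. */
max_threads = omp_get_num_procs();
fftss_plan_with_nthreads(max_threads);
plan = fftss_plan_dft_2d(nx, ny, py, vin, vout, FFTSS_FORWARD, FFTSS_MEASURE);

{ /* Initialize the input array vin here. */ }

/* Time one execution of the same plan for each number of threads. */
for (nthreads = 1; nthreads <= max_threads; nthreads++) {
  fftss_plan_with_nthreads(nthreads);
  t = fftss_get_wtime();
  fftss_execute(plan);
  t = fftss_get_wtime() - t;
  printf("%d %lf\n", nthreads, t);
}

fftss_destroy_plan(plan);
```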
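MPI and OpenMP can also be combined: configuring with --enable-mpi --enable-openmp builds FFTSS to use both MPI and OpenMP.

8 FFTW Compatibility

FFTSS provides an FFTW-compatible interface, so programs written for FFTW can be used with FFTSS. To do this, include fftw3compat.h, supplied with FFTSS, instead of FFTW's fftw3.h. fftw3compat.h maps the FFTW names to the corresponding FFTSS functions and includes fftss.h. Only the FFTW functions and flags listed below are supported by fftw3compat.h, and MPI is not covered.

8.1 Supported Functions

  fftw_malloc()
  fftw_free()
  fftw_plan_dft_1d()
  fftw_plan_dft_2d()
  fftw_plan_dft_3d()
  fftw_execute()
  fftw_execute_dft()
  fftw_destroy_plan()
  fftw_init_threads()
  fftw_plan_with_nthreads()
  fftw_cleanup_threads()

8.2 Supported Flags

  FFTW_MEASURE
  FFTW_ESTIMATE
  FFTW_PATIENT
  FFTW_EXHAUSTIVE
  FFTW_NO_SIMD
  FFTW_PRESERVE_INPUT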
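  FFTW_DESTROY_INPUT
  FFTW_FORWARD
  FFTW_BACKWARD

9 List of FFT Kernels

FFTSS contains the FFT kernels listed below. The kernel selected for a plan is displayed when FFTSS_VERBOSE is specified.
  normal          Standard implementation.
  FMA             FFT kernel for processors with fused multiply-add (FMA) instructions.
  SSE2 (1)        Intel SSE2.
  SSE2 (2)        Intel SSE2 (using UNPCKHPD/UNPCKLPD).
  SSE3            Intel SSE3 (using ADDSUBPD).
  SSE3 (H)        Intel SSE3 (using HADDPD/HSUBPD).
  C99 Complex     C99 complex types.
  Blue Gene       IBM Blue Gene.
  Blue Gene (PL)  IBM Blue Gene (PL variant).
  Blue Gene asm   IBM Blue Gene (assembly language).
  IA-64 asm       Intel IA-64 (assembly language).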
10 Tested Environments

FFTSS has been tested in the following environments.

  CPU             OS                   Compiler
  UltraSPARC III  Solaris 9            Sun ONE Studio 11
  Itanium 2       Linux                Intel C/C++ Compiler 9.1, gcc 4.0.1
  PowerPC G5      Mac OS X 10.4        IBM XL C Compiler 6.0, gcc 4.0
  POWER5          Linux                IBM XL C Compiler 7.0, gcc 4.0.1
  POWER4          AIX                  IBM XL C Compiler 6.0
  PA-RISC         HP-UX 11             Bundled C Compiler
  PPC440FP2       Blue Gene CNK        IBM XL C Compiler 7.0/8.0
  Opteron         Linux                gcc 3.3.3, gcc 4.0.1
  Pentium 4       Solaris 9 (IA-32)    Sun ONE Studio 11, gcc 4.0.1
  Xeon            Linux                Intel C/C++ Compiler 8.1/9.0/9.1, gcc
  IA-32           Windows XP SP2       Visual Studio .NET 2003
  IA-32           Windows XP SP2       Visual Studio 2005
  IA-32           Windows XP SP2       Intel C/C++ Compiler 9.1
  x64             Windows XP, 2003     Visual Studio .NET 2003
  x64             Windows XP, 2003     Intel C/C++ Compiler for EM64T 9.1