インテル® スレッディング・ビルディング・ブロック・リファレンス・マニュアル

Similar documents
インテル® ソフトウェア開発製品

(STL) STL 1 (deta structure) (algorithm) (deta structure) 2 STL STL (Standard Template Library) 2.1 STL STL ( ) vector<int> x; for(int i = 0; i < 10;

cpp1.dvi

解きながら学ぶC++入門編

cpp2.dvi

新版明解C言語 実践編

1.3 ( ) ( ) C

ohp03.dvi

Condition DAQ condition condition 2 3 XML key value


3.1 stdio.h iostream List.2 using namespace std C printf ( ) %d %f %s %d C++ cout cout List.2 Hello World! cout << float a = 1.2f; int b = 3; cout <<

r03.dvi

連載講座 : 高生産並列言語を使いこなす (5) 分子動力学シミュレーション 田浦健次朗 東京大学大学院情報理工学系研究科, 情報基盤センター 目次 1 問題の定義 17 2 逐次プログラム 分子 ( 粒子 ) セル 系の状態 ステップ 18


困ったときのQ&A

II

1 C STL(1) C C C libc C C C++ STL(Standard Template Library ) libc libc C++ C STL libc STL iostream Algorithm libc STL string vector l

( ) ( ) 30 ( ) 27 [1] p LIFO(last in first out, ) (push) (pup) 1

活用ガイド (ソフトウェア編)

IntelR Compilers Professional Editions

r07.dvi

活用ガイド (ハードウェア編)

ohp07.dvi

活用ガイド (ソフトウェア編)

LIFO(last in first out, ) 1 FIFO(first in first out, ) 2 2 PUSH POP : 1



エクセルカバー入稿用.indd

6-1

01_.g.r..

DPD Software Development Products Overview

untitled

untitled

design_pattern.key

SC-85X2取説


<4D F736F F F696E74202D C835B B E B8CDD8AB B83685D>

RX600 & RX200シリーズ アプリケーションノート RX用仮想EEPROM

program.dvi

cpp4.dvi


TopLink å SampleClient.java... 5 Ò readallsample() querysample() cachesample() Ç..

ii

/ SCHEDULE /06/07(Tue) / Basic of Programming /06/09(Thu) / Fundamental structures /06/14(Tue) / Memory Management /06/1

O(N) ( ) log 2 N

はしがき・目次・事例目次・凡例.indd

untitled

main.dvi

untitled

i

連載講座 : 高生産並列言語を使いこなす (4) ゲーム木探索の並列化 田浦健次朗 東京大学大学院情報理工学系研究科, 情報基盤センター 目次 1 準備 問題の定義 αβ 法 16 2 αβ 法の並列化 概要 Young Brothers Wa

01_OpenMP_osx.indd

ストリーミング SIMD 拡張命令2 (SSE2) を使用した、倍精度浮動小数点ベクトルの最大/最小要素とそのインデックスの検出

III


VB.NETコーディング標準

これわかWord2010_第1部_ indd

パワポカバー入稿用.indd

これでわかるAccess2010

Java演習(4) -- 変数と型 --

橡6.プログラム.doc

exec.dvi

untitled

i

2

2008chom.pdf

パソコン機能ガイド

パソコン機能ガイド

3 Powered by mod_perl, Apache & MySQL use Item; my $item = Item->new( id => 1, name => ' ', price => 1200,

4.1 % 7.5 %

アルゴリズムとデータ構造1

平成18年版 男女共同参画白書

解きながら学ぶJava入門編

Qt (Generic Containers) Java STL <QtAlgorithms> STL (Generic Algorithms) QList<T>, QLinkedList<T>, QVector<T>, QStack<T>, QQueue<T> QMap<Key,

困ったときのQ&A

Javaと.NET

Parametric Polymorphism

活用ガイド (ソフトウェア編)

2004年度日本経団連規制改革要望

28 Docker Design and Implementation of Program Evaluation System Using Docker Virtualized Environment


class IntCell { private int value ; int getvalue() {return value; private IntCell next; IntCell next() {return next; IntCell(int value) {this.value =

C++0x

結婚生活を強める

SystemC言語概論

~~~~~~~~~~~~~~~~~~ wait Call CPU time 1, latch: library cache 7, latch: library cache lock 4, job scheduler co

やさしいJavaプログラミング -Great Ideas for Java Programming サンプルPDF


新・明解C言語 実践編

,,,,., C Java,,.,,.,., ,,.,, i

ストラドプロシージャの呼び出し方

1 # include < stdio.h> 2 # include < string.h> 3 4 int main (){ 5 char str [222]; 6 scanf ("%s", str ); 7 int n= strlen ( str ); 8 for ( int i=n -2; i

:30 12:00 I. I VI II. III. IV. a d V. VI

RubyKaigi2009 COBOL

# let st1 = {name = "Taro Yamada"; id = };; val st1 : student = {name="taro Yamada"; id=123456} { 1 = 1 ;...; n = n } # let string_of_student {n

org/ghc/ Windows Linux RPM 3.2 GHCi GHC gcc javac ghc GHCi(ghci) GHCi Prelude> GHCi :load file :l file :also file :a file :reload :r :type expr :t exp

™…

ohp11.dvi

r11.dvi

Green with White Lines

Transcription:

2007 Intel Corporation. : 1.6 Web : http://www.intel.com

Intel's Terms and Conditions of Sale MPEG /ISO MPEG MPEG Intel Corporation IntelIntel Itanium Intel Corporation * 2007 Intel Corporation. 315415-001 1.5 Partitioner parallel_for parallel_reduce parallel_scan recycle_as_safe_continuation "" 2007 3 1 1.6 2007 4 11 ii

1...1 2...2 2.1...2 2.2...3 2.2.1...3 2.2.2...4 2.2.3 CopyConstructible...4 2.3...4 2.3.1...4 2.3.2...5 2.4...5 2.4.1 tbb...5 2.4.2 tbb::internal...5 2.5...5 2.6...6 2.6.1 TBB_DO_ASSERT...6 2.6.2 TBB_DO_THREADING_TOOLS...6 3...7 3.1...7 3.1.1 split...8 3.2 Range...8 3.2.1 blocked_range<value>...10 3.2.1.1 size_type...12 3.2.1.2 blocked_range( Value begin, Value end, size_t grainsize=1 )...13 3.2.1.3 blocked_range( blocked_range& range, split )...13 3.2.1.4 size_type size() const...14 3.2.1.5 bool empty() const...14 3.2.1.6 size_type grainsize() const...14 3.2.1.7 bool is_divisible() const...14 3.2.1.8 const_iterator begin() const...15 3.2.1.9 const_iterator end() const...15 iii

3.2.2 blocked_range2d...15 3.2.2.1 row_range_type...17 3.2.2.2 col_range_type...17 3.2.2.3 blocked_range2d<rowvalue,colvalue>( RowValue row_begin, RowValue row_end, typename row_range_type::size_type row_grainsize, ColValue col_begin, ColValue col_end, typename col_range_type::size_type col_grainsize )...18 3.2.2.4 blocked_range2d<rowvalue,colvalue> ( blocked_range2d& range, split )...18 3.2.2.5 bool empty() const...18 3.2.2.6 bool is_divisible() const...19 3.2.2.7 const row_range_type& rows() const...19 3.2.2.8 const col_range_type& cols() const...19 3.3 Partitioner...19 3.3.1 simple_partitioner...21 3.3.1.1 simple_partitioner()...21 3.3.1.2 simple_partitioner(simple_partitioner &partitioner, split )...21 3.3.1.3 template<typename Range> bool should_execute_range (const Range &r, const task &t)...21 3.3.2 auto_partitioner...22 3.3.2.1 auto_partitioner()...22 3.3.2.2 auto_partitioner(auto_partitioner &partitioner, split )...22 3.3.2.3 template<typename Range> bool should_execute_range (const Range &r, const task &t)...23 3.4 parallel_for<range,body>...23 3.4.1 Partitioner...27 3.5 parallel_reduce<range,body>...29 3.5.1 Partitioner...32 3.6 parallel_scan<range,body>...34 3.6.1 pre_scan_tag and final_scan_tag...36 3.6.1.1 bool is_final_scan()...37 3.6.2 Partitioner...37 3.7 parallel_while...39 3.7.1 parallel_while<body>()...40 3.7.2 ~parallel_while<body>()...40 3.7.3 <typename Stream> void run( Stream& stream, const Body& body )41 3.7.4 void add( const value_type& item )...41 3.8 pipeline...41 3.8.1 pipeline()...43 3.8.2 ~pipeline()...43 3.8.3 void add_filter( filter& f )...43 3.8.4 void run( size_t max_number_of_live_tokens )...43 3.8.5 void clear()...44 iv

3.8.6 filter...44 3.8.6.1 filter( bool is_serial )...45 3.8.6.2 ~filter()...45 3.8.6.3 bool is_serial() const...45 3.8.6.4 virtual void* operator()( void * item )...45 3.9 parallel_sort<randomaccessiterator, Compare>...46 4...48 4.1 concurrent_hash_map<key,t, HashCompare>...48 4.1.1...50 4.1.1.1 concurrent_hash_map()...50 4.1.1.2 concurrent_hash_map( const concurrent_hash_map& table )...50 4.1.1.3 ~concurrent_hash_map()...50 4.1.1.4 concurrent_hash_map& operator= ( concurrent_hash_map& source )51 4.1.1.5 void clear()...51 4.1.2...51 4.1.2.1 const_accessor...52 4.1.2.2 accessor...54 4.1.3...55 4.1.3.1 bool find( const_accessor& result, const Key& key ) const...55 4.1.3.2 bool find( accessor& result, const Key& key )...56 4.1.3.3 bool insert( const_accessor& result, const Key& key )...56 4.1.3.4 bool insert( accessor& result, const Key& key )...56 4.1.3.5 bool erase(const Key& key )...57 4.1.4...57 4.1.4.1 const_range_type range( size_t grainsize ) const...57 4.1.4.2 range_type range( size_t grainsize )...58 4.1.5...58 4.1.5.1 size_type size() const...58 4.1.5.2 bool empty() const...58 4.1.5.3 size_type max_size() const...58 4.1.6...58 4.1.6.1 iterator begin()...58 4.1.6.2 iterator end()...59 4.1.6.3 const_iterator begin() const...59 4.1.6.4 const_iterator end() const...59 4.2 concurrent_queue<t>...59 4.2.1 concurrent_queue()...61 4.2.2 ~concurrent_queue()...61 4.2.3 void push( const T& source )...61 4.2.4 void pop( T& destination )...62 4.2.5 bool pop_if_present( T& destination )...62 4.2.6 size_type size() const...62 4.2.7 bool empty() const...62 4.2.8 size_type capacity() const...62 4.2.9 void set_capacity( size_type capacity )...63 v

4.2.10...63 4.2.10.1 iterator begin()...63 4.2.10.2 iterator end()...64 4.2.10.3 const_iterator begin() const...64 4.2.10.4 const_iterator end() const...64 4.3 concurrent_vector...64 4.3.1...66 4.3.1.1 concurrent_vector()...66 4.3.1.2 concurrent_vector( const concurrent_vector& src )...66 4.3.1.3 concurrent_vector& operator=( const concurrent_vector& src )...66 4.3.1.4 ~concurrent_vector()...66 4.3.1.5 void clear()...66 4.3.2...67 4.3.2.1 size_type grow_by( size_type delta )...67 4.3.2.2 void grow_to_at_least( size_type n )...67 4.3.2.3 size_t push_back( const_reference value );...67 4.3.2.4 reference operator[]( size_type index )...67 4.3.2.5 const_reference operator[]( size_type index ) const;...68 4.3.3...68 4.3.3.1 range_type range( size_t grainsize )...68 4.3.3.2 const_range_type range( size_t grainsize ) const...68 4.3.4...69 4.3.4.1 size_type size() const...69 4.3.4.2 bool empty() const...69 4.3.4.3 size_type capacity() const...69 4.3.4.4 void reserve( size_type n )...69 4.3.4.5 size_type max_size() const...69 4.3.5...70 4.3.5.1 iterator begin()...70 4.3.5.2 iterator end()...70 4.3.5.3 const_iterator begin() const...70 4.3.5.4 const_iterator end() const...70 4.3.5.5 iterator rbegin()...70 4.3.5.6 iterator rend()...70 4.3.5.7 const_reverse_iterator rbegin() const...71 4.3.5.8 const_ reverse_iterator rend() const...71 5...72 5.1 Allocator...72 5.2 scalable_allocator<t>...73 5.2.1 C...74 5.3 cache_aligned_allocator<t>...75 5.3.1 pointer allocate( size_type n, void* hint=0 )...77 5.3.2 void deallocate( pointer p, size_type n )...77 5.3.3 char* _Charalloc( size_type size )...77 5.4 aligned_space...78 vi

5.4.1 aligned_space()...78 5.4.2 ~aligned_space()...79 5.4.3 T* begin()...79 5.4.4 T* end()...79 6...80 6.1 Mutex...80 6.1.1 Mutex...80 6.1.2 mutex...81 6.1.3 spin_mutex...82 6.1.4 queuing_mutex...83 6.1.5 ReaderWriterMutex...83 6.1.5.1 ReaderWriterMutex()...85 6.1.5.2 ~ReaderWriterMutex()...85 6.1.5.3 ReaderWriterMutex::scoped_lock()...85 6.1.5.4 ReaderWriterMutex::scoped_lock( ReaderWriterMutex& rw, bool write =true)...85 6.1.5.5 ReaderWriterMutex::~scoped_lock()...85 6.1.5.6 void ReaderWriterMutex:: scoped_lock:: acquire( ReaderWriterMutex& rw, bool write=true )...85 6.1.5.7 bool ReaderWriterMutex:: scoped_lock::try_acquire( ReaderWriterMutex& rw, bool write=true )86 6.1.5.8 void ReaderWriterMutex:: scoped_lock::release()...86 6.1.5.9 bool ReaderWriterMutex:: scoped_lock::upgrade_to_writer()...86 6.1.5.10 bool ReaderWriterMutex:: scoped_lock::downgrade_to_reader()...86 6.1.6 spin_rw_mutex...87 6.1.7 queuing_rw_mutex...87 6.2 atomic<t>...88 6.2.1 enum memory_semantics...90 6.2.2 value_type fetch_and_add( value_type addend )...90 6.2.3 value_type fetch_and_increment()...91 6.2.4 value_type fetch_and_decrement()...91 6.2.5 value_type compare_and_swap...91 6.2.6 value_type fetch_and_store( value_type new_value )...91 7...93 7.1 tick_count...93 7.1.1 static tick_count tick_count::now()...94 7.1.2 tick_count::interval_t operator ( const tick_count& t1, const tick_count& t0 )...94 7.1.3 tick_count::interval_t...94 7.1.3.1 interval_t()...95 7.1.3.2 double seconds() const...95 7.1.3.3 interval_t operator+=( const interval_t& i )...95 7.1.3.4 interval_t operator =( const interval_t& i )...96 7.1.3.5 interval_t operator+ ( const interval_t& i, const interval_t& j )...96 7.1.3.6 interval_t operator ( const interval_t& i, const interval_t& j )...96 vii

8...97 8.1...98 8.2 task_scheduler_init...99 8.2.1 task_scheduler_init( int number_of_threads=automatic )...101 8.2.2 ~task_scheduler_init()...101 8.2.3 void initialize( int number_of_threads=automatic )...102 8.2.4 void terminate()...102 8.2.5 OpenMP...102 8.3 task...103 8.3.1...106 8.3.1.1 execute()...106 8.3.2...107 8.3.2.1 new( task::allocate_root() ) T...107 8.3.2.2 new( this. allocate_continuation() ) T...108 8.3.2.3 new( this. allocate_child() ) T...108 8.3.2.4 new( this.task::allocate_additional_child_of( parent ))...109 8.3.3...110 8.3.3.1 void destroy( task& victim )...110 8.3.4...111 8.3.4.1 void recycle_as_continuation()...111 8.3.4.2 void recycle_as_safe_continuation()...112 8.3.4.3 void recycle_as_child_of( task& parent )...112 8.3.4.4 void recycle _to_reexecute()...113 8.3.5...113 8.3.5.1 depth_type...113 8.3.5.2 depth_type depth() const...113 8.3.5.3 void set_depth( depth_type new_depth )...113 8.3.5.4 void add_to_depth( int delta )...114 8.3.6...114 8.3.6.1 void set_ref_count( int count )...115 8.3.6.2 void wait_for_all()...115 8.3.6.3 void spawn( task& child )...116 8.3.6.4 void spawn ( task_list& list )...116 8.3.6.5 void spawn_and_wait_for_all( task& child )...117 8.3.6.6 void spawn_and_wait_for_all( task_list& list )...117 8.3.6.7 static void spawn_root_and_wait( task& root )...118 8.3.6.8 static void spawn_root_and_wait( task_list& root_list )...118 8.3.7...118 8.3.7.1 static task& self()...118 8.3.7.2 task* parent() const...119 8.3.7.3 bool is_stolen_task() const...119 8.3.8...119 8.3.8.1 state_type state() const...119 8.3.8.2 int ref_count() const...121 viii

8.4 empty_task...122 8.5 task_list...122 8.5.1 task_list()...123 8.5.2 ~task_list()...123 8.5.3 bool empty() const...124 8.5.4 push_back( task& task )...124 8.5.5 task& task pop_front()...124 8.5.6 void clear()...124 8.6...124 8.6.1 k...125 8.6.2 k...125 8.6.2.1...125 8.6.2.2...126 9...127 ix

1 ISO C++ C++ (STL) : 1

2 2.1 Courier blocked_range<type> blocked_range Type Type blocked_range<int> Foo class Foo { public: int x(); int y; ~Foo(); }; class FooBase { protected: int x(); }; class Foo: protected FooBase { private: int internal_stuff; public: using FooBase::x; int y; }; 2 x() protected 2

2.2 2.2.1 (concept) "sortable" T x < y T swap(x,y) x y C++ 2 1 ISO C++ 1 T 1: "sortable" bool operator<(const T& x, const T& y) void swap(t& x, T& y) x y x y 1 Concepts for C++0x (http://www.osl.iu.edu/publications/prints/2005/siek05:_concepts_cpp0x.pdf) 3.3.2 3

C++ int bool U (const U&) U 1 operator< int operator<( U x, U y ) C++ const bool operator<( U& x, U& y ) 2.2.2 2 int x y swap(x,y) int 1 x<y int operator< 2.2.3 CopyConstructible ISO C++ CopyConstructible 2 CopyConstructible 2: CopyConstructible T( const T& ) const T ~T() T* operator&() const T* operator&() const const T 2.3 2.3.1 ISO C++ underscore_style () PascalCase 4

2.3.2 TBB 2.4 2.4.1 tbb tbb 2.4.2 tbb::internal tbb::internal tbb::internal typedef concurrent_vector<t>::iterator internal::vector_iterator<container,value> typedef iterator typedef 2.5 2 2 5

2.6 2 2.6.1 TBB_DO_ASSERT TBB_DO_ASSERT TBB_DO_ASSERT 1 stderr C abort tbb::assertion_failure : Windows* TBB_DO_ASSERT 1 2.6.2 TBB_DO_THREADING_TOOLS TBB_DO_THREADING_TOOLS TBB_DO_THREADING_TOOLS 1 TBB_DO_THREADING_TOOLS 0 spin_mutex (6.1.3) spin_rw_mutex (6.1.6) 6

3 3.1 2 3 x X 3: X::X(X& x, Split) x x 2 Split x x 2 2-2 - () 2 7

blocked_range (3.2.1) blocked_range2d (3.2.2) 2 blocked_range<value> 3.2.1.3 parallel_reduce (3.5) parallel_scan (3.6) 2 3.1.1 split class split; #include "tbb/tbb_stddef.h" split namespace tbb { class split { }; } 3.2 Range 8

4 Range R 4: Range R::R( const R& ) R::~R() bool R::empty() const bool R::is_divisible() const true 2 true R::R( R& r, split ) r 2 Range 2 Range Range blocked_range (3.2.1) grainsize 2 parallel_for (3.4) parallel_reduce (3.5) parallel_scan (3.6) Range TrivialIntegerRange (lower,upper) 9

struct TrivialIntegerRange { int lower; int upper; bool empty() const {return lower==upper;} bool is_divisible() const {return upper>lower+1;} TrivialIntegerRange( TrivialIntegerRange& r, split ) { int m = (r.lower+r.upper)/2; lower = m; upper = r.upper; r.upper = m; } }; TrivialIntegerRange blocked_range blocked_range (3.2.1) 1 blocked_range2d (3.2.2) 2 3.2.1 blocked_range<value> template<typename Value> class blocked_range; #include "tbb/blocked_range.h" blocked_range<value> [i,j) i j 5 2 int size_t blocked_range<int> Value size_t STL 10

blocked_range Range (3.2) 5: blocked_range Value Value::Value( const Value& ) Value::~Value() bool operator<( const Value& i, const Value& j ) size_t operator ( const Value& i, const Value& j ) i j [i,j) Value operator+( const Value& i, size_t k ) i k blocked_range<value> size_t grainsize grainsize blocked_range 2 parallel_for parallel_reduce parallel_scan blocked_range<value> 1 2 grainsize 1. 10,000 2. 1 3. 5-10% : blocked_range [i,j) j<i j<i parallel_for (3.4) parallel_reduce (3.5) parallel_scan (3.6) ( Value index=i; index<j; ++index )... 11

TBB_DO_ASSERT (2.6.1) blocked_range<value> parallel_for (3.4) parallel_reduce (3.5) parallel_scan (3.6) namespace tbb { template<typename Value> class blocked_range { public: // types typedef size_t size_type; typedef Value const_iterator; // constructors blocked_range( Value begin, Value end, size_type grainsize=1); blocked_range( blocked_range& r, split ); // capacity size_type size() const; bool empty() const; // access size_type grainsize() const; bool is_divisible() const; } }; // iterators const_iterator begin() const; const_iterator end() const; 3.2.1.1 size_type blocked_range size_t const_iterator 12

const_iterator STL 5 Value const_iterator blocked_range STL const_iterator 3.2.1.2 blocked_range( Value begin, Value end, size_t grainsize=1 ) grainsize grainsize (begin,end) blocked_range "blocked_range<int> r( 5, 14, 2 );" 2 5 13 int r.begin()==5 r.end()==14 3.2.1.3 blocked_range( blocked_range& range, split ) is_divisible() true range 2 blocked_range range range range grainsize i j (i,j) g blocked_range<int> r(i,j,g) g (i,j) blocked_range<int> 13

blocked_range<int> s(r,split); g r (i, i +(j i)/2) s (i +(j i)/2, j) 3.2.1.4 size_type size() const end()<begin() false end() begin() 3.2.1.5 bool empty() const!(begin()<end()) 3.2.1.6 size_type grainsize() const 3.2.1.7 bool is_divisible() const!(end()<begin()) 14

size()>grainsize() true false 3.2.1.8 const_iterator begin() const 3.2.1.9 const_iterator end() const 3.2.2 blocked_range2d 2 template<typename RowValue, typename ColValue> class blocked_range2d; #include "tbb/blocked_range2d.h" blocked_range2d<rowvalue,colvalue> 2 (i0,j0) (i1,j1) RowValue ColValue 5 blocked_range blocked_range Range (3.2) namespace tbb { template<typename RowValue, typename ColValue=RowValue> class blocked_range2d { 15

public: // Types typedef blocked_range<rowvalue> row_range_type; typedef blocked_range<colvalue> col_range_type; // Constructors blocked_range2d( RowValue row_begin, RowValue row_end, typename row_range_type::size_type row_grainsize, ColValue col_begin, ColValue col_end, typename col_range_type::size_type col_grainsize); blocked_range2d( blocked_range2d& r, split ); // Capacity bool empty() const; } }; // Access bool is_divisible() const; const row_range_type& rows() const; const col_range_type& cols() const; blocked_range2d const size_t L = 150; const size_t M = 225; const size_t N = 300; void SerialMatrixMultiply( float c[m][n], float a[m][l], float b[l][n] ) { for( size_t i=0; i<m; ++i ) { for( size_t j=0; j<n; ++j ) { float sum = 0; for( size_t k=0; k<l; ++k ) sum += a[i][k]*b[k][j]; c[i][j] = sum; } } } #include "tbb/parallel_for.h" #include "tbb/blocked_range2d.h" using namespace tbb; const size_t L = 150; const size_t M = 225; const size_t N = 300; 16

class MatrixMultiplyBody2D { float (*my_a)[l]; float (*my_b)[n]; float (*my_c)[n]; public: void operator()( const blocked_range2d<size_t>& r ) const { float (*a)[l] = my_a; float (*b)[n] = my_b; float (*c)[n] = my_c; for( size_t i=r.rows().begin(); i!=r.rows().end(); ++i ){ for( size_t j=r.cols().begin(); j!=r.cols().end(); ++j ) { float sum = 0; for( size_t k=0; k<l; ++k ) sum += a[i][k]*b[k][j]; c[i][j] = sum; } } } MatrixMultiplyBody2D( float c[m][n], float a[m][l], float b[l][n] ) : my_a(a), my_b(b), my_c(c) {} }; void ParallelMatrixMultiply(float c[m][n], float a[m][l], float b[l][n]){ parallel_for( blocked_range2d<size_t>(0, M, 16, 0, N, 32), MatrixMultiplyBody2D(c,a,b) ); } blocked_range2d 2 parallel_for 16 32 blocked_range2d MatrixMultiplyBody2D::operator() 3.2.2.1 row_range_type blocked_range<rowvalue> 3.2.2.2 col_range_type blocked_range<colvalue> 17

3.2.2.3 blocked_range2d<rowvalue,colvalue>( RowValue row_begin, RowValue row_end, typename row_range_type::size_type row_grainsize, ColValue col_begin, ColValue col_end, typename col_range_type::size_type col_grainsize ) 2 blocked_range2d (row_begin,row_end) (col_begin,col_end) "blocked_range2d<char,int> r( a, z +1, 3, 0, 10, 2 );" (i, j) 2 i a z 3 j 0 9 2 3.2.2.4 blocked_range2d<rowvalue,colvalue> ( blocked_range2d& range, split ) range 2 blocked_range2d range range range row_grainsize col_grainsize 2 2 3.2.2.5 bool empty() const rows().empty() cols().empty() 18

3.2.2.6 bool is_divisible() const rows().is_divisible() cols().is_divisible() 3.2.2.7 const row_range_type& rows() const 3.2.2.8 const col_range_type& cols() const 3.3 Partitioner (3.2) 6 Partitioner P 6: Partitioner P::~P() template <typename Range> bool P::should_execute_range(const Range &r, const task &t) P::P( P& p, split ) r t true r false p 2 19

Partitioner parallel_for (3.4) parallel_reduce (3.5) parallel_scan (3.6) Range (3.2) is_divisible Partitioner Partitioner should_execute_range 2 Range Partitioner Range 2 Partitioner 2 Partitioner parallel_for parallel_reduce parallel_scan Partitioner should_execute_range should_execute_range true Partitioner simple_partitioner is_divisible false should_execute_range true class simple_partitioner { public: simple_partitioner() {} simple_partitioner(simple_partitioner &partitioner, split) {} template <typename Range> inline bool should_execute_range(const Range &r, const task &t) { return (!r.is_divisible() ); } }; 20

simple_partitioner (3.3.1) auto_partitioner (3.3.2) task_scheduler (8) 3.3.1 simple_partitioner parallel_for (3.4) parallel_reduce (3.5) parallel_scan (3.6) class simple_partitioner; #include "tbb/partitioner.h" simple_partitioner parallel_for parallel_reduce parallel_scan 3.3.1.1 simple_partitioner() 3.3.1.2 simple_partitioner(simple_partitioner &partitioner, split ) 3.3.1.3 template<typename Range> bool should_execute_range (const Range &r, const task &t) true!range.is_divisible() 21

3.3.2 auto_partitioner task_scheduler class auto_partitioner; #include "tbb/partitioner.h" auto_partitioner SI SI auto_partitioner auto_partitioner auto_partitioner auto_partitioner 3.3.2.1 auto_partitioner() 3.3.2.2 auto_partitioner(auto_partitioner &partitioner, split ) auto_partitioner 2 22

3.3.2.3 template<typename Range> bool should_execute_range (const Range &r, const task &t) true range.is_divisible() == true true range.is_divisible() == false true t r t r 3.4 parallel_for<range,body> template<typename Range, typename Body> void parallel_for ( const Range& range, const Body& body ); #include "tbb/parallel_for.h" parallel_for<range,body> Range Body Range Range (3.2) 7 7: parallel_for Body::Body( const Body& ) Body::~Body() void Body::operator()( Range& range ) const range 23

parallel_for is_divisible() false / Body::operator() parallel_for (8.2) parallel_for parallel_for parallel_for Range(r,split()) r r Body::operator() O(1) O(P log(n)) N P input[i-1] input[i] input[i+1] (0 i<n ) output[i] ParallelAverage #include "tbb/parallel_for.h" #include "tbb/blocked_range.h" using namespace tbb; struct Average { float* input; float* output; void operator()( const blocked_range<int>& range ) const { for( int i=range.begin(); i!=range.end(); ++i ) output[i] = (input[i-1]+input[i]+input[i+1])*(1/3.0f); } 24

}; // Note: The input must be padded such that input[-1] and input[n] // can be used to calculate the first and last output values. void ParallelAverage( float* output, float* input, size_t n ) { Average avg; avg.input = input; avg.output = output; parallel_for( blocked_range<int>( 0, n, 1000 ), avg ); } STL parallel_for 2 1. 2-6 2. (begin1,end1) 2 (begin2,end2) 3. m1 (begin1,end1) key 4. m2 key (begin2,end2) 5. (begin1,m1) (begin2,m2) 6. (m,end1) (m2,end2) 2 is_divisible 1 2 3-6 #include "tbb/parallel_for.h" #include <algorithm> using namespace tbb; template<typename Iterator> struct ParallelMergeRange { static size_t grainsize; Iterator begin1, end1; merged Iterator begin2, end2; merged Iterator out; // [begin1,end1) is first sequence to be // [begin2,end2) is first sequence to be // where to put merged sequence 25

bool empty() const {return (end1-begin1)+(end2-begin2)==0;} bool is_divisible() { if( end1-begin1 < end2-begin2 ) { std::swap(begin1,begin2); std::swap(end1,end2); } // [begin2,end2) is now at least as short as [begin1,end1) return end2-begin2 > grainsize; } ParallelMergeRange( ParallelMergeRange& r, split ) { Iterator m1 = r.begin1 + (r.end1-r.begin1)/2; Iterator m2 = std::lower_bound( r.begin2, r.end2, *m1 ); begin1 = m1; begin2 = m2; end1 = r.end1; end2 = r.end2; out = r.out + (m1-r.begin1) + (m2-r.begin2); r.end1 = m1; r.end2 = m2; } ParallelMergeRange( Iterator begin1_, Iterator end1_, Iterator begin2_, Iterator end2_, Iterator out_ ) : begin1(begin1_), end1(end1_), begin2(begin2_), end2(end2_), out(out_) {} }; template<typename Iterator> size_t ParallelMergeRange<Iterator>::grainsize = 1000; template<typename Iterator> struct ParallelMergeBody { void operator()( ParallelMergeRange<Iterator>& r ) const { std::merge( r.begin1, r.end1, r.begin2, r.end2, r.out ); } }; template<typename Iterator> void ParallelMerge( Iterator begin1, Iterator end1, Iterator begin2, Iterator end2, Iterator out ) { parallel_for( ParallelMergeRange<Iterator>(begin1,end1,begin2,end2,out), ParallelMergeBody<Iterator>() ); } 26

3.4.1 Partitioner Partitioner template<typename Range, typename Body, typename Partitioner> void parallel_for ( const Range& range, const Body& body, const Partitioner &partitioner ); #include "tbb/parallel_for.h" parallel_for<range,body,partitioner> Range Body Range Range (3.2) 7 Partitioner Partitioner (3.3) parallel_for Partitioner auto_partitioner #include "tbb/parallel_for.h" #include "tbb/blocked_range.h" using namespace tbb; struct Average { float* input; float* output; void operator()( const blocked_range<int>& range ) const { for( int i=range.begin(); i!=range.end(); ++i ) output[i] = (input[i-1]+input[i]+input[i+1])*(1/3.0f); } }; // Note: The input must be padded such that input[-1] and input[n] // can be used to calculate the first and last output values. void ParallelAverage( float* output, float* input, size_t n ) { 27

} Average avg; avg.input = input; avg.output = output; parallel_for( blocked_range<int>( 0, n ), avg, auto_partitioner() ); 2 (1) parallel_for 3 auto_partitioner (2) blocked_range 3.2.1 3.2.2 blocked_range blocked_range2d 1 8 simple_partitioner auto_partitioner 28

8: Partitioner Partitioner simple_partitioner Range::is_divisible blocked_range blocked_range2d ( 3.2.1 ) auto_partitioner blocked_range blocked_range2d 1 : auto_partitioner () 3.5 parallel_reduce<range,body> template<typename Range, typename Body> void parallel_reduce( const Range& range, Body& body ); 29

#include "tbb/parallel_reduce.h" parallel_reduce<range,body> Range Body Range Range (3.2) Body 9 9: parallel_reduce Body Body::Body( Body&, split ); (3.1) operator() join Body::~Body() void Body::operator()( Range& range ); void Body::join( Body& rhs ); rhs this parallel_reduce is_divisible() false parallel_reduce 1 operator() join (8.2.1) parallel_reduce join this this rhs join op "left.join(right)" left left op right 1 parallel_reduce [0,20) b0 2 30

5 4 (/) (b1 b2) b0 b1 1 b2 [10,15) [15,20) parallel_reduce b0.join(b1) b0.join(b2) b 0 [0,20) b 0 [0,10) b 2 [10,20) b 0 [0,5) b 1 [5,10) b 2 [10,15) b 2 [15,20) 1: blocked_range<int>(0,20,5) parallel_reduce 1 1 b2 b2 b3 b0 join 1 1 b2 [10,15) [15,20) parallel_reduce parallel_reduce parallel_for (3.4) join O(1) O(P log(n)) N P #include "tbb/parallel_reduce.h" #include "tbb/blocked_range.h" using namespace tbb; 31

struct Sum { float value; Sum() : value(0) {} Sum( Sum& s, split ) {value = 0;} void operator()( const blocked_range<float*>& range ) { float temp = value; for( float* a=range.begin(); a!=range.end(); ++a ) { temp += *a; } value = temp; } void join( Sum& rhs ) {value += rhs.value;} }; float ParallelSum( float array[], size_t n ) { Sum total; parallel_reduce( blocked_range<float*>( array, array+n, 1000 ), total ); return total.value; } op 0 op += op= Sum op op 3.5.1 Partitioner Partitioner template<typename Range, typename Body, typename Partitioner> void parallel_reduce( const Range& range, Body& body, Partitioner &partitioner ); #include "tbb/parallel_reduce.h" 32

parallel_reduce<range,body> Range Body Range Range (3.2) Body 9 Partitioner Partitioner (3.3) auto_partitioner #include "tbb/parallel_reduce.h" #include "tbb/blocked_range.h" using namespace tbb; struct Sum { float value; Sum() : value(0) {} Sum( Sum& s, split ) {value = 0;} void operator()( const blocked_range<float*>& range ) { float temp = value; for( float* a=range.begin(); a!=range.end(); ++a ) { temp += *a; } value = temp; } void join( Sum& rhs ) {value += rhs.value;} }; float ParallelSum( float array[], size_t n ) { Sum total; parallel_reduce( blocked_range<float*>( array, array+n ), total, auto_partitioner() ); return total.value; } 2 (1) parallel_reduce 3 auto_partitioner (2) blocked_range 3.4.1 blocked_range 1 8 simple_partitioner auto_partitioner 33

3.6 parallel_scan<range,body> template<typename Range, typename Body> void parallel_scan( const Range& range, Body& body ); #include "tbb/parallel_scan.h" parallel_scan<range,body> () id x0, x1,...xn-1 y0, y1, y2,...yn-1 y0 = id x0 yi = yi 1 xi T temp = id ; for( int i=1; i<=n; ++i ) { temp = temp x[i]; y[i] = temp; } 2 2 34

: parallel_scan 2 2 parallel_scan 1 parallel_scan<range,body> 10 10: parallel_scan void Body::operator()( const Range& r, pre_scan_tag ) void Body::operator()( const Range& r, final_scan_tag ) r r Body::Body( Body& b, split ) void Body::reverse_join( Body& a ) this b b a this a b b void Body::assign( Body& b ) b this parallel_scan using namespace tbb; class Body { T sum; T* const y; const T* const x; public: Body( T y_[], const T x_[] ) : sum(0), x(x_), y(y_) {} T get_sum() const {return sum;} template<typename Tag> void operator()( const blocked_range<int>& r, Tag ) { T temp = sum; for( int i=r.begin(); i<r.end(); ++i ) { temp = temp x[i]; if( Tag::is_final_scan() ) y[i] = temp; } 35

}; sum = temp; } Body( Body& b, split ) : x(b.x), y(b.y), sum(id ) {} void reverse_join( Body& a ) { sum = a.sum sum;} void assign( Body& b ) {sum = b.sum;} float DoParallelScan( T y[], const T x[], int n) { Body body(y,x); parallel_scan( blocked_range<int>(0,n,1000), body ); return body.get_sum(); } operator() parallel_scan 1 2 is_final_scan() y parallel_scan y reverse_join parallel_reduce join this parallel_scan Body parallel_scan 3.6.1 pre_scan_tag and final_scan_tag parallel_scan struct pre_scan_tag; struct final_scan_tag; 36

#include "tbb/parallel_scan.h" pre_scan_tag final_scan_tag parallel_scan operator() 3.6 namespace tbb { } struct pre_scan_tag { static bool is_final_scan(); }; struct final_scan_tag { static bool is_final_scan(); }; 3.6.1.1 bool is_final_scan() final_scan_tag true false 3.6.2 Partitioner Partitioner template<typename Range, typename Body, typename Partitioner> void parallel_scan( const Range& range, Body& body, Partitioner &partitioner ); 37

#include "tbb/parallel_scan.h" parallel_scan<range,body,partitioner> ( ) ( 3.6 ) auto_partitioner using namespace tbb; class Body { T sum; T* const y; const T* const x; public: Body( T y_[], const T x_[] ) : sum(0), x(x_), y(y_) {} T get_sum() const {return sum;} }; template<typename Tag> void operator()( const blocked_range<int>& r, Tag ) { T temp = sum; for( int i=r.begin(); i<r.end(); ++i ) { temp = temp x[i]; if( Tag::is_final_scan() ) y[i] = temp; } sum = temp; } Body( Body& b, split ) : x(b.x), y(b.y), sum(id ) {} void reverse_join( Body& a ) { sum = a.sum sum;} void assign( Body& b ) {sum = b.sum;} float DoParallelScan( T y[], const T x[], int n) { Body body(y,x); parallel_scan( blocked_range<int>(0,n), body, auto_partitioner() ); return body.get_sum(); } 2 (1) parallel_scan 3 auto_partitioner (2) blocked_range 3.4.1 blocked_range 1 38

8 simple_partitioner auto_partitioner 3.7 parallel_while template<typename Body> class parallel_while; #include "tbb/parallel_while.h" parallel_while<body> Body 2 1. 2. 11 11: S B parallel_while bool S::pop_if_present( B::argument_type& item ) B::operator()( B::argument_type& item ) const B::argument_type() parallel_while this item parallel_while this item operator 39

B::argument_type( const B::argument_type& ) ~B::argument_type() C++ 20.3 B concurrent_queue (4.2) S : B::operator() 10,000 parallel_while parallel_while add 2 namespace tbb { template<typename Body> class parallel_while { public: parallel_while(); ~parallel_while(); typedef typename Body::argument_type value_type; template<typename Stream> void run( Stream& stream, const Body& body ); } }; void add( const value_type& item ); 3.7.1 parallel_while<body>() parallel_while 3.7.2 ~parallel_while<body>() parallel_while 40

3.7.3 <typename Stream> void run( Stream& stream, const Body& body ) body stream add 1. stream.pop_if_present false 2. body(x) add x 3.7.4 void add( const value_type& item ) parallel_while body.operator() run 3.8 pipeline class pipeline; #include "tbb/pipeline.h" 41

pipeline filter (3.8.6) fi 1 i f0 f1 f2... 1. filter fi fi filter (3.8.6.1) 2. filter filter::operator() f0 NULL 3. pipeline 4. fi 2 5. pipeline::run max_number_of_live_tokens max_number_of_live_tokens pipeline filter pipeline pipeline pipeline::clear() namespace tbb { class pipeline { public: pipeline(); virtual ~pipeline(); 42

} }; void add_filter( filter& f ); void run( size_t max_number_of_live_tokens ); void clear(); 3.8.1 pipeline() 3.8.2 ~pipeline() 3.8.3 void add_filter( filter& f ) f f 3.8.4 void run( size_t max_number_of_live_tokens ) NULL max_number_of_live_tokens 43

3.8.5 void clear() 3.8.6 filter class filter; #include "tbb/pipeline.h" filter pipeline (3.8) 1 filter pipeline (3.8) namespace tbb { class filter { protected: filter( bool is_serial ); public: bool is_serial() const; virtual void* operator()( void* item ) = 0; virtual ~filter(); }; } 44

(doc/tutorial.pdf) MyInputFilter MyTransformFilter MyOutputFilter 3.8.6.1 filter( bool is_serial ) is_serial true is_serial false 3.8.6.2 ~filter() pipeline pipeline pipeline pipeline C++ 3.8.6.3 bool is_serial() const true false 3.8.6.4 virtual void* operator()( void * item ) filter NULL 45

pipeline NULL pipeline 3.9 parallel_sort<randomaccessiterator, Compare> template<typename RandomAccessIterator> void parallel_sort(randomaccessiterator begin, RandomAccessIterator end); template<typename RandomAccessIterator, typename Compare> void parallel_sort(randomaccessiterator begin, RandomAccessIterator end, const Compare& comp ); #include "tbb/parallel_sort.h" [begin1, end1) unstable unstable std::sort RandomAccessIterator T 12 12: RandomAccessIterator T void swap( T& x, T& y ) bool Compare::operator()( const T& x, const T& y ) x y x y true false 46

parallel_sort(i,j,comp) 2 comp [i,j) comp(x,y) true x y parallel_sort(i,j) parallel_sort(i,j,std::less<t>) parallel_sort O(N log (N)) N (8.2.1) parallel_sort 2 a b std::greater<float> #include "tbb/parallel_sort.h" #include <math.h> using namespace tbb; const int N = 100000; float a[n]; float b[n]; void SortExample() { for( int i = 0; i < N; i++ ) { a[i] = sin((double)i); b[i] = cos((double)i); } parallel_sort(a, a + N); parallel_sort(b, b + N, std::greater<float>()); } 47

4 STL allocator 4.1 concurrent_hash_map<key,t, HashCompare> template<typename Key, typename T, typename HashCompare> class concurrent_hash_map; #include "tbb/concurrent_hash_map.h" concurrent_hash_map STL Key T CopyConstructible (2.2.3) HashCompare 13 HashCompare 48

13: HashCompare HashCompare::HashCompare( const HashCompare & ) HashCompare::~HashCompare () bool HashCompare::equal( const Key& j, const Key& k ) const size_t HashCompare::hash( const Key& k ) true : 2 HashCompare h 2 j k "!h.equal(j,k) h.hash(j)==h.hash(k)" concurrent_hash_map namespace tbb { template<typename Key, typename T, typename HashCompare> class concurrent_hash_map { public: // types typedef Key key_type; typedef T mapped_type; typedef std::pair<const Key,T> value_type; typedef size_t size_type; typedef ptrdiff_t difference_type; // whole-table operations concurrent_hash_map(); concurrent_hash_map( const concurrent_hash_map& ); ~concurrent_hash_map(); concurrent_hash_map operator=( const concurrent_hash_map& ); void clear(); // concurrent access class const_accessor; class accessor; // concurrent operations on a table bool find( const_accessor& result, const Key& key ) const; bool find( accessor& result, const Key& key ); bool insert( const_accessor& result, const Key& key ); bool insert( accessor& result, const Key& key ); bool erase( const Key& key ); // parallel iteration 49

} }; typedef implementation defined range_type; typedef implementation defined const_range_type; range_type range( size_t grainsize ); const_range_type range( size_t grainsize ) const; // Capacity size_type size() const; bool empty() const; size_type max_size() const; // Iterators typedef implementation defined iterator; typedef implementation defined const_iterator; iterator begin(); iterator end(); const_iterator begin() const; const_iterator end() const; 4.1.1 4.1.1.1 concurrent_hash_map() 4.1.1.2 concurrent_hash_map( const concurrent_hash_map& table ) 4.1.1.3 ~concurrent_hash_map() concurrent_hash_map 50

4.1.1.4 concurrent_hash_map& operator= ( concurrent_hash_map& source ) (this) / 4.1.1.5 void clear() / 4.1.2 const_accessor accessor concurrent_hash_map concurrent_hash_map release const_accessor accessor 51

14: const_accessor accessor value_type pair const_accessor const std::pair<const Key,T> accessor std::pair<const Key,T> 4.1.2.1 const_accessor concurrent_hash_map / template<typename Key, typename T, typename HashCompare> class concurrent_hash_map<key,t,hashcompare>::const_accessor; #include "tbb/concurrent_hash_map.h" const_accessor concurrent_hash_map / namespace tbb { template<typename Key, typename T, typename HashCompare> class concurrent_hash_map<key,t,hashcompare>::const_accessor { public: // types typedef const std::pair<const Key,T> value_type; // construction and destruction const_accessor(); 52

} }; ~const_accessor(); // inspection bool empty() const; const value_type& operator*() const; const value_type* operator->() const; // early release void release(); 4.1.2.1.1 bool empty() const ture/ false 4.1.2.1.2 void release()!empty() 4.1.2.1.3 const value_type& operator*() const empty() TBB_DO_ASSERT (2.6.1) / const 4.1.2.1.4 const value_type* operator->() const &operator*() 53

4.1.2.1.5 const_accessor() const_accessor 4.1.2.1.6 ~const_accessor / 4.1.2.2 accessor concurrent_hash_map / template<typename Key, typename T, typename HashCompare> class concurrent_hash_map<key,t,hashcompare>::accessor; #include "tbb/concurrent_hash_map.h" accessor concurrent_hash_map / / const_accessor const_accessor namespace tbb { template<typename Key, typename T, typename HashCompare> class concurrent_hash_map<key,t,hashcompare>::accessor: concurrent_hash_map<key,t,hashcompare>::const_accessor { public: typedef std::pair<const Key,T> value_type; value_type& operator*() const; value_type* operator->() const; }; } 54

4.1.2.2.1 value_type& operator*() const empty() TBB_DO_ASSERT (2.6.1) / 4.1.2.2.2 value_type* operator->() const &operator*() 4.1.3 find insert erase concurrent_hash_map / find insert 2 1 const_accessor / 1 accessor : nonconst const true 4.1.3.1 bool find( const_accessor& result, const Key& key ) const 55

ture false 4.1.3.2 bool find( accessor& result, const Key& key ) ture false 4.1.3.3 bool insert( const_accessor& result, const Key& key ) pair(key,t()) true false 4.1.3.4 bool insert( accessor& result, const Key& key ) pair(key,t()) true false 56

4.1.3.5 bool erase(const Key& key ) true false 4.1.4 const_range_type range_type Range (3.2) 15 const_range_type const_iterator range_type iterator concurrent_hash_map parallel_for (3.4) parallel_reduce (3.5) parallel_scan (3.6) 15: concurrent_hash_map Range R ( 4 ) R::iterator R::begin() const R::iterator R::end() const 4.1.4.1 const_range_type range( size_t grainsize ) const const_range_type grainsize 1 / const_range_type 57

4.1.4.2 range_type range( size_t grainsize ) range_type 4.1.5 4.1.5.1 size_type size() const / : STL 4.1.5.2 bool empty() const size()==0 : STL 4.1.5.3 size_type max_size() const / 4.1.6 concurrent_hash_map ( ) 4.1.6.1 iterator begin() / iterator 58

4.1.6.2 iterator end() / iterator 4.1.6.3 const_iterator begin() const / const_iterator 4.1.6.4 const_iterator end() const / const_iterator 4.2 concurrent_queue<t> template<typename T> class concurrent_queue; #include "tbb/concurrent_queue.h" concurrent_queue FIFO () 59

concurrent_queue STL std::queue 16: STL concurrent_queue STL std::queue concurrent_queue front back front back size_type size() size() q q x=q.front(); q.pop() bool b=!q.empty(); if(b) { x=q.front(); q.pop(); } q.pop(x) bool b = q.pop_if_present(x) : concurrent_queue namespace tbb { template<typename T> class concurrent_queue { public: // types typedef T value_type; typedef T& reference; typedef const T& const_reference; typedef std::ptrdiff_t size_type; typedef std::ptrdiff_t difference_type; 60

} }; concurrent_queue() {} ~concurrent_queue(); void push( const T& source ); void pop( T& destination ); bool pop_if_present( T& destination ); size_type size() const {return internal_size();} bool empty() const; size_t capacity() const; void set_capacity( size_type capacity ); typedef implementation-defined iterator; typedef implementation-defined const_iterator; // iterators (these are slow an intended only for debugging) iterator begin(); iterator end(); const_iterator begin() const; const_iterator end() const; 4.2.1 concurrent_queue() 4.2.2 ~concurrent_queue() 4.2.3 void push( const T& source ) size()<capacity source 61

4.2.4 void pop( T& destination ) destination 4.2.5 bool pop_if_present( T& destination ) destination true false 4.2.6 size_type size() const 4.2.7 bool empty() const size()==0 4.2.8 size_type capacity() const 62

4.2.9 void set_capacity( size_type capacity ) 4.2.10 concurrent_queue iterator const_iterator STL concurrent_queue : 0..9 0 1 2 3 4 5 6 7 8 9 #include "tbb/concurrent_queue.h" #include <iostream> using namespace std; using namespace tbb; int main() { concurrent_queue<int> queue; for( int i=0; i<10; ++i ) queue.push(i); for( concurrent_queue<int>::const_iterator i(queue.begin()); i!=queue.end(); ++i ) cout << *i << " "; cout << endl; return 0; } 4.2.10.1 iterator begin() iterator 63

4.2.10.2 iterator end() iterator 4.2.10.3 const_iterator begin() const const_iterator 4.2.10.4 const_iterator end() const const_iterator 4.3 concurrent_vector template<typename T> class concurrent_vector; #include "tbb/concurrent_vector.h" concurrent_vector 0 namespace tbb { template<typename T> class concurrent_vector { 64

public: typedef size_t size_type; typedef T value_type; typedef ptrdiff_t difference_type; typedef T& reference; typedef const T& const_reference; // whole vector operations concurrent_vector() {} concurrent_vector( const concurrent_vector& ); concurrent_vector& operator=( const concurrent_vector&); ~concurrent_vector(); void clear(); // concurrent operations size_type grow_by( size_type delta ); void grow_to_at_least( size_type new_size ); size_type push_back( const_reference value ); reference operator[]( size_type index ); const_reference operator[]( size_type index ) const; // parallel iteration typedef implementation-defined iterator; typedef implementation-defined const_iterator; typedef generic_range_type<iterator> range_type; typedef generic_range_type<const_iterator> const_range_type; range_type range( size_t grainsize ); const_range_type range( size_t grainsize ) const; // capacity size_type size() const; bool empty() const; size_type capacity() const; void reserve( size_type n ); size_type max_size() const; // STL support iterator begin(); iterator end(); const_iterator begin() const; const_iterator end() const; } }; typedef implementation-defined reverse_iterator; typedef implementation-defined const_reverse_iterator; iterator rbegin(); iterator rend(); const_iterator rbegin() const; const_iterator rend() const; 65

4.3.1 4.3.1.1 concurrent_vector() 4.3.1.2 concurrent_vector( const concurrent_vector& src ) src 4.3.1.3 concurrent_vector& operator=( const concurrent_vector& src ) src *this 4.3.1.4 ~concurrent_vector() 4.3.1.5 void clear() size()==0 66

4.3.2 concurrent_vector<t> 4.3.2.1 size_type grow_by( size_type delta ) delta T() T value_type k (k..k+delta) 4.3.2.2 void grow_to_at_least( size_type n ) n T() T value_type 4.3.2.3 size_t push_back( const_reference value ); value 4.3.2.4 reference operator[]( size_type index ) 67

4.3.2.5 const_reference operator[]( size_type index ) const; const 4.3.3 const_range_type range_type Range (3.2) 15 const_range_type const_iterator range_type iterator concurrent_vector parallel_for (3.4) parallel_reduce (3.5) parallel_scan (3.6) 17: concurrent_vector Range R R::iterator R::begin() const R::iterator R::end() const 4.3.3.1 range_type range( size_t grainsize ) / concurrent_vector 4.3.3.2 const_range_type range( size_t grainsize ) const concurrent_vector 68

4.3.4 4.3.4.1 size_type size() const grow_by (4.3.2.1) grow_to_at_least (4.3.2.2) 4.3.4.2 bool empty() const size()==0 4.3.4.3 size_type capacity() const : STL concurrent_vector 4.3.4.4 void reserve( size_type n ) n n>max_size() std::length_error 4.3.4.5 size_type max_size() const 69

4.3.5 concurrent_vector<t> ISO C++ 24.1.4 std::vector concurrent_vector<t> ISO C++ 66 4.3.5.1 iterator begin() iterator 4.3.5.2 iterator end() iterator 4.3.5.3 const_iterator begin() const const_iterator 4.3.5.4 const_iterator end() const const_iterator 4.3.5.5 iterator rbegin() const_reverse_iterator(end()) 4.3.5.6 iterator rend() const_reverse_iterator(begin()) 70

4.3.5.7 const_reverse_iterator rbegin() const const_reverse_iterator(end()) 4.3.5.8 const_ reverse_iterator rend() const const_reverse_iterator(begin()) 71

5 5.1 Allocator Allocator ISO C++ 32 "Allocator " ISO C++ ISO C++ ( 20.1.5 4) 18 Allocator A B 18: Allocator typedef T* A::pointer typedef const T* A::const_pointer typedef T& A::reference typedef const T& A::const_reference typedef T A::value_type typedef size_t A::size_type typedef ptrdiff_t A::difference_type template<typename U> struct rebind { typedef A<U> A::other; }; A() throw() T const T T const T U A( const A& ) throw() template<typename U> A( const A& ) ~A() throw() T* A::address( T& x ) const const T* A::const_address( const T& x ) const 72

const T* A::allocate( size_type n, void* hint=0 ) n void A::deallocate( T* p, size_t n ) size_type A::max_size() const throw() n allocate void A::construct( T* p, const T& value ) void A::destroy( T* p ) bool operator==( const A&, const B& ) bool operator!=( const A&, const B& ) new(p) T(value) p->t::~t() true false scalable_allocator (5.2) cached_aligned_allocator (5.3) Allocator 5.2 scalable_allocator<t> template<typename T> class scalable_allocator; #include "tbb/scalable_allocator.h" scalable_allocator scalable_allocator 18 allocator std::allocator scalable_allocator 73

scalable_allocator std::allocator scalable_allocator Allocator (5.1) PSL CTG McRT 5.2.1 C extern "C" { void* scalable_calloc ( size_t nobj, size_t size ); void scalable_free( void* ptr ); void* scalable_malloc( size_t size ); void* scalable_realloc( void* ptr, size_t size ); } #include "tbb/scalable_allocator.h" C scalable_x C x scalable_x C scalable_x C scalable_x 74

5.3 cache_aligned_allocator<t> template<typename T> class cache_aligned_allocator; #include "tbb/cache_aligned_allocator.h" cache_aligned_allocator cache_aligned_allocator 18 allocator std::allocator cache_aligned_allocator cache_aligned_allocator 128 cache_aligned_allocator 75

namespace tbb { template<typename T> class NFS_Allocator { public: typedef T* pointer; typedef const T* const_pointer; typedef T& reference; typedef const T& const_reference; typedef T value_type; typedef size_t size_type; typedef ptrdiff_t difference_type; template<typename U> struct rebind { typedef cache_aligned_allocator<u> other; }; #if _WIN64 char* _Charalloc( size_type size ); #endif /* _WIN64 */ cache_aligned_allocator() throw(); cache_aligned_allocator( const cache_aligned_allocator& ) throw(); template<typename U> cache_aligned_allocator( const cache_aligned_allocator<u>& ) throw(); ~cache_aligned_allocator(); pointer address(reference x) const; const_pointer address(const_reference x) const; pointer allocate( size_type n, void* hint=0 ); void deallocate( pointer p, size_type ); size_type max_size() const throw(); }; void construct( pointer p, const T& value ); void destroy( pointer p ); template<> class cache_aligned_allocator<void> { public: typedef void* pointer; typedef const void* const_pointer; typedef void value_type; template<typename U> struct rebind { typedef cache_aligned_allocator<u> other; }; }; template<typename T, typename U> bool operator==( const cache_aligned_allocator<t>&, 76

const cache_aligned_allocator<u>& ); template<typename T, typename U> bool operator!=( const cache_aligned_allocator<t>&, const cache_aligned_allocator<u>& ); } std::allocator 5.3.1 pointer allocate( size_type n, void* hint=0 ) size 5.3.2 void deallocate( pointer p, size_type n ) p allocate(n) p 5.3.3 char* _Charalloc( size_type size ) : 64 Windows* Windows ISO 77

5.4 aligned_space template<typename T, size_t N> class aligned_space; #include "tbb/aligned_space.h" aligned_space T[N] aligned_space namespace tbb { template<typename T, size_t N> class aligned_space { public: aligned_space(); ~aligned_space(); T* begin(); T* end(); }; } 5.4.1 aligned_space() 78

5.4.2 ~aligned_space() 5.4.3 T* begin() 5.4.4 T* end() begin()+n 79

6 6.1 Mutex Mutex (MUTual EXclusion) mutex 6.1.1 Mutex mutex C++ 1. 2. (mutex ) mutex lock lock 2 { } // Construction of mylock acquires lock on mymutex M::scoped_lock mylock( mymutex );... actions to be performed while holding the lock... // Destruction of mylock releases lock on mymutex 19 mutex M Mutex 80

19: Mutex M() ~M() typename M::scoped_lock M::scoped_lock() M::scoped_lock(M&) M::~scoped_lock() M::scoped_lock::acquire(M&) bool M::scoped_lock::try_acquire(M&) mutex mutex mutex mutex () mutex mutex true false M::scoped_lock::release() 20 Mutex 20: Mutex mutex mutex OS OS 3 spin_mutex 1 queuing_mutex 1 spin_rw_mutex 1 queuing_rw_mutex 1 mutex 6.1.2 mutex OS Mutex 81

class mutex; #include "tbb/mutex.h" mutex Mutex (46.1.1) mutex OS OS mutex Mutex (6.1.1) 6.1.3 spin_mutex Mutex class spin_mutex; #include "tbb/spin_mutex.h" spin_mutex Mutex (6.1.1) spin_mutex spin_mutex 82

spin_mutex mutex Mutex (6.1.1) 6.1.4 queuing_mutex Mutex class queuing_mutex; #include "tbb/queuing_mutex.h" queuing_mutex Mutex (6.1.1) mutex queuing_mutex queuing_mutex queuing_mutex queuing_mutex queuing_mutex Mutex (6.1.1) 6.1.5 ReaderWriterMutex ReaderWriterMutex Mutex (write =true) 83

(write =false) write ReaderWriterMutex ReaderWriterMutex mutex 21 ReaderWriterMutex RW 21: ReaderWriterMutex RW() ~RW() typename RW::scoped_lock RW::scoped_lock() RW::scoped_lock(RW&, bool write=true) mutex mutex mutex mutex RW::~scoped_lock() ( ) RW::scoped_lock::acquire(RW&, bool write=true) bool RW::scoped_lock::try_acquire(RW&, bool write=true) mutex mutex true false RW::scoped_lock::release() bool RW::scoped_lock::upgrade_to_writer() bool RW::scoped_lock::downgrade_to_reader() ReaderWriterMutex spin_rw_mutex (6.1.6) queuing_rw_mutex (6.1.7) ReaderWriterMutex 84

6.1.5.1 ReaderWriterMutex() ReaderWriterMutex 6.1.5.2 ~ReaderWriterMutex() ReaderWriterMutex ReaderWriterMutex 6.1.5.3 ReaderWriterMutex::scoped_lock() mutex scoped_lock 6.1.5.4 ReaderWriterMutex::scoped_lock( ReaderWriterMutex& rw, bool write =true) mutex rw scoped_lock write true 6.1.5.5 ReaderWriterMutex::~scoped_lock() ReaderWriterMutex 6.1.5.6 void ReaderWriterMutex:: scoped_lock:: acquire( ReaderWriterMutex& rw, bool write=true ) mutex rw write true 85

6.1.5.7 bool ReaderWriterMutex:: scoped_lock::try_acquire( ReaderWriterMutex& rw, bool write=true ) mutex rw write true true false 6.1.5.8 void ReaderWriterMutex:: scoped_lock::release() 6.1.5.9 bool ReaderWriterMutex:: scoped_lock::upgrade_to_writer() false true 6.1.5.10 bool ReaderWriterMutex:: scoped_lock::downgrade_to_reader() false true 86

: spin_rw_mutex queuing_rw_mutex true false 6.1.6 spin_rw_mutex ReaderWriterMutex class spin_rw_mutex; #include "tbb/spin_rw_mutex.h" spin_rw_mutex ReaderWriterMutex (6.1.1) spin_rw_mutex spin_rw_mutex spin_rw_mutex mutex ReaderWriterMutex (6.1.5) 6.1.7 queuing_rw_mutex ReaderWriterMutex class queuing_rw_mutex; 87

#include "tbb/queuing_rw_mutex.h" queuing_rw_mutex ReaderWriterMutex (6.1.1) mutex queuing_rw_mutex queuing_rw_mutex queuing_rw_mutex queuing_rw_mutex ReaderWriterMutex (6.1.5) 6.2 atomic<t> template<typename T> atomic; #include "tbb/atomic.h" atomic<t> read write fetch-and-add fetch-and-store compare-andswap T T x atomic<float*> float 4 ++x 4 x atomic<void*> IA-32 64 88

Itanium 22 22: acquire read release write full fetch_and_store, fetch_and_add, compare_and_swap : atomic<t> atomic<t> atomic<t> namespace tbb { enum memory_semantics { acquire, release }; struct atomic<t> { typedef T value_type; template<memory_semantics M> value_type fetch_and_add( value_type addend ); value_type fetch_and_add( value_type addend ); template<memory_semantics M> value_type fetch_and_increment(); value_type fetch_and_increment(); template<memory_semantics M> value_type fetch_and_decrement(); value_type fetch_and_decrement(); template<memory_semantics M> value_type compare_and_swap( value_type new_value, value_type comparand ); 89