
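Parallelization of the Android 2D Rendering Library Skia Using the OSCAR Compiler

1 Waseda University
a) tgoto@kasahara.cs.waseda.ac.jp

Abstract: This report parallelizes Skia, the 2D rendering library of Android, with the OSCAR compiler. The OSCAR compiler takes Parallelizable C as input, whereas Skia is written in C++. The hot spots of Skia on Android are therefore identified with Oprofile, rewritten in Parallelizable C, and parallelized by the OSCAR compiler. The evaluation runs 0xbench on a Nexus7 with an NVIDIA Tegra3 (ARM Cortex-A9, 4 cores). Android is bound to core0 and Skia runs on the remaining 3 cores. Results are reported as frames per second [fps] for DrawRect, DrawArc, and DrawCircle.

1. Introduction

Multicore processors [1] such as NVIDIA Tegra3 [2], Qualcomm Snapdragon [3], and Samsung Exynos [4] are now common in mobile devices. Parallel programs for them are written with APIs such as OpenMP and MPI [5], or generated automatically by the OSCAR compiler [6].

2D rendering libraries include skia [7], Quartz [8], and cairo [9]. This report uses Oprofile and the OSCAR compiler to parallelize Skia, the Android 2D rendering library maintained by Google.

© 2013 Information Processing Society of Japan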

The evaluation is carried out on Nexus7. Section 2 describes Skia, Section 3 the parallelization method, Section 4 the parallelization of skia, and Section 5 the evaluation.

2. Skia, the 2D Rendering Library of Android

2.1 Skia

Skia is a 2D graphics library used in Google Chrome, Mozilla Firefox, Android, and Chrome OS. On Android it is the 2D rendering library [7]. Android applications draw through the Java API (Application Programming Interface) android.graphics.canvas [10]. Canvas API calls such as drawrect and drawimage reach Skia through JNI (Java Native Interface) [11].

Figure 1 shows the Skia rendering flow [12]. It consists of Path Generation, Rasterization, Shading, and BitBlit (Bit-Level Block Transfer) [12].

2.2 0xbench

0xbench is an Android benchmark suite from 0xlab [13]. It covers C library and system call, OpenGL-ES, 2D canvas, garbage collection in Dalvik, and JavaScript engine tests. The 2D canvas tests exercise Skia through android.graphics.canvas and report FPS. This report uses three 2D canvas tests, DrawRect, DrawArc, and DrawCircle2 (Figure 2):

DrawRect: Canvas drawrect (300)
DrawArc: drawarc (17; 500)
DrawCircle2: drawrect and drawcircle (6; 300)


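Figure 2: 2D canvas benchmarks.

3. Parallelization Method Using Oprofile and the OSCAR Compiler

3.1 OSCAR Compiler and OSCAR API

The OSCAR compiler [14], [15], [16] is an automatic parallelizing compiler that exploits three levels of parallelism [6], [17]. It takes Parallelizable C or Fortran as input and emits C or Fortran annotated with the OSCAR API. The OSCAR API is based on OpenMP and adds directives such as DMA transfer, so the output can be compiled by an OpenMP compiler or processed by the OSCAR API standard translator.

The OSCAR API standard translator handles the parallel sections directive in two steps:

1. The parallel sections directive is converted into calls to the runtime functions oscar_thread_create and oscar_thread_join.
2. On platforms with pthread, oscar_thread_create and oscar_thread_join are implemented with pthread_create and pthread_join.

3.2 OProfile

Oprofile [18], [19] is a profiler. This work uses Oprofile for Tegra (version 0.9.6) [20] to profile Skia.

Figure 3 shows the parallelization flow with Oprofile and the OSCAR compiler: Oprofile identifies the hot spots, the hot spots are separated from the C++ code and rewritten in Parallelizable C, and the OSCAR compiler parallelizes the Parallelizable C code.

4. Parallelization of Skia

This section parallelizes Skia following the flow of Figure 3.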
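The mapping from a parallel sections region to a create/join pair on top of pthread can be sketched in C as follows. This is a minimal illustration, not the OSCAR runtime: the real signatures of oscar_thread_create and oscar_thread_join are not given in the text, so the wrappers `sketch_thread_create` and `sketch_thread_join`, the `range` struct, and `parallel_sum` are all hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the runtime calls named in the text.
 * Their real signatures are not given in the source. */
typedef void *(*section_fn)(void *);

static int sketch_thread_create(pthread_t *t, section_fn fn, void *arg) {
    return pthread_create(t, NULL, fn, arg);
}

static int sketch_thread_join(pthread_t t) {
    return pthread_join(t, NULL);
}

/* One "section": sums the half-open index range [begin, end). */
struct range {
    const int *data;
    size_t begin, end;
    long sum;
};

static void *sum_section(void *p) {
    struct range *r = p;
    r->sum = 0;
    for (size_t i = r->begin; i < r->end; i++) {
        r->sum += r->data[i];
    }
    return NULL;
}

/* Roughly what a two-way parallel sections region lowers to: one section
 * runs on a created thread, the other on the calling thread, then a join. */
long parallel_sum(const int *data, size_t n) {
    struct range lo = { data, 0, n / 2, 0 };
    struct range hi = { data, n / 2, n, 0 };
    pthread_t t;

    int rc = sketch_thread_create(&t, sum_section, &hi);
    assert(rc == 0);
    sum_section(&lo);              /* the caller executes the other section */
    rc = sketch_thread_join(t);
    assert(rc == 0);
    (void)rc;                      /* silences unused warnings under NDEBUG */
    return lo.sum + hi.sum;
}
```

The two sections touch disjoint data, which is the property a parallelizing compiler has to prove before it can emit such a region.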


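Figure 3: Parallelization flow (including the C++ code separation step).

4.1 Profiling Skia with Oprofile

Skia was profiled with Oprofile (Application Profiling) while running the benchmarks of Section 2.2.

DrawRect (Figure 5(a)): SkRGB16_Blitter::blitRect is the hot spot. This function implements the BitBlit stage of Section 2.1: it blits pixels to the destination at the given x and y coordinates.
DrawArc (Figure 5(b)): SkRGB16_Blitter::blitH accounts for 82%, followed by SkRGB16_Blitter::blitRect.
DrawCircle2 (Figure 5(c)): SkRGB16_Blitter::blitAntiH accounts for 78% and SkRGB16_Blitter::blitRect for 9%.

As with DrawRect, the hot spots of the other two benchmarks are blit functions.

4.2 Parallelization of Skia

Figure 4 shows the DrawRect case as Original Source Code and After Tuning Code. To let the OSCAR compiler parallelize it, SkRGB16_Blitter::blitRect was rewritten from C++ into Parallelizable C. The while loop was converted into a for loop that the OSCAR compiler can analyze, and the dependence on the variable device was removed. With the device dependence resolved, the OSCAR compiler can parallelize the BitBlit of Section 2.1 over the height and width of the rectangle.

Figure 4: Skia DrawRect, Original Source Code and After Tuning Code (resolving the dependence on the device variable).

5. Evaluation of Skia Parallelized by the OSCAR Compiler

This section evaluates Skia parallelized by the OSCAR compiler.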
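The while-to-for conversion and the removal of the dependence on `device` can be illustrated with a small C sketch. It is not the Skia source: the function names, the solid-color fill, and the `row_pixels` stride parameter are assumptions chosen to show the pattern. In the first version each iteration needs the `device` value left by the previous one. In the second, the row address is computed from the loop index, so rows are independent and can be divided among cores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Original style (hypothetical): `device` is advanced after every row,
 * which creates a loop-carried dependence between iterations. */
void fill_rect_while(uint16_t *device, size_t row_pixels,
                     int width, int height, uint16_t color) {
    while (height-- > 0) {
        for (int x = 0; x < width; x++) {
            device[x] = color;
        }
        device += row_pixels;   /* next iteration depends on this update */
    }
}

/* Band of rows [y_begin, y_end): the row address is derived from y alone,
 * so disjoint bands can be processed by different threads. */
void fill_rect_rows(uint16_t *device, size_t row_pixels, int width,
                    int y_begin, int y_end, uint16_t color) {
    assert(y_begin <= y_end);
    for (int y = y_begin; y < y_end; y++) {
        uint16_t *row = device + (size_t)y * row_pixels;
        for (int x = 0; x < width; x++) {
            row[x] = color;
        }
    }
}

/* Tuned style (hypothetical): a counted for loop with no carried pointer. */
void fill_rect_for(uint16_t *device, size_t row_pixels,
                   int width, int height, uint16_t color) {
    fill_rect_rows(device, row_pixels, width, 0, height, color);
}
```

Splitting `height` into three bands and calling `fill_rect_rows` once per band corresponds to running the blit on three cores.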


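IPSJ SIG Technical Report

Figure 5: Profile results in the application domain for each benchmark test.

Figure 6: Thread pool processing flow adapted to the OSCAR runtime library.

Table 1: Nexus7 specifications
  CPU              ARM Cortex-A9 (NVIDIA Tegra 3)
  CPU Frequency    1.2GHz (1.3GHz single-core mode)
  CPU core         quad-core
  GPU              NVIDIA GeForce ULP
  GPU Frequency    416MHz
  GPU core         twelve-core
  RAM              1GB
  Display          1280x800 WXGA pixels

5.1 Evaluation Environment

This section describes the evaluation environment, including the device and the settings used to evaluate the performance of Skia.

Nexus7. The mobile device used for the evaluation is the 2012 model of Nexus7, which carries an NVIDIA Tegra3 chip with four ARM Cortex-A9 cores. When all four cores are active, each core runs at up to 1.2 GHz. Table 1 lists the details of Nexus7 [21].

Binding processes to cores. To evaluate the parallelized Skia, part of the kernel init code was modified so that the Android OS and all other processing are assigned to core0, and the parallelized program runs on the remaining 3 cores. This keeps background processes from disturbing the parallel execution of Skia, so the program can be executed efficiently and evaluated under stable conditions.

Thread pool. The BitBlit processing targeted by the parallelization applies bit operations and simple integer operations to each pixel. Its granularity is very fine and it is executed very frequently, so creating threads every time a parallelized region runs would incur a significant overhead. A thread pool was therefore introduced. The parallelized source code generated by the OSCAR compiler is written with the OSCAR API, and the OSCAR API standard translator converts it into code that calls runtime library functions. Of these functions, oscar_thread_create, which creates a thread, and oscar_thread_join, which waits for a thread to finish, were implemented on top of the thread pool. Figure 6 shows the processing flow between threads for each function. oscar_thread_create runs on the main thread and creates the threads with pthread only on its first call. Once created, each thread enters a loop that repeatedly accepts a function and executes it. The main thread passes a pointer to the function to be executed to the thread pool. A pool thread executes the function as soon as it sees the pointer and sets the pointer to NULL when the function returns. oscar_thread_join performs the join synchronization by waiting for this function pointer to become NULL.

5.2 Measuring Clock Cycles on the ARM Processor

The ARM Cortex-A9 processor has a Performance Monitor Unit (PMU) [22]. The PMU can observe a variety of events on each core. This work uses its cycle count (CCNT) register to measure clock counts. User-mode access to the CCNT register requires the bit in the user enable (USEREN) register to be 1, and the USEREN register can only be accessed in privileged mode. A kernel module that changes the USEREN register was therefore written and executed before each measurement, which makes the clock count readable from skia. The clock count is taken as the difference between the values read before and after the parallelized region, minus the overhead of reading the cycle counter itself.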
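The thread pool protocol described in Section 5.1 (threads created only on the first call, one function-pointer slot per pool thread, NULL as the completion signal) can be sketched in C as below. This is a minimal illustration under stated assumptions, not the OSCAR runtime library: the names `sketch_thread_create`, `sketch_thread_join`, `sketch_pool_shutdown`, and `sketch_demo_task` are hypothetical, as are the use of C11 atomics, busy-waiting with `sched_yield`, and the shutdown function added so the sketch can terminate cleanly.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#define POOL_SIZE 3                      /* three worker cores, as in Section 5.1 */

typedef void (*task_fn)(void);
#define NO_TASK ((task_fn)0)             /* NULL slot: idle or finished */

static _Atomic(task_fn) slot[POOL_SIZE]; /* one function pointer per worker */
static pthread_t worker[POOL_SIZE];
static int worker_id[POOL_SIZE];
static atomic_int pool_started;
static atomic_int pool_stop;

/* Worker loop: wait for a function pointer, run it, reset the slot to NULL. */
static void *worker_loop(void *arg) {
    int id = *(int *)arg;
    while (!atomic_load(&pool_stop)) {
        task_fn fn = atomic_load(&slot[id]);
        if (fn != NO_TASK) {
            fn();
            atomic_store(&slot[id], NO_TASK);   /* signals completion */
        } else {
            sched_yield();
        }
    }
    return NULL;
}

/* Called from the main thread. Creates the pthreads on the first call only,
 * then hands the function pointer to worker `id`. */
void sketch_thread_create(int id, task_fn fn) {
    assert(id >= 0 && id < POOL_SIZE);
    if (!atomic_exchange(&pool_started, 1)) {
        for (int i = 0; i < POOL_SIZE; i++) {
            worker_id[i] = i;
            int rc = pthread_create(&worker[i], NULL, worker_loop, &worker_id[i]);
            assert(rc == 0);
            (void)rc;
        }
    }
    assert(atomic_load(&slot[id]) == NO_TASK);  /* previous task must be joined */
    atomic_store(&slot[id], fn);
}

/* Join synchronization: wait until worker `id` has reset its slot to NULL. */
void sketch_thread_join(int id) {
    assert(id >= 0 && id < POOL_SIZE);
    while (atomic_load(&slot[id]) != NO_TASK) {
        sched_yield();
    }
}

/* Not part of the description in the text: stops the workers for a clean exit. */
void sketch_pool_shutdown(void) {
    if (!atomic_load(&pool_started)) {
        return;
    }
    atomic_store(&pool_stop, 1);
    for (int i = 0; i < POOL_SIZE; i++) {
        pthread_join(worker[i], NULL);
    }
}

/* Demo task used to exercise the pool. */
atomic_int sketch_demo_count;
void sketch_demo_task(void) { atomic_fetch_add(&sketch_demo_count, 1); }
```

Because the workers stay alive between parallel regions, each region pays for one pointer store and one wait instead of a thread creation and teardown, which is the overhead the fine-grained BitBlit work cannot absorb.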

Table 2: Clock counts of the blitter functions
                 Sequential   Parallelized
  DrawRect           742634         267821
  DrawArc              2182           1140
  DrawCircle2          8013           2764

Table 3: FPS
                 Sequential   Parallelized
  DrawRect            22.82          43.57
  DrawArc             38.58          50.98
  DrawCircle2         33.86          50.77

Figure 7: Clock counts of the blitter functions.

Figure 8: FPS.

5.3 Clock Counts of the Blitter Functions on Nexus7

For the three benchmarks of Section 2.2 (DrawRect, DrawArc, and DrawCircle2), the clock counts of SkRGB16_Blitter::blitRect, SkRGB16_Blitter::blitH, and SkRGB16_Blitter::blitAntiH were measured. Table 2 and Figure 7 show the results. For DrawRect, the count falls from 742634 to 267821 on 3 cores. For DrawArc it falls from 2182 to 1140, and for DrawCircle2 from 8013 to 2764. The speedup is 2.77 for DrawRect and 1.91 for DrawArc.

5.4 FPS on Nexus7

FPS was measured with 0xbench. The FPS reported by 0xbench covers the Java side as well as Skia. DrawRect improves from 22.82 [fps] to 43.57 [fps], DrawArc from 38.58 [fps] to 50.98 [fps], and DrawCircle2 from 33.86 [fps] to 50.77 [fps]. The speedup is 1.91 for DrawRect and 1.32 for DrawArc. Android caps the frame rate at 60 fps.

Systrace [10] was used to examine how Skia uses the CPUs. Figure 9 shows the Systrace results for DrawRect: (a) the original Skia and (b) the parallelized Skia. In (b), Skia runs on CPU1, 2, and 3, and the other processing runs on CPU0.

Figure 9: Systrace results for DrawRect.
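The speedups can be recomputed from the Sequential and Parallelized values reported for the blitter clock counts and for FPS. The following C sketch does so; the struct and function names are illustrative only. For clock counts a smaller value is better, so the ratio is sequential over parallelized. For FPS a larger value is better, so the ratio is inverted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct result {
    const char *name;
    double sequential;
    double parallelized;
};

/* Clock counts of the blitter functions (values from the evaluation). */
const struct result blitter_clocks[] = {
    { "DrawRect",    742634.0, 267821.0 },
    { "DrawArc",       2182.0,   1140.0 },
    { "DrawCircle2",   8013.0,   2764.0 },
};

/* Frames per second (values from the evaluation). */
const struct result fps[] = {
    { "DrawRect",    22.82, 43.57 },
    { "DrawArc",     38.58, 50.98 },
    { "DrawCircle2", 33.86, 50.77 },
};

/* Fewer clocks is better: speedup = sequential / parallelized. */
double clock_speedup(const struct result *r) {
    assert(r->parallelized > 0.0);
    return r->sequential / r->parallelized;
}

/* More frames is better: speedup = parallelized / sequential. */
double fps_speedup(const struct result *r) {
    assert(r->sequential > 0.0);
    return r->parallelized / r->sequential;
}

void print_speedups(void) {
    for (size_t i = 0; i < sizeof blitter_clocks / sizeof blitter_clocks[0]; i++) {
        printf("%-12s blitter %.2fx  fps %.2fx\n", blitter_clocks[i].name,
               clock_speedup(&blitter_clocks[i]), fps_speedup(&fps[i]));
    }
}
```

The ratios for DrawRect and DrawArc come out at 2.77 and 1.91 for the blitter clock counts and 1.91 and 1.32 for FPS, in agreement with the speedups quoted in the text.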

Figure 10: FPS of Skia and GPU rendering.

5.5 Comparison with Hardware Acceleration (GPU)

Since Android Version 3.0, Hardware Acceleration is available: the Android Canvas API described in Section 2.1 is rendered on the GPU through OpenGL ES [10], [12]. It is enabled in the application manifest:

<application android:hardwareAccelerated="true">

Figure 10 compares the FPS with Hardware Acceleration (GPU) against Skia. With the GPU, DrawRect reaches 53.31 [fps], DrawArc 39.98 [fps], and DrawCircle2 10.1 [fps]. For DrawArc and DrawCircle2 the parallelized Skia is faster than the GPU, while for DrawRect the GPU is faster. Against the GPU, Skia parallelized on 3 cores is 1.28 times faster for DrawArc.

6. Conclusion

This report parallelized Skia, the Android 2D rendering library, using Oprofile and the OSCAR compiler. In blitter clock counts the speedup is 2.77 for DrawRect and 1.91 for DrawArc. In FPS the speedup is 1.91 for DrawRect and 1.32 for DrawArc. Compared with the GPU, Skia on 3 cores is 1.28 times faster for DrawArc and 5.1 times faster for DrawCircle2.

References

[1] Blake, G., Dreslinski, R. and Mudge, T.: A survey of multicore processors, IEEE Signal Processing Magazine, No. November (2009).
[2] NVIDIA Corporation: Whitepaper NVIDIA Tegra Multi-processor Architecture.
[3] QUALCOMM Inc.: Snapdragon S4 Processors: System on Chip Solutions for a New Mobile Age (2012).
[4] Samsung Electronics Co., Ltd.: White Paper of Exynos 5, pp. 1-8 (2011).
[5] Mallón, D., Taboada, G. and Teijeiro, C.: Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Springer Berlin Heidelberg (2009).
[6] Kasahara, H., Obata, M. and Ishizaka, K.: Automatic coarse grain task parallel processing on SMP using OpenMP, Workshop on Languages and Compilers for Parallel Computing (2001).
[7] Google: skia 2D Graphics Library.
[8] Apple Inc.: Quartz 2D Programming Guide, Technical report (2012).
[9] Worth, C. and Packard, K.: Xr: Cross-device rendering for vector graphics, Ottawa Linux Symposium (2003).
[10] Google: Android Developers.
[11] Kim, Y.-J., Cho, S.-J., Kim, K.-J., Hwang, E.-H., Yoon, S.-H. and Jeon, J.-W.: Benchmarking Java application using JNI and native C application on Android (2012).
[12] Jim Huang: Hardware Accelerated 2D Rendering for Android, Android Builders Summit 2013 (2013).
[13] 0xlab: 0xbench.
[14] Ishizaka, K., Obata, M. and Kasahara, H.: Coarse Grain Task Parallel Processing with Cache Optimization on Shared Memory Multiprocessor, Proc. of 14th International Workshop on Languages and Compilers for Parallel Computing (LCPC2001) (2001).
[15] Obata, M., Shirako, J., Kaminaga, H., Ishizaka, K. and Kasahara, H.: Hierarchical Parallelism Control for Multigrain Parallel Processing, Lecture Notes in Computer Science, Vol. 2481 (2005).
[16] Shirako, J., Oshiyama, N., Wada, Y., Shikano, H., Kimura, K. and Kasahara, H.: Compiler Control Power Saving Scheme for Multi Core Processors, Lecture Notes in Computer Science, Vol. 4339 (2007).
[17] Kimura, K., Wada, Y., Nakano, H., Kodaka, T., Shirako, J., Ishizaka, K. and Kasahara, H.: Multigrain Parallel Processing on Compiler Cooperative Chip Multiprocessor, Proc. of 9th Workshop on Interaction between Compilers and Computer Architectures (INTERACT-9) (2005).
[18] Cohen, W.: Tuning Programs with OProfile, Wide Open Magazine (2004).
[19] Lee, N. and Lim, S.-S.: A whole layer performance analysis method for Android platforms, IEEE Symposium on Embedded Systems for Real-Time Multimedia (online), DOI: /ESTIMedia (2011).
[20] NVIDIA: NVIDIA Developer Zone.
[21] ASUSTeK Computer Inc.: Nexus7 Specifications.
[22] ARM Corporation: Cortex-A9 Technical Reference Manual.
