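Parallelization of the Android 2D Rendering Library SKIA with the OSCAR Compiler

1 Waseda University.
a) tgoto@kasahara.cs.waseda.ac.jp

Abstract: This paper parallelizes Skia, the 2D rendering library of Android, with the OSCAR automatic parallelizing compiler. The OSCAR compiler takes Parallelizable C as input, and Skia is not written in Parallelizable C. Hot spots in Skia are therefore identified with Oprofile and rewritten in Parallelizable C so that the OSCAR compiler can parallelize them. The parallelized Skia is evaluated with 0xbench on a Nexus7, which carries an NVIDIA Tegra3 (ARM Cortex-A9, 4 cores). With Android running on core0 and Skia parallelized over 3 cores, DrawRect is 1.91 times faster at 43.57 [fps], DrawArc is 1.32 times faster at 50.98 [fps], and DrawCircle2 is 1.5 times faster at 50.77 [fps].

1. Introduction

Multicore processors are now in wide use [1], including mobile processors such as NVIDIA Tegra3 [2], Qualcomm Snapdragon [3], and Samsung Exynos [4]. Parallel programs for such processors are commonly written with APIs such as OpenMP and MPI [5], while the OSCAR compiler [6] parallelizes programs automatically. 2D rendering libraries include skia [7], Quartz [8], and cairo [9]. This paper uses Oprofile and the OSCAR compiler to parallelize Skia, the Android 2D rendering library, and evaluates it on the Google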
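Nexus7. Section 2 describes Skia and the Android 2D benchmark, Section 3 describes Oprofile and the OSCAR compiler, Section 4 describes the parallelization of Skia, and Section 5 evaluates the parallelized Skia.

2. Skia and the Android 2D Benchmark

2.1 Skia

Skia is a 2D graphics library used in Google Chrome, Mozilla Firefox, Android, and Chrome OS, and it serves as the 2D rendering engine of Android [7]. Android applications draw through the Java API (Application Programming Interface) android.graphics.Canvas [10]. Canvas API methods such as drawRect and drawImage call Skia through JNI (Java Native Interface) [11].

2.1.1 Skia rendering flow

Figure 1 shows the rendering flow of Skia [12]. It consists of Path Generation, Rasterization, Shading, and BitBlit (Bit-Level Block Transfer) [12]. Path Generation produces the paths, and Rasterization turns them into a mask. Shading produces the source image. BitBlit combines the outputs of Rasterization and Shading with the destination image to produce the modified destination image.

Figure 1: Skia rendering flow. Blocks: Path Generation, Paths, Rasterization, Mask, Shading, Source Image, Destination Image, BitBlit, Modified Destination Image.

2.2 0xbench

This paper uses 0xbench as the Android benchmark. 0xbench is an Android benchmark suite from 0xlab [13] that covers the C library and system calls, OpenGL-ES, 2D canvas, garbage collection in Dalvik, and the JavaScript engine. Because the 2D canvas tests exercise Skia, they are the ones used here. They draw through android.graphics.Canvas and report FPS. The 2D canvas suite contains three tests, DrawRect, DrawArc, and DrawCircle2, shown in Figure 2.

DrawRect: Canvas drawRect, 300
DrawArc: 17, drawArc, 500
DrawCircle2: drawRect, 6, drawCircle, 300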
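Figure 2: 2D benchmarks (DrawRect, DrawArc, DrawCircle2).

3. Oprofile and the OSCAR Compiler

3.1 OSCAR compiler and OSCAR API

The OSCAR compiler is an automatic parallelizing compiler. Its techniques include cache optimization, hierarchical parallelism control, and compiler-controlled power saving [14], [15], [16], and it exploits parallelism at three granularities (multigrain parallel processing) [6], [17]. The OSCAR compiler takes Parallelizable C or Fortran as input. Given a Parallelizable C program, it outputs C or Fortran annotated with the OSCAR API. The OSCAR API is based on OpenMP and adds directives such as DMA transfer.

Parallel execution in OSCAR API output is expressed in one of two ways:
1. with the parallel sections directive;
2. with the functions oscar_thread_create and oscar_thread_join. For pthread, oscar_thread_create and oscar_thread_join correspond to pthread_create and pthread_join.

3.2 OProfile

Oprofile is a profiler for Linux [18][19]. This paper uses Oprofile for Tegra (version 0.9.6) [20], with the settings 20 and 50000.

3.3 Parallelizing Skia with Oprofile and the OSCAR compiler

Figure 3 shows the flow for parallelizing Skia with Oprofile and the OSCAR compiler. Oprofile identifies the hot spots (HotSpot). Because the OSCAR compiler accepts Parallelizable C, each hot spot is separated from the rest of the source, rewritten in Parallelizable C, and then parallelized by the OSCAR compiler.

4. Parallelization of Skia

Skia is parallelized following the flow in Figure 3.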
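Figure 3: Flow for parallelizing Skia with Oprofile and the OSCAR compiler. Blocks: Original Source Files, Binary File, Profile Result, Hotspot Analyzer, Analyzed Result, select, Hotspot Source File, Tuning, Parallelizing Hotspot Information, OSCAR Parallelizable Compiler, Parallelized Source File.

4.1 Profiling Skia with Oprofile

Skia was profiled with Oprofile (Application Profiling) while running the benchmarks of Section 2.2. For DrawRect (Figure 5(a)), the hot spot is SkRGB16_Blitter::blitRect. This function belongs to the BitBlit stage described in Section 2.1: a Blit function writes pixels to the destination at the given x, y position. For DrawArc (Figure 5(b)), SkRGB16_Blitter::blitH accounts for 82%, followed by SkRGB16_Blitter::blitRect. For DrawCircle2 (Figure 5(c)), SkRGB16_Blitter::blitAntiH accounts for 78% and SkRGB16_Blitter::blitRect for 9%. As with DrawRect, the blit functions dominate, so the blit functions are the targets of parallelization.

4.2 Parallelizing Skia

Following the flow of Section 3.3, Figure 4 shows the tuning applied for DrawRect. To let the OSCAR compiler parallelize SkRGB16_Blitter::blitRect, the loop is separated from the C++ code into a function written in Parallelizable C. The while loop is rewritten as a for loop that the OSCAR compiler can analyze. In the original loop, device is updated in every iteration from its previous value. The tuned loop computes each row address from the loop index instead, which removes the dependence on device and lets the OSCAR compiler parallelize the for loop. The loop performs the BitBlit of Section 2.1 over height rows of width pixels.

Original Source Code:

    void SkRGB16_Blitter::blitRect(int x, int y, int width, int height) {
        SkASSERT(x + width <= fDevice.width() && y + height <= fDevice.height());
        uint16_t* SK_RESTRICT device = fDevice.getAddr16(x, y);
        unsigned deviceRB = fDevice.rowBytes();
        SkPMColor src32 = fSrcColor32;

        while (--height >= 0) {
            blend32_16_row(src32, device, width);
            device = (uint16_t*)((char*)device + deviceRB);
        }
    }

After Tuning (loop separated from the C++ code):

    void SkRGB16_Blitter::blitRect(int x, int y, int width, int height) {
        SkASSERT(x + width <= fDevice.width() && y + height <= fDevice.height());
        uint16_t* SK_RESTRICT device = fDevice.getAddr16(x, y);
        unsigned deviceRB = fDevice.rowBytes();
        SkPMColor src32 = fSrcColor32;

        SkRGB16_Blitter_blitRect_oscar(width, height, device, deviceRB, src32);
    }

After Tuning (dependence on the variable device removed):

    void SkRGB16_Blitter_blitRect_oscar(int width, int height, uint16_t* device,
                                        unsigned deviceRB, SkPMColor src32) {
        int i;
        uint16_t* devicetmp;
        for (i = height; i > 0; i--) {
            /* Row address depends only on the loop index, not on the previous iteration. */
            devicetmp = (uint16_t*)((char*)device + (deviceRB * (height - i)));
            blend32_16_row(src32, devicetmp, width);
        }
    }

Figure 4: Skia DrawRect: Original Source Code and After Tuning Code.

5. Evaluation of Skia Parallelized by the OSCAR Compiler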
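The loop rewrite of Section 4.2 can be checked with a small standalone sketch. It is an illustration under stated assumptions, not Skia code: fill_row is a hypothetical stand-in for blend32_16_row, and the names blit_carried, blit_rows, and chunks_match_sequential are invented for the example. It shows that once each row address is derived from the row index, the rows can be processed in independent chunks, in any order, and still yield the same pixels as the original sequential loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for Skia's blend32_16_row: writes one row. */
static void fill_row(uint16_t *row, int width, uint16_t value) {
    for (int x = 0; x < width; x++) {
        row[x] = value;
    }
}

/* Original shape: `device` is advanced every iteration, so iteration k
 * needs the pointer produced by iteration k-1 (loop-carried dependence). */
static void blit_carried(uint16_t *device, size_t row_bytes,
                         int width, int height, uint16_t value) {
    while (--height >= 0) {
        fill_row(device, width, value);
        device = (uint16_t *)((char *)device + row_bytes);
    }
}

/* Tuned shape: each row address is computed from the row index alone,
 * so any sub-range of rows can be processed independently. */
static void blit_rows(uint16_t *device, size_t row_bytes, int width,
                      int first_row, int last_row, uint16_t value) {
    for (int row = first_row; row < last_row; row++) {
        uint16_t *dst = (uint16_t *)((char *)device + row_bytes * (size_t)row);
        fill_row(dst, width, value);
    }
}

/* Returns 1 if three row chunks (as three cores would take them), run in
 * an arbitrary order, produce the same pixels as the sequential loop. */
static int chunks_match_sequential(int width, int height) {
    enum { MAX_W = 16, MAX_H = 16 };
    assert(width >= 0 && width <= MAX_W && height >= 0 && height <= MAX_H);
    uint16_t a[MAX_H * MAX_W] = {0};
    uint16_t b[MAX_H * MAX_W] = {0};
    size_t row_bytes = MAX_W * sizeof(uint16_t);

    blit_carried(a, row_bytes, width, height, 0xF800);

    int c1 = height / 3;
    int c2 = 2 * height / 3;
    blit_rows(b, row_bytes, width, c2, height, 0xF800);
    blit_rows(b, row_bytes, width, 0, c1, 0xF800);
    blit_rows(b, row_bytes, width, c1, c2, 0xF800);

    return memcmp(a, b, sizeof a) == 0;
}
```

Calling chunks_match_sequential(5, 7) compares a 5 x 7 rectangle drawn both ways. The chunk boundaries here are fixed at thirds only for illustration; how the OSCAR compiler actually divides the iterations is not stated in the text.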
Figure 5: Oprofile profiling results for (a) DrawRect, (b) DrawArc, (c) DrawCircle2.

Figure 6: Thread handling of the OSCAR API functions. Labels: MainThread, Additional Thread 1, Additional Thread 2, oscar_thread_create(1), oscar_thread_create(2), Create Thread (only first time), Transfer FunctionPointer, Wait for next, OSCAR Parallelized Section, FunctionPointer=null, oscar_thread_join(1), oscar_thread_join(2), Check FunctionPointer.

Table 1: Nexus7 specifications

    CPU             ARM Cortex-A9 NVIDIA Tegra 3
    CPU Frequency   1.2GHz (1.3GHz single-core mode)
    CPU core        quad-core
    GPU             NVIDIA GeForce ULP
    GPU Frequency   416MHz
    GPU core        twelve-core
    RAM             1GB
    Display         1280x800 WXGA pixels

5.1 Skia

5.1.1 Nexus7

The evaluation platform is the Nexus7, released in 2012. It carries an NVIDIA Tegra3 with 4 ARM Cortex-A9 cores running at 1.2 [GHz]. Table 1 lists the Nexus7 specifications [21].

5.1.2 Skia

The setup is performed at Skia init. Because the Android OS runs on core0, Skia is parallelized for 3 cores.

5.1.3 BitBlit and the OSCAR API

The BitBlit code parallelized by the OSCAR compiler uses the OSCAR API functions oscar_thread_create and oscar_thread_join. Figure 6 shows how they are handled. oscar_thread_create creates a pthread only the first time it is called; afterwards it transfers a function pointer to the waiting thread. When the thread finishes the parallelized section it sets the function pointer to NULL. oscar_thread_join does not call join; it checks the function pointer until it is NULL.

5.2 Measurement with the ARM PMU

Measurements use the Performance Monitor Unit (PMU) of the ARM Cortex-A9 [22]. The PMU provides a cycle counter (CCNT). Reading CCNT from user mode requires the User Enable register (USEREN) to be set to 1. Once USEREN is set, CCNT is read inside skia.
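The thread handling of Section 5.1.3 and Figure 6 (a thread created once, a function pointer transferred to it, and a join that checks the pointer for NULL) can be sketched with pthreads and C11 atomics. This is a minimal sketch under assumptions, not the OSCAR runtime: the type worker_t and the functions worker_start, worker_dispatch, worker_wait, worker_stop, and sum_1_to_n are hypothetical names, the stop flag is added so the example can terminate, and core pinning is omitted.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*task_fn)(void *);

typedef struct {
    pthread_t thread;
    _Atomic(task_fn) fn;  /* NULL: idle. Non-NULL: task pending or running. */
    void *arg;            /* written before fn is published, read after */
    atomic_bool stop;     /* assumption: added so the sketch can shut down */
} worker_t;

static void *worker_main(void *p) {
    worker_t *w = p;
    for (;;) {
        task_fn fn = atomic_load_explicit(&w->fn, memory_order_acquire);
        if (fn != NULL) {
            fn(w->arg);
            /* "FunctionPointer=null": signals completion to the waiter. */
            atomic_store_explicit(&w->fn, (task_fn)0, memory_order_release);
        } else if (atomic_load_explicit(&w->stop, memory_order_acquire)) {
            return NULL;
        } else {
            sched_yield();  /* "Wait for next" */
        }
    }
}

/* Create the thread once; later work is handed over without pthread_create. */
static int worker_start(worker_t *w) {
    atomic_init(&w->fn, (task_fn)0);
    atomic_init(&w->stop, false);
    w->arg = NULL;
    return pthread_create(&w->thread, NULL, worker_main, w);
}

/* "Transfer FunctionPointer": the worker must be idle when this is called. */
static void worker_dispatch(worker_t *w, task_fn fn, void *arg) {
    assert(fn != NULL);
    w->arg = arg;
    atomic_store_explicit(&w->fn, fn, memory_order_release);
}

/* "Check FunctionPointer": wait until the worker has reset it to NULL. */
static void worker_wait(worker_t *w) {
    while (atomic_load_explicit(&w->fn, memory_order_acquire) != NULL) {
        sched_yield();
    }
}

static void worker_stop(worker_t *w) {
    atomic_store_explicit(&w->stop, true, memory_order_release);
    pthread_join(w->thread, NULL);
}

typedef struct { int begin; int end; long sum; } range_t;

static void sum_range(void *p) {
    range_t *r = p;
    r->sum = 0;
    for (int i = r->begin; i < r->end; i++) r->sum += i;
}

/* Sum 1..n on the main thread plus two additional threads, as in Figure 6. */
static long sum_1_to_n(int n) {
    assert(n >= 0);
    worker_t w[2];
    int third = n / 3;
    range_t r[3] = {
        { 1, 1 + third, 0 },
        { 1 + third, 1 + 2 * third, 0 },
        { 1 + 2 * third, n + 1, 0 },
    };
    for (int i = 0; i < 2; i++) {
        if (worker_start(&w[i]) != 0) {
            for (int j = 0; j < i; j++) worker_stop(&w[j]);
            return -1;
        }
    }
    worker_dispatch(&w[0], sum_range, &r[1]);
    worker_dispatch(&w[1], sum_range, &r[2]);
    sum_range(&r[0]);  /* the main thread takes the first chunk */
    worker_wait(&w[0]);
    worker_wait(&w[1]);
    worker_stop(&w[0]);
    worker_stop(&w[1]);
    return r[0].sum + r[1].sum + r[2].sum;
}
```

A design point worth noting: handing work to an already running thread avoids paying thread creation and join costs on every blit call, which matters when the parallelized section is as short as a single rectangle fill. The sketch creates and stops its workers inside sum_1_to_n only to stay self-contained; a long-lived pool would keep them alive between calls.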
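Table 2: Blitter execution cost measured with CCNT

                   Sequential   Parallelized
    DrawRect       742634       267821
    DrawArc        2182         1140
    DrawCircle2    8013         2764

Table 3: FPS

                   Sequential   Parallelized
    DrawRect       22.82        43.57
    DrawArc        38.58        50.98
    DrawCircle2    33.86        50.77

Figure 7: Blitter speedup, Sequential vs. Parallelized, for DrawRect, DrawArc, and DrawCircle2.

Figure 8: FPS speedup, Sequential vs. Parallelized, for DrawRect, DrawArc, and DrawCircle2.

5.3 Blitter evaluation

On the Nexus7, the blitters that are the hot spots of the Section 2.2 benchmarks DrawRect, DrawArc, and DrawCircle2 were measured: SkRGB16_Blitter::blitRect, SkRGB16_Blitter::blitH, and SkRGB16_Blitter::blitAntiH. Table 2 and Figure 7 show the results. With 3 cores, DrawRect drops from 742634 to 267821, DrawArc from 2182 to 1140, and DrawCircle2 from 8013 to 2764. The speedups are 2.77 times for DrawRect, 1.91 times for DrawArc, and 2.90 times for DrawCircle2.

5.4 FPS evaluation

FPS on the Nexus7 was measured with 0xbench. In contrast to Section 5.3, the FPS figure covers the Java side as well as Skia. Table 3 and Figure 8 show the results with 3 cores. DrawRect rises from 22.82 [fps] to 43.57 [fps], DrawArc from 38.58 [fps] to 50.98 [fps], and DrawCircle2 from 33.86 [fps] to 50.77 [fps]. The speedups are 1.91 times for DrawRect, 1.32 times for DrawArc, and 1.50 times for DrawCircle2. Note that Android caps the frame rate at 60 fps.

Figure 9: Systrace results for DrawRect: (a) Sequential Skia, (b) Parallelized Skia.

Systrace [10] was used to examine the CPU usage of Skia. Figure 9 shows the Systrace results for DrawRect. Panel (a) shows the sequential Skia, with DrawRect activity on CPU1, CPU2, and CPU0 of the 4 cores. Panel (b) shows the parallelized Skia: compared with (a), Skia runs on CPU1, CPU2, and CPU3, apart from CPU0.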
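The speedup figures quoted in Sections 5.3 and 5.4 follow directly from Tables 2 and 3. The sketch below reproduces the arithmetic; the function names speedup_from_cost and speedup_from_rate are hypothetical and chosen only for this example. A cost such as the blitter measurement shrinks under parallelization, so the ratio is sequential over parallelized, while a rate such as FPS grows, so the ratio is inverted.

```c
#include <assert.h>

/* Speedup from a cost that shrinks, e.g. the blitter cost in Table 2. */
static double speedup_from_cost(double sequential, double parallelized) {
    assert(parallelized > 0.0);
    return sequential / parallelized;
}

/* Speedup from a rate that grows, e.g. the FPS in Table 3. */
static double speedup_from_rate(double sequential, double parallelized) {
    assert(sequential > 0.0);
    return parallelized / sequential;
}
```

For DrawRect, speedup_from_cost(742634, 267821) is about 2.77 and speedup_from_rate(22.82, 43.57) is about 1.91, matching the reported values. The gap between the two is consistent with FPS covering more than the blitter alone.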
Figure 10: FPS of the parallelized Skia on the CPU compared with the original Skia on the GPU, for DrawRect, DrawArc, and DrawCircle2.

5.5 Comparison with Hardware Acceleration (GPU)

From Android Version 3.0, Hardware Acceleration is available. With it, the Android Canvas API calls described in Section 2.1 are executed on the GPU through OpenGL ES [10][12]. Hardware Acceleration is enabled with the following attribute:

    <application android:hardwareAccelerated="true">

Figure 10 compares Hardware Acceleration on the GPU with the parallelized Skia. For DrawRect, the parallelized Skia on 3 cores reaches 43.57 [fps] and the GPU reaches 53.31 [fps]. For DrawArc, 3 cores reach 50.98 [fps] and the GPU reaches 39.98 [fps]. For DrawCircle2, 3 cores reach 50.77 [fps] and the GPU reaches 10.1 [fps]. For DrawArc and DrawCircle2, the parallelized Skia is faster than the GPU, while for DrawRect the GPU is faster. Relative to the GPU, 3 cores are 1.28 times faster for DrawArc and 5.10 times faster for DrawCircle2.

5.6 Summary

This paper used Oprofile and the OSCAR compiler to parallelize Skia, the Android 2D rendering library. With 3 cores, the blitter speedups were 2.77 times for DrawRect, 1.91 times for DrawArc, and 2.90 times for DrawCircle2. FPS improved by 1.91 times for DrawRect, 1.32 times for DrawArc, and 1.50 times for DrawCircle2. Compared with the GPU, 3 cores were 1.28 times faster for DrawArc and 5.1 times faster for DrawCircle2.

References

[1] Blake, G., Dreslinski, R. and Mudge, T.: A survey of multicore processors, IEEE Signal Processing Magazine, No. November, pp. 26-37 (2009).
[2] NVIDIA Corporation: Whitepaper NVIDIA Tegra Multi-processor Architecture, pp. 1-12.
[3] QUALCOMM Inc.: Snapdragon S4 Processors: System on Chip Solutions for a New Mobile Age (2012).
[4] Samsung Electronics Co., Ltd.: White Paper of Exynos 5, pp. 1-8 (2011).
[5] Mallón, D., Taboada, G. and Teijeiro, C.: Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures, Recent Advances in Parallel Virtual Machine and Message Passing Interface, Springer Berlin Heidelberg, pp. 174-184 (2009).
[6] Kasahara, H., Obata, M. and Ishizaka, K.: Automatic Coarse Grain Task Parallel Processing on SMP using OpenMP, Workshop on Languages and Compilers for Parallel Computing, pp. 1-15 (2001).
[7] Google: skia 2D Graphics Library.
[8] Apple Inc.: Quartz 2D Programming Guide, Technical report (2012).
[9] Worth, C. and Packard, K.: Xr: Cross-device rendering for vector graphics, Ottawa Linux Symposium (2003).
[10] Google: Android Developers.
[11] Kim, Y.-J., Cho, S.-J., Kim, K.-J., Hwang, E.-H., Yoon, S.-H.
and Jeon, J.-W.: Benchmarking Java application using JNI and native C application on Android (2012).
[12] Jim Huang: Hardware Accelerated 2D Rendering for Android, Android Builders Summit 2013 (2013).
[13] 0xlab: 0xbench.
[14] Ishizaka, K., Obata, M. and Kasahara, H.: Coarse Grain Task Parallel Processing with Cache Optimization on Shared Memory Multiprocessor, Proc. of 14th International Workshop on Languages and Compilers for Parallel Computing (LCPC2001) (2001).
[15] Obata, M., Shirako, J., Kaminaga, H., Ishizaka, K. and Kasahara, H.: Hierarchical Parallelism Control for Multigrain Parallel Processing, Lecture Notes in Computer Science, Vol. 2481, pp. 31-44 (2005).
[16] Shirako, J., Oshiyama, N., Wada, Y., Shikano, H., Kimura, K. and Kasahara, H.: Compiler Control Power Saving Scheme for Multi Core Processors, Lecture Notes in Computer Science, Vol. 4339, pp. 362-376 (2007).
[17] Kimura, K., Wada, Y., Nakano, H., Kodaka, T., Shirako, J., Ishizaka, K. and Kasahara, H.: Multigrain Parallel Processing on Compiler Cooperative Chip Multiprocessor, Proc. of 9th Workshop on Interaction between Compilers and Computer Architectures (INTERACT-9) (2005).
[18] Cohen, W.: Tuning Programs with OProfile, Wide Open Magazine, pp. 53-62 (2004).
[19] Lee, N. and Lim, S.-S.: A whole layer performance analysis method for Android platforms, 2011 9th IEEE Symposium on Embedded Systems for Real-Time Multimedia, pp. 1-1 (online), DOI: 10.1109/ESTIMedia.2011.6088515 (2011).
[20] NVIDIA: NVIDIA Developer Zone.
[21] ASUSTeK Computer Inc.: Nexus7 Specifications.
[22] ARM Corporation: Cortex-A9 Technical Reference Manual.