
SAD 23 (410M520)


Abstract In recent years, video has moved to ever higher definition. Encoding methods that compress the growing volume of video data have become more sophisticated, and the amount of processing they require has increased greatly. Because motion estimation accounts for most of the encoding work, speeding it up has been studied for a long time. However, the SAD instructions built into general-purpose processors have not advanced since the MPSADBW instruction of the x86 processor, and because that instruction does not fit H.264/AVC encoding, it is an obstacle to speeding up software encoding. In this paper, I therefore speed up motion estimation by realizing a highly parallel SAD instruction that supports all of the variable block sizes of H.264/AVC. Motion estimation performs block matching between the current picture and a reference picture and computes the SAD for each match. The x86 processor provides MPSADBW as an instruction for multiple SAD operations, but it is limited to horizontally adjacent SADs and cannot efficiently execute tracking-type motion estimation, which is currently the basic approach in software. Even with the tracking-type search that uses the 3x3 square pattern, which this laboratory uses for its high degree of data reuse, MPSADBW can parallelize only three points at a time. To solve this problem, I propose a highly parallel SAD instruction set that computes the SADs of all nine points of the square pattern at once, and I evaluate its effectiveness. I also design the circuit that executes the proposed instruction set. Using this instruction set speeds up motion estimation.
I evaluated the number of cycles required for the nine-point SAD operations and the speedup of the proposed highly parallel SAD instruction set over the MPSADBW instruction of the x86 processor. As a result, processing speed improved by a factor of about 4.6, and motion estimation was accelerated.


1 1.1 (High Definition : HD) 16 (Ultra High Definition : UHD) 2020 H.264/AVC [1,2] 7 1.2 (SAD) SAD SAD 1 () 3x3 (SAD ) x86 SSE4(Streaming SIMD Extensions 4) SIMD(Single Instruction stream Multiple Data stream) 1 SAD MPSADBW(Multiple Packed Sums of Absolute Difference Byte Word) [3 5] MPSADBW SAD MPSADBW SAD 3x3 SAD 3x3 SAD 3x3 SAD SAD 1

2 SAD 2.1 2 2 (SAD) SAD SAD 1 2.2 1 3x3 ( ) 2.1 2.1 4x4 9 9 8 3x3 () 2

1 2 3 4x4 4 5 6 7 8 9 1 2.1: EPZS EPZS 3

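2.3 Sum of Absolute Differences

Block matching is evaluated with the sum of absolute differences (Sum of Absolute Differences : SAD). For a current block C and a reference block R of MxN pixels, the SAD of C and R is given by Equation (1).

\[ \mathrm{SAD}(C, R) = \sum_{y=0}^{N-1} \sum_{x=0}^{M-1} \left| C_{xy} - R_{xy} \right| \tag{1} \]

Figure 2.2 shows an example of the SAD computation for a 4x4 block (M=4, N=4).

C:
123  47  39  84
124  49  38  86
103  54  45  71
126  47  35  76

R:
128  65  41  76
133  69  41  74
 88  61  47  56
132  78  36  67

|C - R|:
  5  18   2   8
  9  20   3  12
 15   7   2  15
  6  31   1   9

SAD = 163

Figure 2.2: Example of SAD computation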
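As a quick check of Equation (1), the following Python sketch computes the SAD of the 4x4 example in Figure 2.2. The function name `sad` and the list-of-rows block representation are choices made for this illustration and are not taken from the original work.

```python
def sad(cur, ref):
    """Sum of absolute differences between two blocks of equal size.

    cur and ref are lists of rows; each row is a list of pixel values.
    """
    return sum(abs(c - r)
               for cur_row, ref_row in zip(cur, ref)
               for c, r in zip(cur_row, ref_row))


# Pixel values of the current block C and reference block R in Figure 2.2.
C = [[123, 47, 39, 84],
     [124, 49, 38, 86],
     [103, 54, 45, 71],
     [126, 47, 35, 76]]
R = [[128, 65, 41, 76],
     [133, 69, 41, 74],
     [88, 61, 47, 56],
     [132, 78, 36, 67]]

print(sad(C, R))  # 163
```

The row sums of |C - R| are 33, 44, 39, and 47, which add up to 163.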

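2.4 Variable Block Sizes

H.264/AVC supports seven block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4. Figure 2.3 shows these seven sizes.

Figure 2.3: Block sizes (16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4)

The six sizes 16x16, 16x8, 8x16, 8x8, 8x4, and 4x8 can each be divided into 4x4 blocks, as shown in Figure 2.4. Grouped by width, 16x16 and 16x8 are 16 pixels wide; 8x16, 8x8, and 8x4 are 8 pixels wide; and 4x8 and 4x4 are 4 pixels wide.

Figure 2.4: Division into 4x4 blocks

The SAD of a larger block is the sum of the SADs of the 4x4 blocks it contains. A 16x16 block contains sixteen 4x4 blocks, so adding their 16 SADs gives the 16x16 SAD.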
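The additive relation between 4x4 SADs and the SADs of larger blocks can be checked with a short Python sketch. The helper names `sad_region` and `sad_from_4x4`, the 4x4-cell coordinate convention, and the pseudo-random pixel data are all assumptions made for this illustration.

```python
import random


def sad_region(cur, ref, x0, y0, w, h):
    """Direct SAD over the w x h region whose top-left pixel is (x0, y0)."""
    return sum(abs(cur[y][x] - ref[y][x])
               for y in range(y0, y0 + h)
               for x in range(x0, x0 + w))


def sad_from_4x4(sad4, bx, by, w, h):
    """SAD of a w x h block obtained by adding the 4x4 SADs it contains.

    (bx, by) is the block's top-left corner counted in 4x4 cells.
    """
    return sum(sad4[by + j][bx + i]
               for j in range(h // 4)
               for i in range(w // 4))


# Hypothetical 16x16 current and reference macroblocks.
random.seed(0)
cur = [[random.randrange(256) for _ in range(16)] for _ in range(16)]
ref = [[random.randrange(256) for _ in range(16)] for _ in range(16)]

# The sixteen 4x4 SADs, indexed as sad4[row][column].
sad4 = [[sad_region(cur, ref, 4 * i, 4 * j, 4, 4) for i in range(4)]
        for j in range(4)]

sad_16x16 = sad_from_4x4(sad4, 0, 0, 16, 16)
sad_8x16_right = sad_from_4x4(sad4, 2, 0, 8, 16)
sad_16x8_bottom = sad_from_4x4(sad4, 0, 2, 16, 8)
```

Once the sixteen 4x4 SADs are available, every larger block size is obtained with additions only, without touching the pixels again.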

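The SADs of all H.264/AVC block sizes can therefore be built from 4x4 SADs.

2.5 MPSADBW

MPSADBW is an x86 SIMD instruction in SSE4 (SSE4.1). One MPSADBW computes eight SADs at once, each over 4 pixels (4 bytes): a 4-byte block taken from one operand is compared against eight 4-byte windows of the other operand, each window shifted by one byte from the previous one. Figure 2.5 shows an example of the SAD operation of MPSADBW, and Figure 2.6 shows the eight SADs that it computes.

Operand providing the windows (16 bytes):      23  5 73 56  8 13  5 72 28 35 43 16 18 34 27 95
Operand providing the 4-byte block (16 bytes): 15  1 69 43  5 13  6 87 64  7 45  6 54 38 20 86
Result of MPSADBW (SAD H to A):                184 142 86 122 96 87 131 56

Figure 2.5: SAD operation of MPSADBW

Each SAD is taken against the 4-byte block 54 38 20 86:

SAD [A]: 18 34 27 95
SAD [B]: 16 18 34 27
SAD [C]: 43 16 18 34
SAD [D]: 35 43 16 18
SAD [E]: 28 35 43 16
SAD [F]: 72 28 35 43
SAD [G]:  5 72 28 35
SAD [H]: 13  5 72 28

Figure 2.6: The eight SADs of MPSADBW

As Figure 2.6 shows, the eight SADs of MPSADBW all lie at horizontally adjacent positions. For the 3x3 square pattern, MPSADBW can therefore compute only the three points of one row at a time. Table 2.1 gives the speedup obtained with MPSADBW for three configurations, two using the 3x3 pattern and one using a 5x3 pattern.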
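The sliding-window behavior of Figures 2.5 and 2.6 can be reproduced with a scalar Python model. This is a simplified sketch, not the instruction's full definition: the function name `mpsadbw_model` and its byte-offset parameters are assumptions for illustration, whereas the real instruction selects the offsets through an immediate operand. Evaluating the definition on the bytes of Figure 2.5 gives 122 for window E (26 + 3 + 23 + 70).

```python
def mpsadbw_model(window_src, block_src, block_offset=0, window_offset=0):
    """Scalar model of one MPSADBW: eight 4-byte SADs over sliding windows.

    Both inputs are 16-byte lists in memory order (index 0 = lowest byte).
    Returns the eight SADs, lowest result first (A to H in Figure 2.5).
    """
    block = block_src[block_offset:block_offset + 4]
    return [
        sum(abs(window_src[window_offset + k + j] - block[j]) for j in range(4))
        for k in range(8)
    ]


# Figure 2.5 lists the bytes from the highest byte down, so reverse them
# to obtain memory order.
window_src = [23, 5, 73, 56, 8, 13, 5, 72, 28, 35, 43, 16, 18, 34, 27, 95][::-1]
block_src = [15, 1, 69, 43, 5, 13, 6, 87, 64, 7, 45, 6, 54, 38, 20, 86][::-1]

sads = mpsadbw_model(window_src, block_src)
print(sads)  # [56, 131, 87, 96, 122, 86, 142, 184]  (A to H)
```

All eight windows come from one row of pixels, which is why a single MPSADBW covers at most the three horizontally adjacent points of one row of the 3x3 pattern.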

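Table 2.1: Speedup with MPSADBW

Search pattern [WxH]    [3x3]    [3x3]    [5x3]
Speedup [times]         1        1.08     1.18

Even with MPSADBW the speedup is only 1.18 times. A larger gain requires computing the SADs of all 9 points of the pattern in parallel.

3 Proposed Method

3.1 Instruction Set

To compute the 9 SADs in parallel, the following four instructions are proposed.

Highly parallel SAD instruction (Highly Parallel Multiple Packed Sums of Absolute Difference Byte Word : HPMPSADBW): computes the SADs of the 9 points of the 3x3 pattern in parallel.
Move-input instruction (Move Input : MovIn): supplies input data for the SAD operation.
Move-output instruction (Move Output : MovOut): outputs the SAD results.
Add-and-compare instruction (Add and Compare : AddCom): adds SADs and compares SADs.

3.1.1 Highly Parallel SAD Instruction (HPMPSADBW)

The highly parallel SAD instruction (Highly Parallel Multiple Packed Sums of Absolute Difference Byte Word : HPMPSADBW) computes the SADs of the 9 points of the 3x3 pattern in parallel. It performs 36 4x1 SADs at once. With these 36 4x1 SADs it handles block widths of 16 and 8: for width 16 it computes the SADs of 1 row, and for width 8 it computes the SADs of 2 rows.

hpmpsadbw takes 3 operands. The first (r1) and second (r2) are registers, and the third (i) is 0 or 1. When i is 0, the SAD for width 16 is computed; when i is 1, the SAD for width 8 is computed.

hpmpsadbw r1, r2, i

3.1.2 Move-Input Instruction (MovIn)

The move-input instruction (Move Input : MovIn) moves input data for the SAD operation into the SAD unit ahead of the operation. movin takes 1 operand, the register (r1) that holds the data.

movin r1

3.1.3 Move-Output Instruction (MovOut)

The move-output instruction (Move Output : MovOut) outputs the SAD results and resets them to 0. movout takes 1 operand, the register (r1) that receives the results.

movout r1

3.1.4 Add-and-Compare Instruction (AddCom)

The add-and-compare instruction (Add and Compare : AddCom) adds two SADs. addcom takes 4 operands: the first (r1) receives the sum, the second and third (r2, r3) are the two SADs to be added, and the fourth (r4) is the register used for the SAD comparison.

addcom r1, r2, r3, r4

3.1.5 SAD Instruction Sequences

The 9-point SAD of each block size is computed with a sequence of hpmpsadbw and addcom instructions. There are two kinds of registers: 256-bit registers R0 to R36 and 576-bit registers T0 to T5. Each 256-bit register is divided into eight 32-bit registers (r0 to r295), and each 576-bit register is divided into four 144-bit registers (t0 to t19). Figure 3.7 shows the registers. R0 to R15 and R16 to R33 hold the data used for the SAD operations.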
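To make the goal of the nine-point operation concrete, the following Python sketch computes, in scalar form, the nine SADs of a block against a reference area displaced by each point of the 3x3 square pattern. It models only the intended result, not the proposed instructions or their circuit; the function names, the one-pixel border convention, and the test data are assumptions made for this illustration.

```python
def sad_at(cur, ref, dx, dy):
    """SAD of block cur against ref displaced by (dx, dy).

    ref must be larger than cur by a 1-pixel border on every side,
    so that displacements of -1, 0, and +1 stay inside it.
    """
    h, w = len(cur), len(cur[0])
    return sum(abs(cur[y][x] - ref[y + 1 + dy][x + 1 + dx])
               for y in range(h) for x in range(w))


def nine_point_sads(cur, ref):
    """SADs for the 9 points of the 3x3 pattern, row by row (dy, then dx)."""
    return [sad_at(cur, ref, dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def best_point(sads):
    """Displacement (dx, dy) with the smallest SAD, and that SAD."""
    i = min(range(9), key=lambda k: sads[k])
    return (i % 3 - 1, i // 3 - 1), sads[i]


# Hypothetical 6x6 reference area and a 4x4 block cut from it at (dx, dy) = (1, -1).
ref = [[3 * x * x + 5 * y * y + x * y for x in range(6)] for y in range(6)]
cur = [[ref[y + 1 - 1][x + 1 + 1] for x in range(4)] for y in range(4)]

sads = nine_point_sads(cur, ref)
motion, min_sad = best_point(sads)
print(motion, min_sad)  # (1, -1) 0
```

A tracking-type search repeats this step, recentering the 3x3 pattern on the best point, so computing all nine SADs in one operation removes the inner loop of the search.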

The instruction sequences for the 16x16, 16x8, 8x16, 8x8, and 8x4 SADs are shown in Figures 3.8, 3.9, 3.10, 3.11, and 3.12, respectively.

Register   Width     Subregisters
R0         256bit    r7 r6 r5 r4 r3 r2 r1 r0 (32bit each)
T0         576bit    t3 t2 t1 t0 (144bit each)
R1 ... Rn and T1 ... Tn are divided in the same way.

Figure 3.7: Registers

1 movin R16
2 movin R17
3 hpmpsadbw R0, R18, 0
4 hpmpsadbw R1, R19, 0
5 hpmpsadbw R2, R20, 0
6 hpmpsadbw R3, R21, 0
7 movout T0
8 hpmpsadbw R4, R22, 0 # 8, 9
9 addcom t4, t3, t2, r272
10 hpmpsadbw R5, R23, 0 # 10, 11
11 addcom t5, t1, t0, r272
12 hpmpsadbw R6, R24, 0
13 hpmpsadbw R7, R25, 0
14 movout T0
15 hpmpsadbw R8, R26, 0 # 15, 16
16 addcom t6, t3, t2, r272
17 hpmpsadbw R9, R27, 0 # 17, 18
18 addcom t7, t1, t0, r272
19 hpmpsadbw R10, R28, 0 # 19, 20
20 addcom t4, t4, t6, r272
21 hpmpsadbw R11, R29, 0 # 21, 22
22 addcom t5, t5, t7, r272
23 movout T0
24 hpmpsadbw R12, R30, 0 # 24, 25
25 addcom t6, t3, t2, r272
26 hpmpsadbw R13, R31, 0 # 26, 27
27 addcom t7, t1, t0, r272
28 hpmpsadbw R14, R32, 0 # 28, 29
29 addcom t8, t4, t5, r272
30 hpmpsadbw R15, R33, 0
31 movout T0
32 addcom t3, t3, t2, r273
33 addcom t1, t1, t0, r273
34 addcom t6, t6, t3, r273
35 addcom t7, t7, t1, r273
36 addcom t9, t6, t7, r273
37 addcom t10, t4, t6, r274
38 addcom t11, t5, t7, r274
39 addcom t12, t8, t9, r274
40 addcom t13, t12, t12, r275
41 addcom t14, t10, t11, r276

Figure 3.8: Instruction sequence for the 16x16 SAD

1 movin R16
2 movin R17
3 hpmpsadbw R0, R18, 0
4 hpmpsadbw R1, R19, 0
5 hpmpsadbw R2, R20, 0
6 hpmpsadbw R3, R21, 0
7 movout T0
8 hpmpsadbw R4, R22, 0 # 8, 9
9 addcom t8, t3, t2, r272
10 hpmpsadbw R5, R23, 0 # 10, 11
11 addcom t9, t1, t0, r273
12 hpmpsadbw R6, R24, 0
13 hpmpsadbw R7, R25, 0
14 movout T1
15 addcom t10, t7, t6, r274
16 addcom t11, t5, t4, r275
17 addcom t12, t8, t10, r276
18 addcom t13, t9, t11, r277
19 addcom t14, t12, t13, r278
20 addcom t15, t3, t7, r279
21 addcom t16, t2, t6, r279
22 addcom t17, t1, t5, r279
23 addcom t18, t0, t4, r279
24 addcom t19, t14, t14, r279
25 addcom t19, t15, t16, r280
26 addcom t19, t17, t18, r281

Figure 3.9: Instruction sequence for the 16x8 SAD

1 movin R16
2 movin R17
3 hpmpsadbw R0, R18, 1
4 hpmpsadbw R1, R20, 1
5 movout T0
6 hpmpsadbw R4, R22, 1 # 6, 7
7 addcom t4, t3, t1, r272
8 hpmpsadbw R5, R24, 1 # 8, 9
9 addcom t5, t2, t0, r273
10 movout T0
11 hpmpsadbw R8, R26, 1 # 11, 12
12 addcom t6, t3, t1, r274
13 hpmpsadbw R9, R28, 1 # 13, 14
14 addcom t7, t2, t0, r275
15 movout T0
16 hpmpsadbw R12, R30, 1 # 16, 17
17 addcom t8, t3, t1, r276
18 hpmpsadbw R13, R32, 1 # 18, 19
19 addcom t9, t2, t0, r277
20 movout T0
21 addcom t3, t3, t1, r278
22 addcom t2, t2, t0, r279
23 addcom t0, t4, t5, r280
24 addcom t1, t6, t7, r281
25 addcom t10, t8, t9, r282
26 addcom t11, t3, t2, r283
27 addcom t12, t4, t6, r284
28 addcom t13, t5, t7, r284
29 addcom t14, t8, t3, r284
30 addcom t15, t9, t2, r284
31 addcom t16, t0, t1, r284
32 addcom t17, t10, t11, r285
33 addcom t18, t16, t17, r286
34 addcom t19, t18, t18, r288
35 addcom t19, t12, t13, r289
36 addcom t19, t14, t15, r290

Figure 3.10: Instruction sequence for the 8x16 SAD

1 movin R16
2 movin R17
3 hpmpsadbw R0, R18, 1
4 hpmpsadbw R1, R20, 1
5 movout T0
6 hpmpsadbw R4, R22, 1 # 6, 7
7 addcom t4, t3, t1, r272
8 hpmpsadbw R5, R24, 1 # 8, 9
9 addcom t5, t2, t0, r273
10 movout T0
11 addcom t3, t3, t1, r274
12 addcom t2, t2, t0, r275
13 addcom t0, t4, t5, r276
14 addcom t1, t3, t2, r277
15 addcom t6, t4, t3, r278
16 addcom t7, t5, t2, r278
17 addcom t8, t0, t1, r278
18 addcom t9, t8, t8, r279
19 addcom t9, t6, t7, r280

Figure 3.11: Instruction sequence for the 8x8 SAD

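1 movin R16
2 movin R17
3 hpmpsadbw R0, R18, 1
4 hpmpsadbw R1, R20, 1
5 movout T0
6 addcom t3, t3, t1, r272
7 addcom t2, t2, t0, r273
8 addcom t1, t3, t2, r274
9 addcom t0, t1, t1, r275

Figure 3.12: Instruction sequence for the 8x4 SAD

Figure 3.13 shows how the SADs are combined for width 16, and Figure 3.14 for width 8. For width 16, each instruction computes the SADs of 1 row, and the results are combined starting from 4x4 SADs. For width 8, each instruction computes the SADs of 2 rows, and the results are combined starting from 4x2 SADs.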

Figure 3.13: SADs for width 16. The 16x16 block is divided into sixteen 4x4 blocks (A to P), and their SADs are combined into the 4x8, 8x4, 8x8, 8x16, 16x8, and 16x16 SADs.

Figure 3.14: SADs for width 8. Each 4x4 block (A to P) is formed from two 4x2 SADs (a0, a1 through p0, p1), which are built from 4x1 SADs; the 4x4 SADs are then combined into the 4x8, 8x4, 8x8, and 8x16 SADs.

3.2 SAD 2 2 3.2.1 SAD SAD 16 SAD 1 8 SAD 2 16 8 1 SAD SIMD 6 4 4 2 6 4x1 SAD SAD 6 4 SAD SAD SAD SAD () 4 16 8 0 16 1 8 2 SAD SAD 3.15 21

SAD 576 80 80 80 80 256 256 3.15: SAD 22

4x1 SAD 4x1 SAD SAD 3.16 16 16 16 4x1 SAD 16 16 8 8 8 8 32 32 3.16: 23

3.2.2 SAD SAD SAD SAD 3.7 576 n 144 SAD SIMD 9 SAD SAD 144 2 16 SAD 2 32 32 SAD 24

32 144 16 16 144 144 576 SAD 3.17: 4 H.264/AVC SAD 16 8 SAD x86 MPSADBW 9 SAD 25

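Table 4.2 compares the number of cycles required for the 9-point SAD with MPSADBW and with the proposed instruction set.

Table 4.2: Number of cycles for the 9-point SAD

Block size [WxH]            16x16   16x8   8x16   8x8   8x4
MPSADBW                     219     123    123    60    27
Proposed instruction set    35      24     32     18    10

Figure 4.18 shows the number of cycles for the 9-point SAD relative to MPSADBW, and Figure 4.19 shows the speedup relative to MPSADBW. In both figures MPSADBW is normalized to 1.

Figure 4.18: Number of cycles for the SAD (MPSADBW = 1)

Figure 4.19: Speedup (MPSADBW = 1)

Figure 4.18 shows that the 9-point SAD with the proposed instruction set takes on average 26% of the cycles needed with MPSADBW. Figure 4.19 shows that the 9-point SAD is on average 4.3 times faster than with MPSADBW. The gain depends on the block size, which determines the mix of SAD operations (hpmpsadbw) and SAD additions (addcom), and it is 2.7 times in the least favorable case. Next, the proposed instruction set is evaluated with x264, an H.264/AVC encoder.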
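The averages quoted for Figures 4.18 and 4.19 follow directly from the cycle counts in Table 4.2, as the Python sketch below shows. The dictionary names are choices made for this illustration, and the label "proposed" for the second row of the table is an assumption, since the original row label did not survive extraction.

```python
# Cycle counts for the 9-point SAD, taken from Table 4.2.
cycles_mpsadbw = {"16x16": 219, "16x8": 123, "8x16": 123, "8x8": 60, "8x4": 27}
cycles_proposed = {"16x16": 35, "16x8": 24, "8x16": 32, "8x8": 18, "8x4": 10}

# Cycle ratio (proposed / MPSADBW) and speedup (MPSADBW / proposed) per block size.
ratios = {size: cycles_proposed[size] / cycles_mpsadbw[size]
          for size in cycles_mpsadbw}
speedups = {size: cycles_mpsadbw[size] / cycles_proposed[size]
            for size in cycles_mpsadbw}

mean_ratio = sum(ratios.values()) / len(ratios)
mean_speedup = sum(speedups.values()) / len(speedups)
min_speedup = min(speedups.values())

print(f"mean cycle ratio: {mean_ratio:.2f}")     # mean cycle ratio: 0.26
print(f"mean speedup:     {mean_speedup:.1f}")   # mean speedup:     4.3
print(f"minimum speedup:  {min_speedup:.1f}")    # minimum speedup:  2.7
```

The speedup falls from about 6.3 times for 16x16 to 2.7 times for 8x4, so the larger block sizes benefit most.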

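The SAD processing of x264 was evaluated with three test sequences, CrowdRun, DucksTakeOff, and OldTownCross, using 500 frames of each at three resolutions: HD (HD : 1920x1080), 4Kx2K (4Kx2K : 3840x2160), and UHD (UHD : 7680x4320). The SADs of the 16x16, 16x8, 8x16, 8x8, and 8x4 blocks were compared against MPSADBW. Figure 4.20 shows the results for HD, Figure 4.21 for 4Kx2K, and Figure 4.22 for UHD.

Figure 4.20: SAD processing for HD

Figure 4.21: SAD processing for 4Kx2K

Figure 4.22: SAD processing for UHD

Tables 4.3 and 4.4 summarize the SAD processing in x264, with MPSADBW normalized to 1.

Table 4.3: SAD processing ratio (MPSADBW = 1)

                            HD      4Kx2K   UHD     Average
MPSADBW                     1       1       1       1
Proposed instruction set    0.22    0.21    0.22    0.22

Table 4.4: Speedup (MPSADBW = 1)

                            HD      4Kx2K   UHD     Average
MPSADBW                     1       1       1       1
Proposed instruction set    4.52    4.78    4.62    4.64

In x264, the SAD processing was reduced by 78%, a speedup of about 4.6 times with the 9-point SAD.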
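The summary figures can be cross-checked against Tables 4.3 and 4.4 with a few lines of Python. The variable names are choices made for this illustration; the values are the ones listed in the tables.

```python
# Per-resolution results for the proposed instruction set (MPSADBW = 1).
speedup = {"HD": 4.52, "4Kx2K": 4.78, "UHD": 4.62}   # Table 4.4
average_ratio = 0.22                                  # Table 4.3, average column

# Average speedup over the three resolutions.
average_speedup = sum(speedup.values()) / len(speedup)

# Reduction of SAD processing implied by the average ratio.
reduction_percent = (1 - average_ratio) * 100

print(f"average speedup: {average_speedup:.2f}")      # average speedup: 4.64
print(f"reduction:       {reduction_percent:.0f}%")   # reduction:       78%
```

A ratio of 0.22 corresponds to a speedup of roughly 1 / 0.22, about 4.5 times, which is consistent with the measured average of 4.64 given that the ratios are rounded to two digits.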

5 SIMD 3x3 9 SAD H.264/AVC SAD x86 MPSADBW 4.6 SAD SAD 8 SAD 2 31

References

[1] ITU-T, H.264, http://www.itu.int/rec/T-REC-H.264/e, 2011/6.
[2] ( ) H.264/AVC, 2006.
[3] Intel, Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture, 253665-041US, 2011/12.
[4] Intel, Intel 64 and IA-32 Architectures Software Developer's Manual Combined Volumes 2A, 2B, and 2C: Instruction Set Reference, A-Z, 325383-041US, 2011/12.
[5] Intel, Intel 64 IA-32, 248966-024JA, 2011/4.

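A Preparing the Input Video (Using mjpegtools)

A.1 Installing mjpegtools

mjpegtools is used to create the Y4M files that x264 takes as input. Download the mjpegtools source (mjpegtools-[ ].tar.gz) from:

    http://sourceforge.net/projects/mjpeg/files/mjpegtools/

In the extracted mjpegtools directory (x86 [moule ]), run:

    ./configure --prefix=/home/username/mjpegtools
    make install

ppmtoy4m is installed in /home/username/mjpegtools/bin/.

A.2 Creating the Y4M File

Concatenate the PPM files into one file, then convert it with ppmtoy4m:

    cat *.ppm > input.txt
    ./ppmtoy4m -o 0 -n 500 -I p -F 25:1 -S 420mpeg2 < input.txt > output.y4m

ppmtoy4m options:

    -o : frame offset (number of leading frames to skip)
    -n : number of frames to output
    -F : frame rate (fps)

B x264

B.1 Installing x264

Download the x264 source (last_x264.tar.bz2) from:

    http://www.videolan.org/developers/x264.html

In the extracted x264 directory, run:

    ./configure --prefix=/home/username/x264
    make
    make install

x264 is installed in /home/username/x264/bin.

When ./configure finds a yasm that is too old, it stops with the message below unless assembly is disabled (--disable-asm):

    Found yasm 0.7.2.2153
    Minimum version is yasm-1.0.0
    If you really want to compile without asm, configure with --disable-asm.

In that case, install a newer yasm (Appendix C) and change the yasm path in configure before running ./configure again:

    Before: AS="yasm"
    After:  AS="/home/username/yasm/bin/yasm"

B.2 Running x264

    ./x264 --psnr --partitions p8x8,p4x4,b8x8,i8x8,i4x4 -o output.264 input.y4m

C yasm

Download the yasm source (yasm-[].tar.gz) from the [Source.tar.gz] link at:

    http://yasm.tortall.net/download.html

In the extracted yasm directory, run:

    ./configure --prefix=/home/username/yasm
    make install

yasm is installed in /home/username/yasm/bin/.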