1,3,a) 2,3 2,3 CPU WA iodrive WA P-WA I/O P-WA SN CAS Compare And Swap fetch-and-add P-WA TPC-C New-Order 3 UPDATE 1 P-WA 425,531 tps WA 2.42 TPC-C New-Order P-WA 17,391 tps WA 1.18 1. 10 [15] MasterCard 4 / [13] [5] [1][2][6][12]. Atomicity Consistency Isolation Durability AD 1 Graduate School of Systems and Information Engineering, University of Tsukuba 2 Faculty of Engineering, Information and Systems, University of Tsukuba 3 CREST JST CREST a) kamiya@hpcs.cs.tsukuba.ac.jp AD SQ SQ WA: Write Ahead ogging [4] WA WA WA HDD WA I/O PostgreSQ[17] Oracle[16] [7] iodrive[3] HDD I/O WA iodrive WA P-WA [22] P-WA WA WA WA WA WA WA c 2015 Information Processing Society of Japan 1
WA WA iodrive P-WA [22] SN CAS CAS SN [10] fetch-and-add [22] TPC-C fetch-and-add TPC-C ARIES[14] WA WA WA ARIES WA Aether[7] Deuteronomy[11][10] [20] Aether Deuteronomy WA WA WA WA Silo[19][21] FOEDUS[8] ARIES 2 WA 3 WA P-WA 4 TPC-C New-Order P-WA 5 6 2. WA Write-Ahead ogging WA WA Worker Thread : og Record In Memory HDD Worker Worker Worker WA File 1 WA Buffer WA Algorithm 1 log insert with lock(log) 1: WAbuffer.lock() #WA 2: SN WAbuffer.insert(log) 3: if log.type == COMMIT then 4: # 5: WAbuffer.ncommit WAbuffer.ncommit +1 6: if WAbuffer.ncommit == NGROUP or WAbuffer.full then 7: WAbuffer.flush() #WA WA 8: end if 9: end if 10: WAbuffer.unlock() #WA 11: return SN WA WA 1 [22] WA ACID Atomicity Durability WA WA WA 2.1 WA log insert with lock Algorithm 1 WA 1 WA 2 ID SN og Sequence Number c 2015 Information Processing Society of Japan 2
Worker Thread : og Record WA Buffer In Memory iodrive Worker Worker Worker File 1 File 2 File N Algorithm 2 log insert(log,buffer id) 1: #buffer id WA 2: SN WAbuffer[buffer id].insert(log) 3: if log.type == COMMIT then 4: # 5: WAbuffer[buffer id].ncommit WAbuffer[buffer id]. ncommit +1 6: if WAbuffer[buffer id].ncommit == NGROUP or WAbuffer[buffer id].full() then 7: WAbuffer[buffer id].flush() #WA WA 8: end if 9: end if 10: return SN 2 P-WA WA 10 WA WA 6 WA WA 7 WA WA WA WA WA I/O 3. P -WA WA 3.1 P -WA iodrive WA WA WA -P-WA 2 [22] P-WA 1 WA WA 3.1.1 WA WA WA WA WA WA 3.1.2 WA WA WA P-WA WA WA WA WA 3.1.3 (log insert) P-WA WA log insert Algorithm 2 log insert with lock Algorithm 1 WA buffer id WA buffer id WA 2 WA 6 WA WA 7 3.2 P-WA WA N O NlogN 3.2.1 WA WA WA WA P-WA WA WA WA SN og Sequence Number WA SN ID[5] SN WA SN WA c 2015 Information Processing Society of Japan 3
1 (1) CPU Intel R Xeon R CPU E5-2665 2 8 2 64GB iodrive SC, 160GB, VRG5T VS v3.3.3, ow-evel Formatting Throughput(tps) 450,000 400,000 350,000 300,000 250,000 200,000 150,000 100,000 50,000 P-WA SN WA SN 3.2.2 fetch-and-add SN SN global SN global SN global SN [22] CAS Compare And Swap CAS SN [10] fetch-and-add 4. 1 4.1 4.1.1 UPDATE 3 (UPDATE-1, UPDATE-5&READ-5, UPDATE-10 WA WA P-WA 512 Bytes 16 65536 int READ UPDATE READ UPDATE UPDATE-1 UPDATE 1 [BEGIN,UPDATE, END] 3 UPDATE- 5&READ-5 UPDATE 5 READ 5 [BEGIN,UPDATE 5,END] 7 UPDATE-10 UPDATE 10 [BEGIN,UPDATE 10,END] 12 Throughput(tps) 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 スレッド数 UPDATE- 1(P- WA) UPDATE- 1( 従来 WA) UPDATE- 5&READ- 5(P- WA) UPDATE- 5&READ- 5( 従来 WA) UPDATE- 10(P- WA) UPDATE- 10( 従来 WA) 3 3 450,000 400,000 350,000 300,000 250,000 200,000 150,000 100,000 50,000 0 UPDATE- 1 UPDATE- 5&READ- 5 UPDATE- 10 P- WA 従来 WA 4 16 3 4.1.2 P-WA WA 3 16 P-WA WA 4 16 4 UPDATE-1 P-WA 425,531 tps WA 175,131 tps 2.42 UPDATE-5&READ-5 P-WA WA 1.75 UPDATE-10 P-WA WA 1.91 4.2 TPC-C 4.2.1 TPC-C New-Order WA WA P-WA New-Order TPC-C 4.1 2 1024 Bytes16 c 2015 Information Processing Society of Japan 4
Algorithm 3 New-Order 1: BEGIN; 2: SEECT FROM Warehouse; 3: SEECT FROM District; 4: UPDATE District; 5: INSERT INTO Order; 6: INSERT INTO NewOrder; 7: 8: OOP 9: SEECT FROM Item; 10: SEECT FROM Stock; 11: UPDATE Stock; 12: INSERT INTO Orderine; 13: END OOP 14: 15: COMMIT; Throughput(tps) 20,000 18,000 16,000 14,000 12,000 10,000 8,000 6,000 4,000 2,000 0 5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 スレッド数 P- WA 従来 WA TPC-C New-Order ol cnt 1% rollback SEECT 2+2*ol cnt UPDATE 1+ol cnt INSERT 2+ol cnt New-Order SQ Alogorithm 3 4.2.2 5 1 6 P-WA WA SEECT INSERT WA P-WA 16 P-WA 17,391 tps WA 14,705 tps 1.18 4.1 SN New-Order INSERT P-WA 5. WA 2 ARIES WA ARIES Aether[7] Deuteronomy[11][10] [20] Aether WA Deuteronomy KVS KVS I/O WA WA WA WA SN GSN GSN WA WA ARIES WA [14] ARIES WA Silo[19][21] FOEDUS[8] Silo FOEDUS epoch [9] ARIES 6. iodrive c 2015 Information Processing Society of Japan 5
P-WA fetch-and-add SN TPC-C UPDATE 1 P-WA 425,531 tps WA 2.42 TPC-C New-Order P-WA 17,391 tps WA 1.18 SN S2P P-WA JST CREST JST CREST EBD JST CREST #25280043HA [1] Chatzistergiou, A., Cintra, M. and Viglas, S. D.: REWIND: Recovery Write-Ahead System for In-Memory Non-Volatile Data-Structures, PVDB, Vol. 8, No. 5, pp. 497 508 (2015). [2] Coburn, J., Bunker, T., Schwarz, M., Gupta, R. and Swanson, S.: From ARIES to MARS: transaction support for next-generation, solid-state drives, SOSP, pp. 197 212 (2013). [3] Fusion-io: Application Acceleration Enterprise Flash Memory Platform Fusion-io:, http://www.fusionio. com/. : 2015-04-15. [4] Gawlick, D., Gray, J., Iimura, W. and Obermarck, R.: Method and apparatus for logging journal data using a log write ahead data set (1985). US Patent 4,507,751. [5] Gray, J.: The Transaction Concept: Virtues and imitations, VDB, pp. 144 154 (1981). [6] Huang, J., Schwan, K. and Qureshi, M. K.: NVRAMaware ogging in Transaction Systems, PVDB, Vol. 8, No. 4, pp. 389 400 (2014). [7] Johnson, R., Pandis, I., Stoica, R., Athanassoulis, M. and Ailamaki, A.: Aether: A Scalable Approach to ogging, PVDB, Vol. 3, No. 1, pp. 681 692 (2010). [8] Kimura, H.: FOEDUS: OTP Engine for a Thousand Cores and NVRAM, SIGMOD Conference (2015). [9] Kung, H. T. and Robinson, J. T.: On Optimistic Methods for Concurrency Control, ACM Trans. Database Syst., Vol. 6, No. 2, pp. 213 226 (1981). [10] evandoski, J., omet, D., Sengupta, S., Stutsman, R. and Wang, R.: High Performance Transactions in Deuteronomy, CIDR (2015). [11] evandoski, J. J., omet, D., Mokbel, M. F. and Zhao, K.: Deuteronomy: Transaction Support for Cloud Data, CIDR, pp. 123 133 (2011). [12] Malviya, N., Weisberg, A., Madden, S. and Stonebraker, M.: Rethinking main memory OTP recovery, ICDE, pp. 604 615 (2014). [13] MasterCard: Processing: Brilliance Behind the Scenes of Commerce MasterCard, http: //www.mastercard.com/us/company/en/whatwedo/ processing_brilliance_behind_commerce.html. : 2015-04-15. [14] Mohan, C., Haderle, D. J., indsay, B. G., Pirahesh, H. and Schwarz, P. M.: ARIES: A Transaction Recovery Method Supporting Fine-Granularity ocking and Partial Rollbacks Using Write-Ahead ogging, ACM Trans. Database Syst, Vol. 17, No. 1, pp. 94 162 (1992). [15] NYSE: NYSE, New York Stock Exchange > About Us > News & Events > News Releases > Press Release 06-03- 2009:, http://www1.nyse.com/press/1244024115279. html. : 2015-04-15. [16] Oracle: Oracle Hardware and Software, Engineered to Work Together:, http://www.oracle.com/index.html. : 2015-05-01. [17] PostgreSQ: PostgreSQ: The world s most advanced open source database:, http://www.postgresql.org/. : 2015-05-01. [18] Ramakrishnan, R. and Gehrke, J.: Database Management Systems, McGraw Hill Higher Education (2002). [19] Tu, S., Zheng, W., Kohler, E., iskov, B. and Madden, S.: Speedy transactions in multicore in-memory databases, SOSP, pp. 18 32 (2013). [20] Wang, T. and Johnson, R.: Scalable ogging through Emerging Non-Volatile Memory, PVDB, Vol. 7, No. 10, pp. 865 876 (2014). [21] Zheng, W., Tu, S., Kohler, E. and iskov, B.: Fast Databases with Fast Durability and Recovery Through Multicore Parallelism, OSDI, pp. 465 477 (2014). [22],, : P-WA:, OS Vol. 2015-OS-133, No. 18, pp. 1 10 (2015). c 2015 Information Processing Society of Japan 6