DEIM Forum 12 C2-6

A Study about the Remote Data Access Control for Hadoop Distributed File System

Asuka MOMOSE and Masato OGUCHI
Ochanomizu University, 2-1-1 Otsuka, Bunkyouku, Tokyo 112-86, JAPAN
E-mail: momo@ogl.is.ocha.ac.jp, oguchi@computer.org

Web Hadoop Distributed File System Hadoop I/O I/O Hadoop

1.

[1] Hadoop
Distributed File System (HDFS) [2]

2. Web Web

2.1 Hadoop Distributed File System

Hadoop Distributed File System (HDFS) [2] Hadoop Apache Software Foundation Hadoop Common HDFS MapReduce HBase 1 HDFS Hadoop MapReduce

[Figure 1: Hadoop]

2.2 Hadoop

HDFS Namenode JobTracker TaskTracker 2 Hadoop MapReduce Map Reduce Mapper Reducer TaskTracker Hadoop

3.

Rocks [3] Hadoop-..2 Hadoop 1. 2.MB Hadoop JobTracker TaskTracker Master Slave 2 Namenode TaskTracker TaskTracker Slave Slave Task Hadoop TestDFSIO Map/Reduce URL, Map N-gram URL Reduce N-gram n=

4.

4.1 dummynet

Namenode1 3 1 3 2 LocalArea Namenode 3 dummynet

Table 1
              OS                                  CPU                                  Main Memory
Master node   Linux 2.6.9-..2. Elsmp (CentOS 4)   Intel(R) Xeon(R) @3.6GHz             4.GB
Slave node    Linux 2.6.9-..2. Elsmp (CentOS 4)   Quad-Core Intel(R) Xeon(R) @1.6GHz   2.GB
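The N-gram application named in Section 3 (a Map step that extracts N-grams and a Reduce step that aggregates them) can be illustrated with a minimal single-process Python sketch. It is not the authors' Hadoop job: the function names `map_ngrams`, `reduce_counts`, and `ngram_job` are hypothetical, character-level N-grams are an assumption, and the value of n is left as a parameter because it is not recoverable from the text.

```python
from collections import Counter
from itertools import chain


def map_ngrams(text, n):
    """Map step: emit an (ngram, 1) pair for every n-character window of text."""
    # Texts shorter than n produce no windows, so the result is an empty list.
    return [(text[i:i + n], 1) for i in range(len(text) - n + 1)]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each distinct n-gram."""
    totals = Counter()
    for ngram, count in pairs:
        totals[ngram] += count
    return dict(totals)


def ngram_job(documents, n):
    """Run the map step over every document, then reduce all emitted pairs."""
    mapped = chain.from_iterable(map_ngrams(doc, n) for doc in documents)
    return reduce_counts(mapped)


if __name__ == "__main__":
    # n=2 is chosen only for the demonstration.
    print(ngram_job(["hadoop", "hdfs"], 2))
```

In a real Hadoop job the map and reduce functions would run in separate tasks on the TaskTrackers, with the framework grouping the emitted pairs by key between the two phases.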
4.2 TestDFSIO

TestDFSIO TestDFSIO MB I/O Throughput (RTT) msec msec 1 4

[Figure 4: Write. Throughput (MB/sec) for 1Replica, 2Replica, 3Replica]

[Figure 5: Read. Throughput (MB/sec) for 1Replica, 2Replica, 3Replica]

4.3 HDFS

HDFS 12 RTT 6 HDFS

[Figure 6: exec time (sec)]

4.4 Hadoop

RandomWrite MB, 1GB RandomWrite RandomWrite Sort RTT (Figures 7, 8) TestDFS I/O Sort

[Figure 7: RandomWrite. Test exec time (sec) versus RTT]

5. Hadoop
[Figure 8: Sort. Test exec time (sec) versus RTT]

Namenode 4 4 3

[Figure: write. Series: Local node 1, Local node 2. Categories: write1, write2, write3, write4, write]

5.1 Wireshark

Wireshark [4] 3 Namenode Wireshark Namenode

5.2

1 9 3 RTT HDFS 3 write

[Figure 11: read. Series: Local node 1, Local node 2. Categories: read1, read2, read3, read4, read]

Local node 1 6 Local node 2 4 Total 6 16 9 1 11 RTT=msec 3 1 Namenode 1 Namenode % %

5.3 HDFS

6. HDFS Hadoop

6.1 Hadoop

12
rack Namenode 12 rack dummynet rack1 Remote 13 HDFS RTTmsec 1.9% % % % % % -%

6.2 Hadoop

(i) 1st (ii) 2nd Remote rack Local rack HDFS 2 Remote rack % Hadoop HDFS

Table 2 (Remote rack : Local rack)
Math.random   Remote rack : Local rack   Percentage
., .          :                          % - %
.4, .6        9 : 11                     4% - %
.3, .7        8 : 12                     4% - 6%
.2, .8        7 : 13                     3% - 6%
.1, .9        6 : 14                     % - 7%

6.3 HDFS

3 Math.random .8 2 13:7

Table 3 (Remote rack : Local rack)
Rack    Node   Blocks
rack    1      122
rack    2      127
rack1   3      62
rack1   4      6

7. HDFS

7.1 HDFS

HDFS HDFS

[Figure 13: exec time (sec)]

% 3% 4% 4% % 4 [1] HDFS 14

[Figure 14: Write. Throughput (MB/sec) for optimized rack and simple rack]

HDFS HDFS %
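The biased placement of Section 6.2, where a `Math.random` draw decides whether a replica goes to the remote or the local rack, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' modification to Hadoop: `choose_rack` and `place_blocks` are hypothetical names, and treating the first value of each threshold pair as the probability of choosing the remote rack is an assumption. The measured ratios in Table 2 do not match the threshold pairs exactly, so the actual placement logic evidently involves more than this single draw.

```python
import random
from collections import Counter


def choose_rack(remote_prob, rng):
    """Return 'remote' with probability remote_prob, otherwise 'local'."""
    # rng.random() is uniform on [0.0, 1.0), the same range as Java's Math.random().
    return "remote" if rng.random() < remote_prob else "local"


def place_blocks(num_blocks, remote_prob, seed=0):
    """Place num_blocks replicas and return the (remote, local) counts."""
    rng = random.Random(seed)  # seeded so repeated runs give the same counts
    counts = Counter(choose_rack(remote_prob, rng) for _ in range(num_blocks))
    return counts["remote"], counts["local"]


if __name__ == "__main__":
    # Probabilities taken from the first value of the pairs legible in Table 2.
    for p in (0.4, 0.3, 0.2, 0.1):
        remote, local = place_blocks(10_000, p)
        print(f"remote_prob={p}: remote={remote}, local={local}")
```

Lowering `remote_prob` shifts replicas toward the local rack, which is the effect that the Remote rack : Local rack column of Table 2 reports.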
[Figure: Read. Throughput (MB/sec) for optimized rack and simple rack]

7.2

1.9% RTTmsec RTTmsec RTTmsec 22.9% RTTmsec - RTTmsec -

8.

8.1

Web Hadoop Distributed File System Hadoop I/O I/O

8.2

HDFS HDFS HDFS 9.

References

[1] Hadoop DE &PRMU ( ) 6 Vol.111 No.76 pp.19-24 11 6.
[2] Dhruba Borthakur, HDFS Architecture, 8, The Apache Software Foundation.
[3] Rocks Cluster, http://www.rocksclusters.org/
[4] Wireshark, http://www.wireshark.org/
[5] Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, The Google File System, ACM SIGOPS Operating Systems Review, Vol.37, No., pp.29-43, December 3.
[6] Tom White, Hadoop, O'Reilly Japan, Inc.
[7] Jason Venner, Pro Hadoop, 9, Apress.
[8] Gfarm v2 7-HPC-113 pp.7-12 7 12
[9] Gfarm MapReduce SWopp Vol.-HPC-126 8