Class 2 (2015-04-20)
Class 1 (4/13): ruby
Class 2: gnuplot
1 1 2014 6 IIJ /
Summary statistics
  measures of location: mean, median, mode
  measures of spread: range, variance, standard deviation
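Mean, median, mode
  mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
  median: the middle value of the sorted data
    $x_{median} = x_{r+1}$ when m is odd, m = 2r + 1
    $x_{median} = (x_r + x_{r+1})/2$ when m is even, m = 2r
  mode: the value that occurs most frequently
  [Figure: f(x) vs. x, marking the positions of mode, median, and mean]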
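Unlike the mean and the median, the mode is found by counting. A minimal Ruby sketch, assuming the data fit in an array; the helper name `modes` is hypothetical, and it returns every value that ties for the highest count:

```ruby
# Hypothetical helper: return the most frequent value(s) of an array.
def modes(values)
  counts = Hash.new(0)                  # value => number of occurrences
  values.each { |v| counts[v] += 1 }
  top = counts.values.max               # highest count (nil for empty input)
  counts.select { |_, c| c == top }.keys
end

p modes([3, 1, 2, 3, 2, 3])   # prints [3]
p modes([1, 1, 2, 2])         # prints [1, 2] (a tie)
```

Returning an array rather than a single value keeps ties visible, which matters for multimodal data.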
Percentiles
  pth-percentile: the value below which p% of the sorted observations fall
  median = 50th-percentile
  [Figure: total observations (%) vs. sorted variable x]
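A percentile can be read directly off the sorted data. The sketch below assumes the nearest-rank definition, which is only one of several conventions in use; the function name `percentile` is hypothetical. Note that for an even number of values, nearest rank picks the lower middle value, so it can differ from the averaged median defined earlier.

```ruby
# Hypothetical helper: pth-percentile by the nearest-rank method (an assumed convention).
def percentile(values, p)
  raise ArgumentError, "empty data" if values.empty?
  sorted = values.sort
  rank = (p * sorted.length / 100.0).ceil   # 1-based rank of the percentile
  rank = 1 if rank < 1                      # p = 0 maps to the minimum
  sorted[rank - 1]
end

data = (1..10).to_a
puts percentile(data, 50)   # prints 5
puts percentile(data, 90)   # prints 9
```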
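Measures of spread
  range: difference between the maximum and the minimum
  variance: $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
  standard deviation: $\sigma$
    for a normal distribution, about 68% of the data fall within (mean ± stddev)
    and about 95% within (mean ± 2stddev)
  [Figure: f(x) = exp(-x**2/2) vs. x, marking mean, median, σ, and the 68% and 95% ranges]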
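Variance
  $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
  Expanding the square:
  $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
  $= \frac{1}{n} \sum_{i=1}^{n} (x_i^2 - 2 x_i \bar{x} + \bar{x}^2)$
  $= \frac{1}{n} (\sum_{i=1}^{n} x_i^2 - 2 \bar{x} \sum_{i=1}^{n} x_i + n \bar{x}^2)$
  $= \frac{1}{n} \sum_{i=1}^{n} x_i^2 - 2 \bar{x}^2 + \bar{x}^2$
  $= \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2$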
: 1/N ( ) 1/N : 1
Population and samples
  population: the whole set under study
  sample: the subset that is actually observed
  [Figure: samples are drawn from the population and used to estimate it]
For a sample of size n, the sample mean is distributed approximately as $N(\mu, \sigma/\sqrt{n})$.
Normal distribution $N(\mu, \sigma)$
  2 parameters: $\mu$ (mean) and $\sigma$ (standard deviation)
  [Figure: f(x) = exp(-x**2/2) vs. x, marking mean, median, σ, and the 68% and 95% ranges]
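Sample statistics
  sample mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
  sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
  sample standard deviation: $s$
  note: the sum is divided by (n - 1), not n
  degrees of freedom: computing $\bar{x}$ uses up 1 degree of freedom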
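Standard error (SE)
  $SE = \sigma/\sqrt{n}$
  the error shrinks in proportion to $1/\sqrt{n}$ as the sample size n grows
  for samples drawn from $N(\mu, \sigma)$, the sample mean estimates $\mu$ with $SE = \sigma/\sqrt{n}$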
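The $1/\sqrt{n}$ behavior can be checked by simulation. This is a rough sketch, not part of the original exercise: it assumes normally distributed data generated with the Box-Muller transform, a fixed seed for repeatability, and hypothetical helper names (`normal`, `stddev`, `simulate_se`).

```ruby
# Draw one value from N(mu, sigma) with the Box-Muller transform.
def normal(rng, mu, sigma)
  u1 = 1.0 - rng.rand   # in (0, 1], so log(u1) is always defined
  u2 = rng.rand
  mu + sigma * Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
end

# Standard deviation (divisor n) of an array of floats.
def stddev(values)
  mean = values.sum / values.length
  Math.sqrt(values.sum { |v| (v - mean)**2 } / values.length)
end

# Take `trials` samples of size n and return the spread of their means.
def simulate_se(rng, mu, sigma, n, trials)
  means = Array.new(trials) do
    sample = Array.new(n) { normal(rng, mu, sigma) }
    sample.sum / n
  end
  stddev(means)
end

rng = Random.new(42)    # fixed seed (arbitrary choice)
sigma = 1.0
[25, 100].each do |n|
  observed = simulate_se(rng, 0.0, sigma, n, 2000)
  printf "n:%d observed SE:%.3f sigma/sqrt(n):%.3f\n", n, observed, sigma / Math.sqrt(n)
end
```

Quadrupling n from 25 to 100 should roughly halve the observed spread of the sample means.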
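Sample variance: why (n - 1)
  sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
  let $S^2$ be the variance computed with divisor n around $\bar{x}$ instead of $\mu$
  because $\bar{x}$ itself varies around $\mu$ as $N(\mu, \sigma/\sqrt{n})$, $S^2$ underestimates $\sigma^2$ by a factor of (n - 1)/n:
  $E(S^2) = \frac{n-1}{n} \sigma^2$
  so $\sigma^2$ is estimated by
  $\frac{n}{n-1} S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$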
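The relation between the two divisors is easy to verify numerically. A small sketch with made-up data; the function names are hypothetical:

```ruby
# Variance with divisor n (S^2 in the notation above).
def population_variance(values)
  mean = values.sum.to_f / values.length
  values.sum { |v| (v - mean)**2 } / values.length
end

# Variance with divisor n - 1 (s^2 in the notation above).
def sample_variance(values)
  n = values.length
  raise ArgumentError, "need at least 2 values" if n < 2
  mean = values.sum.to_f / n
  values.sum { |v| (v - mean)**2 } / (n - 1)
end

data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values, mean is 5
n = data.length
printf "S^2:%.3f s^2:%.3f n/(n-1)*S^2:%.3f\n",
       population_variance(data), sample_variance(data),
       population_variance(data) * n / (n - 1)
# prints S^2:4.000 s^2:4.571 n/(n-1)*S^2:4.571
```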
[Figure: example plots: normalized traffic volume vs. time (sec); CDF of normalized traffic volume; scatter plot]
Example data: marathon finish times
  sample data from a book: P. K. Janert, Gnuplot in Action
    # Minutes Count
    133 1
    134 7
    135 1
    136 4
    137 3
    138 3
    141 7
    142 24
    ...
  n: 2,355   mean: 171.3   standard deviation: 14.1   median: 176
Example (2)
  [Figure: count vs. finish time (minutes)]
Example (3)
  [Figure: rank vs. finish time (minutes)]
XY XY : 0 ( ) XY 3D ( : )
[Figure: normalized traffic volume (Y) vs. time in seconds (X)]
(1/2)
  X axis: normalized traffic volume
  Y axis: frequency
  [Figure: histogram of frequency vs. normalized traffic volume]
(2/2) ( ) ( )
Probability density function (pdf)
  normalized so that the total is 1
  probability that X takes the value x: $f(x) = P[X = x]$
  [Figure: pdf vs. normalized traffic volume]
Cumulative distribution function (cdf)
  pdf, the probability that X equals x: $f(x) = P[X = x]$
  cdf, the probability that X is at most x: $F(x) = P[X \le x]$
  [Figure: cdf vs. normalized traffic volume]
Histogram vs. CDF
  [Figure: ping rtt, response time (msec): (left) histogram of the original 8241 samples, (middle) histogram of 100 samples, (right) CDF of both]
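Interquartile range
  interquartile range (IQR): 75th-percentile - 25th-percentile (covers the middle 50% of the data)
  box plot
    box: 25/50/75-percentiles
    whiskers: min/max, or the inner fence (Q_1 - 1.5 IQR, Q_3 + 1.5 IQR)
  [Figure: box plot labeled max, upper quartile, mean, median, lower quartile, min]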
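The quantities drawn in a box plot can be computed in a few lines. This sketch assumes nearest-rank percentiles (plotting tools may interpolate and give slightly different quartiles) and uses made-up data; `percentile` and `box_stats` are hypothetical names.

```ruby
# Nearest-rank percentile of an already sorted array (assumed convention).
def percentile(sorted, p)
  rank = (p * sorted.length / 100.0).ceil
  rank = 1 if rank < 1
  sorted[rank - 1]
end

# Quartiles, IQR, inner fences, and the values outside the fences.
def box_stats(values)
  sorted = values.sort
  q1 = percentile(sorted, 25)
  q3 = percentile(sorted, 75)
  iqr = q3 - q1
  lower = q1 - 1.5 * iqr
  upper = q3 + 1.5 * iqr
  { q1: q1, median: percentile(sorted, 50), q3: q3, iqr: iqr,
    lower_fence: lower, upper_fence: upper,
    outliers: sorted.select { |v| v < lower || v > upper } }
end

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 50]   # illustrative values
s = box_stats(data)
printf "Q1:%d Q3:%d IQR:%d fences:(%.1f, %.1f)\n",
       s[:q1], s[:q3], s[:iqr], s[:lower_fence], s[:upper_fence]
# prints Q1:3 Q3:9 IQR:6 fences:(-6.0, 18.0)
puts "outliers: #{s[:outliers].join(' ')}"        # prints outliers: 50
```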
Example (original vs 100 samples): min, max
  [Figure: box plots of the original data and of 100 samples, next to the CDF of response time (msec) for 8241 samples and 100 samples]
Scatter plots
  show the relationship between 2 variables
  X axis: variable X
  Y axis: variable Y
  examples: correlation 0.7 (positive), 0.0 (none), -0.5 (negative)
  [Figure: three scatter plots, one for each example]
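The strength of the relationship shown in a scatter plot is usually summarized by a correlation coefficient. The sketch below assumes the standard Pearson definition, which this section does not spell out; the function name `correlation` and the data are illustrative.

```ruby
# Pearson correlation coefficient of two equal-length arrays (assumed definition).
# Returns NaN if either variable has zero variance.
def correlation(xs, ys)
  raise ArgumentError, "length mismatch" unless xs.length == ys.length
  n = xs.length
  mx = xs.sum.to_f / n
  my = ys.sum.to_f / n
  cov = xs.zip(ys).sum { |x, y| (x - mx) * (y - my) }   # co-deviation sum
  sx = Math.sqrt(xs.sum { |x| (x - mx)**2 })
  sy = Math.sqrt(ys.sum { |y| (y - my)**2 })
  cov / (sx * sy)
end

xs = [1, 2, 3, 4]
printf "%.2f\n", correlation(xs, [2, 4, 6, 8])   # prints 1.00 (perfect positive)
printf "%.2f\n", correlation(xs, [8, 6, 4, 2])   # prints -1.00 (perfect negative)
```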
Plotting tools
  gnuplot: http://gnuplot.info/
  grace (GUI): http://plasma-gate.weizmann.ac.il/grace/
  Installing gnuplot
    Mac: install gnuplot with Homebrew/MacPorts (XQuartz)
    Windows: the Windows build of gnuplot
Exercise: count the lines of a file
    filename = ARGV[0]
    count = 0
    file = File.open(filename)
    while text = file.gets    # gets returns nil at end of file
      count += 1
    end
    file.close
    puts count

  Save as count.rb and run:
    $ ruby count.rb foo.txt

  A more Ruby-like version:
    #!/usr/bin/env ruby
    count = 0
    ARGF.each_line do |line|
      count += 1
    end
    puts count
Exercise data
  source: P. K. Janert, Gnuplot in Action
  http://web.sfc.keio.ac.jp/~kjc/classes/sfc2015s-measurement/marathon.txt
Exercise: mean
    # regular expression to read minutes and count
    re = /^(\d+)\s+(\d+)/
    sum = 0 # sum of data
    n = 0   # the number of data
    ARGF.each_line do |line|
      if re.match(line)
        min = $1.to_i
        cnt = $2.to_i
        sum += min * cnt
        n += cnt
      end
    end
    mean = Float(sum) / n
    printf "n:%d mean:%.1f\n", n, mean

    % ruby mean.rb marathon.txt
    n:2355 mean:171.3
Exercise: standard deviation
  algorithm: $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
    # regular expression to read minutes and count
    re = /^(\d+)\s+(\d+)/
    data = Array.new
    sum = 0 # sum of data
    n = 0   # the number of data
    ARGF.each_line do |line|
      if re.match(line)
        min = $1.to_i
        cnt = $2.to_i
        sum += min * cnt
        n += cnt
        for i in 1..cnt
          data.push min   # expand the count into individual values
        end
      end
    end
    mean = Float(sum) / n
    sqsum = 0.0
    data.each do |i|
      sqsum += (i - mean)**2
    end
    var = sqsum / n
    stddev = Math.sqrt(var)
    printf "n:%d mean:%.1f variance:%.1f stddev:%.1f\n", n, mean, var, stddev

    % ruby stddev.rb marathon.txt
    n:2355 mean:171.3 variance:199.9 stddev:14.1
Exercise: standard deviation in a single pass
  algorithm: $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2$
    # regular expression to read minutes and count
    re = /^(\d+)\s+(\d+)/
    sum = 0   # sum of data
    n = 0     # the number of data
    sqsum = 0 # sum of squares
    ARGF.each_line do |line|
      if re.match(line)
        min = $1.to_i
        cnt = $2.to_i
        sum += min * cnt
        n += cnt
        sqsum += min**2 * cnt
      end
    end
    mean = Float(sum) / n
    var = Float(sqsum) / n - mean**2
    stddev = Math.sqrt(var)
    printf "n:%d mean:%.1f variance:%.1f stddev:%.1f\n", n, mean, var, stddev

    % ruby stddev2.rb marathon.txt
    n:2355 mean:171.3 variance:199.9 stddev:14.1
Exercise: median
    # regular expression to read minutes and count
    re = /^(\d+)\s+(\d+)/
    data = Array.new
    ARGF.each_line do |line|
      if re.match(line)
        min = $1.to_i
        cnt = $2.to_i
        for i in 1..cnt
          data.push min
        end
      end
    end
    data.sort!      # just in case data is not sorted
    n = data.length # number of array elements
    r = n / 2       # when n is odd, n/2 is rounded down
    if n % 2 != 0
      median = data[r]
    else
      median = (data[r - 1] + data[r])/2   # integer division
    end
    printf "r:%d median:%d\n", r, median

    % ruby median.rb marathon.txt
    r:1177 median:176
: gnuplot gnuplot
Plotting the data with gnuplot
    plot "marathon.txt" using 1:2 with boxes

  With labels, range, and grid:
    set boxwidth 1
    set xlabel "finish time (minutes)"
    set ylabel "count"
    set yrange [0:180]
    set grid y
    plot "marathon.txt" using 1:2 with boxes notitle

  [Figure: the two resulting plots of count vs. finish time (minutes)]
Exercise: CDF
  original data:
    # Minutes Count
    133 1
    134 7
    135 1
    136 4
    137 3
    138 3
    141 7
    142 24
    ...
  with the cumulative count added:
    # Minutes Count CumulativeCount
    133 1 1
    134 7 8
    135 1 9
    136 4 13
    137 3 16
    138 3 19
    141 7 26
    142 24 50
    ...
Exercise: CDF (2)
  ruby code:
    re = /^(\d+)\s+(\d+)/
    cum = 0
    ARGF.each_line do |line|
      begin
        if re.match(line)
          # matched
          time, cnt = $~.captures
          cum += cnt.to_i
          puts "#{time}\t#{cnt}\t#{cum}"
        end
      end
    end

  gnuplot command:
    set xlabel "finish time (minutes)"
    set ylabel "CDF"
    set grid y
    plot "marathon-cdf.txt" using 1:($3 / 2355) with lines notitle
[Figure: CDF vs. finish time (minutes)]
Saving a plot to a file, loading a script, and quitting
    gnuplot> set terminal png
    gnuplot> set output "plotfile.png"
    gnuplot> replot
    gnuplot> load "scriptfile"
    gnuplot> quit
Class 2: gnuplot
Class 3 (4/27)