Class 2 (2012/4/13)
Class 1 (4/6): ruby
gnuplot
Summary statistics
  measures of central tendency: mean, median, mode
  measures of dispersion: range, variance, standard deviation
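Mean, median, and mode
  mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
  median: the middle value of the m sorted observations
    $x_{median} = x_{r+1}$ when m is odd, m = 2r + 1
    $x_{median} = (x_r + x_{r+1})/2$ when m is even, m = 2r
  mode: the value that occurs most frequently
  [Figure: f(x) versus x, marking the positions of the mode, median, and mean]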
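As a minimal sketch of the three measures on this slide, the Ruby below computes the mean, median, and mode of a small array. The sample values and method names are made up for illustration. Ties for the mode are resolved by taking the first value encountered, which is an assumption and not something the slide specifies.

```ruby
# Mean: sum of the values divided by their number.
def mean(xs)
  xs.sum.to_f / xs.length
end

# Median: middle value of the sorted data.
# With 0-based indexing, x_{r+1} on the slide is sorted[r].
def median(xs)
  sorted = xs.sort
  m = sorted.length
  r = m / 2
  m.odd? ? sorted[r] : (sorted[r - 1] + sorted[r]) / 2.0
end

# Mode: the most frequent value (first one wins on ties; an assumption).
def mode(xs)
  freq = Hash.new(0)
  xs.each { |x| freq[x] += 1 }
  freq.max_by { |_value, count| count }.first
end

data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5] # hypothetical sample
printf "mean:%.1f median:%s mode:%s\n", mean(data), median(data), mode(data)
```

For this sample the three measures come out as 4.0, 4, and 5, so they need not coincide even on a small data set.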
Percentiles
  pth percentile: the value below which p% of the sorted observations fall
  median = 50th percentile
  [Figure: total observations (%) versus sorted variable x]
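A percentile can be read off the sorted data directly. The sketch below uses the nearest-rank definition, which is only one of several definitions in common use and is an assumption here; the method name and sample values are hypothetical.

```ruby
# Nearest-rank percentile: the value at 1-based rank ceil(p * n / 100)
# in the sorted data. Other definitions interpolate between neighbors.
def percentile(xs, p)
  sorted = xs.sort
  rank = (p * sorted.length / 100.0).ceil
  rank = 1 if rank < 1 # p = 0 maps to the smallest value
  sorted[rank - 1]
end

data = [15, 20, 35, 40, 50] # hypothetical sample
[30, 50, 100].each do |p|
  printf "%dth percentile: %d\n", p, percentile(data, p)
end
```

With an even number of observations this definition returns the lower of the two middle values for p = 50, so it can differ slightly from the median computed by averaging them.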
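Range, variance, and standard deviation
  range: difference between the maximum and minimum values
  variance: $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$
  standard deviation: $\sigma$
    for a normal distribution, 68% of the observations fall within (mean ± stddev) and 95% within (mean ± 2stddev)
  [Figure: f(x) = exp(-x**2/2) versus x, with the mean/median and the 68% and 95% ranges marked]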
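The 68% and 95% figures can be checked empirically. The sketch below draws pseudo-random samples from a standard normal distribution with the Box-Muller transform (a technique chosen here for illustration, not taken from the lecture) and counts the fraction of samples within k standard deviations of the mean. The method names, sample size, and seed are all hypothetical.

```ruby
# Draw n samples from a standard normal distribution (Box-Muller transform).
# The seed is an arbitrary choice that makes the run reproducible.
def normal_samples(n, seed = 1)
  rng = Random.new(seed)
  Array.new(n) do
    u1 = 1.0 - rng.rand # in (0, 1], avoids log(0)
    u2 = rng.rand
    Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
  end
end

# Fraction of xs lying within k standard deviations of the mean.
def fraction_within(xs, k)
  n = xs.length
  mean = xs.sum / n
  stddev = Math.sqrt(xs.sum { |x| (x - mean)**2 } / n)
  xs.count { |x| (x - mean).abs <= k * stddev }.to_f / n
end

xs = normal_samples(20_000)
printf "within 1 stddev: %.3f\n", fraction_within(xs, 1)
printf "within 2 stddev: %.3f\n", fraction_within(xs, 2)
```

The two fractions should come out close to 0.68 and 0.95; data that is far from normally distributed will generally give different values.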
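Variance
  \begin{align*}
  \sigma^2 &= \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \\
  &= \frac{1}{n} \sum_{i=1}^{n} (x_i^2 - 2 x_i \bar{x} + \bar{x}^2) \\
  &= \frac{1}{n} \left( \sum_{i=1}^{n} x_i^2 - 2 \bar{x} \sum_{i=1}^{n} x_i + n \bar{x}^2 \right) \\
  &= \frac{1}{n} \sum_{i=1}^{n} x_i^2 - 2 \bar{x}^2 + \bar{x}^2 \\
  &= \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2
  \end{align*}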
[Figure: three example plots: normalized traffic volume versus time (sec), a cdf versus normalized traffic volume, and a scatter plot]
Sample data from a book: P. K. Janert, "Gnuplot in Action"

    # Minutes Count
    133 1
    134 7
    135 1
    136 4
    137 3
    138 3
    141 7
    142 24
    ...

  n: 2,355
  mean: 171.3
  standard deviation: 14.1
  median: 176
[Figure (2): histogram of count versus finish time (minutes)]
[Figure (3): rank versus finish time (minutes)]
XY plots
3D plots
[Figure: normalized traffic volume (Y axis) versus time in sec (X axis)]
Histogram
  [Figure: frequency (Y axis) versus normalized traffic volume (X axis)]
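A frequency plot of this kind is built by assigning each observation to a bin and counting the observations per bin. The following is a minimal sketch with fixed-width bins; the method name, the sample values, and the choice of labeling each bin by its lower edge are assumptions made for illustration.

```ruby
# Count observations per fixed-width bin.
# Each bin is identified by its lower edge: floor(x / width) * width.
def histogram(xs, bin_width)
  counts = Hash.new(0)
  xs.each do |x|
    lower_edge = (x.to_f / bin_width).floor * bin_width
    counts[lower_edge] += 1
  end
  counts.sort.to_h # order the bins by lower edge
end

samples = [-1.2, -0.3, 0.1, 0.4, 0.6, 1.7] # hypothetical values
[0.5, 1.0].each do |width|
  puts "bin width #{width}:"
  histogram(samples, width).each { |edge, count| puts "  #{edge}\t#{count}" }
end
```

Running it with two bin widths shows that the shape of a histogram depends on the bin size chosen, which is worth keeping in mind when comparing histograms.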
Probability density function (pdf)
  the probability that the variable X takes the value x: $f(x) = P[X = x]$
  [Figure: pdf versus normalized traffic volume]
Cumulative distribution function (cdf)
  pdf: the probability that X takes the value x, $f(x) = P[X = x]$
  cdf: the probability that X is at most x, $F(x) = P[X \le x]$
  [Figure: cdf versus normalized traffic volume]
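For measured data, F(x) can be estimated as the fraction of observations that are at most x. The sketch below computes this empirical CDF from an array of samples; the method name and the sample values are hypothetical.

```ruby
# Empirical CDF: for each distinct value x (in ascending order),
# the fraction of observations that are <= x.
def empirical_cdf(xs)
  counts = Hash.new(0)
  xs.each { |x| counts[x] += 1 }
  n = xs.length.to_f
  cum = 0
  counts.sort.map do |value, count|
    cum += count
    [value, cum / n]
  end
end

data = [2, 1, 2, 3] # hypothetical sample
empirical_cdf(data).each { |x, f| printf "%d\t%.2f\n", x, f }
```

Unlike a histogram, this needs no bin width: every distinct value gets its own point, and the last point is always 1.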
Histogram and CDF
  [Figure: ping rtt versus response time (msec), shown as two histograms and as a CDF; the legend includes "8241 samples"]
Scatter plots
  show the relation between two variables
    X axis: variable X
    Y axis: variable Y
  [Figure: three scatter plots of Y versus X, labeled 0.7, 0.0, and -0.5 in turn]
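Assuming the numbers attached to the three scatter plots are correlation coefficients, they can be computed from paired observations as sketched below. This uses the Pearson correlation coefficient; the method name and the sample values are hypothetical.

```ruby
# Pearson correlation coefficient of paired observations (xs[i], ys[i]).
# The result lies in [-1, 1]; the sign gives the direction of the relation.
def correlation(xs, ys)
  n = xs.length
  mean_x = xs.sum.to_f / n
  mean_y = ys.sum.to_f / n
  cov = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }
  cov / Math.sqrt(var_x * var_y)
end

xs = [1, 2, 3, 4, 5]           # hypothetical values
ys = [2.1, 3.9, 6.2, 7.8, 9.9] # roughly 2 * x
printf "correlation: %.3f\n", correlation(xs, ys)
```

Points that lie exactly on a rising line give 1, points on a falling line give -1, and unrelated variables give a value near 0.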
gnuplot: http://gnuplot.info/
grace (GUI): http://plasma-gate.weizmann.ac.il/grace/
Data: P. K. Janert, "Gnuplot in Action"
  http://web.sfc.keio.ac.jp/~kjc/classes/sfc212s-measurement/marathon.txt
Computing the mean (mean.rb)

    # regular expression to read minutes and count
    re = /^(\d+)\s+(\d+)/
    sum = 0 # sum of data
    n = 0 # the number of data
    ARGF.each_line do |line|
      if re.match(line)
        min = $1.to_i
        cnt = $2.to_i
        sum += min * cnt
        n += cnt
      end
    end
    mean = Float(sum) / n
    printf "n:%d mean:%.1f\n", n, mean

    % ruby mean.rb marathon.txt
    n:2355 mean:171.3
Computing the standard deviation (stddev.rb)
  $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$

    # regular expression to read minutes and count
    re = /^(\d+)\s+(\d+)/
    data = Array.new
    sum = 0 # sum of data
    n = 0 # the number of data
    ARGF.each_line do |line|
      if re.match(line)
        min = $1.to_i
        cnt = $2.to_i
        sum += min * cnt
        n += cnt
        for i in 1 .. cnt
          data.push min
        end
      end
    end
    mean = Float(sum) / n
    sqsum = 0.0
    data.each do |i|
      sqsum += (i - mean)**2
    end
    var = sqsum / n
    stddev = Math.sqrt(var)
    printf "n:%d mean:%.1f variance:%.1f stddev:%.1f\n", n, mean, var, stddev

    % ruby stddev.rb marathon.txt
    n:2355 mean:171.3 variance:199.9 stddev:14.1
Computing the standard deviation (stddev2.rb)
  $\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2$

    # regular expression to read minutes and count
    re = /^(\d+)\s+(\d+)/
    sum = 0 # sum of data
    n = 0 # the number of data
    sqsum = 0 # sum of squares
    ARGF.each_line do |line|
      if re.match(line)
        min = $1.to_i
        cnt = $2.to_i
        sum += min * cnt
        n += cnt
        sqsum += min**2 * cnt
      end
    end
    mean = Float(sum) / n
    var = Float(sqsum) / n - mean**2
    stddev = Math.sqrt(var)
    printf "n:%d mean:%.1f variance:%.1f stddev:%.1f\n", n, mean, var, stddev

    % ruby stddev2.rb marathon.txt
    n:2355 mean:171.3 variance:199.9 stddev:14.1
Computing the median (median.rb)

    # regular expression to read minutes and count
    re = /^(\d+)\s+(\d+)/
    data = Array.new
    ARGF.each_line do |line|
      if re.match(line)
        min = $1.to_i
        cnt = $2.to_i
        for i in 1 .. cnt
          data.push min
        end
      end
    end
    data.sort! # just in case data is not sorted
    n = data.length # number of array elements
    r = n / 2 # when n is odd, n/2 is rounded down
    if n % 2 != 0
      median = data[r]
    else
      median = (data[r - 1] + data[r])/2
    end
    printf "r:%d median:%d\n", r, median

    % ruby median.rb marathon.txt
    r:1177 median:176
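The median can also be found without expanding every observation into an array, by walking the cumulative counts until the middle rank is reached. This is an alternative sketch, not the lecture's method; the method names are hypothetical and the input is assumed to be a list of (value, count) pairs.

```ruby
# Value at a given 1-based rank, found by accumulating counts.
# sorted_pairs must be [value, count] pairs in ascending order of value.
def value_at_rank(sorted_pairs, rank)
  cum = 0
  sorted_pairs.each do |value, cnt|
    cum += cnt
    return value if cum >= rank
  end
  nil # rank exceeds the number of observations
end

# Median of data given as [value, count] pairs.
def median_from_counts(pairs)
  sorted = pairs.sort_by { |value, _cnt| value }
  n = sorted.sum { |_value, cnt| cnt }
  lo_rank = (n + 1) / 2 # middle rank (lower middle when n is even)
  hi_rank = n / 2 + 1   # middle rank (upper middle when n is even)
  (value_at_rank(sorted, lo_rank) + value_at_rank(sorted, hi_rank)) / 2.0
end

# first rows of the sample data only, so this is not the full-data median
pairs = [[133, 1], [134, 7], [135, 1], [136, 4]]
printf "median:%.1f\n", median_from_counts(pairs)
```

Memory use then grows with the number of distinct values rather than with the number of observations, which matters when the counts are large.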
gnuplot
Simple plot:

    plot "marathon.txt" using 1:2 with boxes

With box width, axis labels, y range, and grid set:

    set boxwidth 1
    set xlabel "finish time (minutes)"
    set ylabel "count"
    set yrange [0:180]
    set grid y
    plot "marathon.txt" using 1:2 with boxes notitle

[Figure: the two resulting plots of count versus finish time (minutes)]
Plotting the CDF

Original data:

    # Minutes Count
    133 1
    134 7
    135 1
    136 4
    137 3
    138 3
    141 7
    142 24
    ...

With the cumulative count added:

    # Minutes Count CumulativeCount
    133 1 1
    134 7 8
    135 1 9
    136 4 13
    137 3 16
    138 3 19
    141 7 26
    142 24 50
    ...
Plotting the CDF (2)

ruby code:

    re = /^(\d+)\s+(\d+)/
    cum = 0
    ARGF.each_line do |line|
      if re.match(line)
        # matched
        time, cnt = $~.captures
        cum += cnt.to_i
        puts "#{time}\t#{cnt}\t#{cum}"
      end
    end

gnuplot command:

    set boxwidth 1
    set xlabel "finish time (minutes)"
    set ylabel "CDF"
    set grid y
    plot "marathon-cdf.txt" using 1:($3 / 2355) with lines notitle
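The gnuplot command above divides by the total count 2355, which has to be known in advance. As a hedged alternative, the sketch below makes two passes over the rows so that the normalized cumulative fraction is written as a fourth column. The method name `cdf_rows` and the fourth column are assumptions for illustration, not part of the lecture's script.

```ruby
# Turn "minutes count" lines into rows of
# [minutes, count, cumulative count, cumulative fraction].
def cdf_rows(lines)
  re = /^(\d+)\s+(\d+)/
  rows = []
  lines.each do |line|
    m = re.match(line)
    next unless m # skip comments and malformed lines
    rows << [m[1].to_i, m[2].to_i]
  end
  total = rows.sum { |_time, cnt| cnt }
  cum = 0
  rows.map do |time, cnt|
    cum += cnt
    [time, cnt, cum, cum.to_f / total]
  end
end

# first rows of the sample data only, used here as a small test input
sample = ["# Minutes Count", "133 1", "134 7", "135 1", "136 4"]
cdf_rows(sample).each do |time, cnt, cum, frac|
  printf "%d\t%d\t%d\t%.4f\n", time, cnt, cum, frac
end
```

With such a file, the plot could use `using 1:4` instead of `using 1:($3 / 2355)`, and the script would keep working if the data set changed.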
[Figure: CDF versus finish time (minutes)]
gnuplot
Class 3 (4/20)