3 ichii@ms.u-tokyo.ac.jp
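IP address configuration: DHCP (Dynamic Host Configuration Protocol)
Example: IP address in CIDR notation 157.82.16.1/22, i.e. netmask 255.255.252.0.
Routing table ("-" = directly connected):
    dest net          next hop        hops
    157.82.16.0/22    -               0
    default           157.82.16.1     1
2002/5/29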
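Special IP addresses
- 0.0.0.0: "this host", used as the source address before a host knows its own address.
- 255.255.255.255: broadcast to all hosts on this network (limited broadcast). On an Ethernet LAN it is sent as a link-layer broadcast.
- cf. directed broadcast: the host part of the IP address is all 1s, e.g. 157.82.19.255 for the network 157.82.16.0/22.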
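The address arithmetic can be checked with a short Java sketch. The class and method names are illustrative and do not come from any library. It applies the /22 netmask to 157.82.16.1 to obtain the network address, then sets the host bits to 1 to obtain the directed-broadcast address.

```java
class CidrSketch {
    // Parse a dotted-quad string into a 32-bit int (only the bit pattern matters; Java ints are signed).
    static int parse(String dotted) {
        int addr = 0;
        for (String part : dotted.split("\\.")) {
            addr = (addr << 8) | Integer.parseInt(part);
        }
        return addr;
    }

    // Format a 32-bit int back as a dotted quad.
    static String format(int addr) {
        return ((addr >>> 24) & 0xFF) + "." + ((addr >>> 16) & 0xFF) + "."
                + ((addr >>> 8) & 0xFF) + "." + (addr & 0xFF);
    }

    // Netmask with prefixLen leading 1 bits.
    static int netmask(int prefixLen) {
        // Java masks shift distances to 5 bits, so -1 << 32 would stay -1; handle /0 separately.
        return prefixLen == 0 ? 0 : -1 << (32 - prefixLen);
    }

    public static void main(String[] args) {
        int addr = parse("157.82.16.1");
        int mask = netmask(22);
        System.out.println(format(mask));         // 255.255.252.0
        System.out.println(format(addr & mask));  // 157.82.16.0 (network address)
        System.out.println(format(addr | ~mask)); // 157.82.19.255 (host part all 1s)
    }
}
```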
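DHCP
A DHCP server assigns IP addresses to hosts. For a host identified by its MAC address, DHCP supplies the IP address together with the netmask, default router, etc.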
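DHCP message exchange (DHCP client <-> DHCP server)
- startup:
    DHCPDISCOVER (broadcast)    client -> server
    DHCPOFFER (unicast)         server -> client
    DHCPREQUEST (broadcast)     client -> server
    DHCPACK (unicast)           server -> client
- before the lease expires: request/ack again to extend the lease
- shutdown: DHCPRELEASE (unicast), client -> server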
TCP (Transmission Control Protocol)
UDP (User Datagram Protocol)
DNS (Domain Name System)
Transport layer
The transport layer provides logical end-end transport between application processes on different hosts, enhancing the service of the network layer below it.
[Figure: protocol stacks (application, transport, network, data link, physical) on the two end hosts; the routers in between implement only network, data link and physical.]
Transport-layer protocols
- TCP: reliable, in-order delivery.
- UDP: unreliable, unordered delivery; a thin extension of IP's best-effort service.
- Applications such as RTP run on top of UDP.
[Figure: logical end-end transport between the end hosts' application/transport stacks, across routers that implement only network, data link and physical.]
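Multiplexing/demultiplexing
Demultiplexing: the receiving host uses the segment header to deliver the application-layer data in each received segment to the correct process.
[Figure: a segment (transport header Ht plus message M) carried in a datagram with network header Hn; application processes P1 to P4 on three hosts, each running application, transport and network layers.]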
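Multiplexing/demultiplexing
Multiplexing: gather data from multiple application processes and add a header to each piece (the header is later used for demultiplexing).
Multiplexing/demultiplexing is based on the source and destination port numbers together with the source and destination IP addresses.
TCP/UDP segment format (32 bits wide):
    source port #    | dest port #
    other header fields
    application data (message)
Standard services use well-known port numbers, in both TCP and UDP.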
Well-Known Ports
    purple: [263] % cat /etc/services
    #
    # Network services, Internet style
    #
    #   @(#)services 8.1 (Berkeley) 6/9/93
    #   BSDI services,v 2.29 2001/04/17 20:55:18 jch Exp
    #
    tcpmux    1/tcp                  # TCP port multiplexer (RFC1078)
    echo      7/tcp
    echo      7/udp
    discard   9/tcp    sink null
    discard   9/udp    sink null
    systat    11/tcp   users
    systat    11/udp   users
    daytime   13/tcp
    daytime   13/udp
    netstat   15/tcp
    qotd      17/tcp   quote
    qotd      17/udp   quote
    chargen   19/tcp   ttytst source
    chargen   19/udp   ttytst source
    ftp       21/tcp
    ssh       22/tcp                 # Secure shell
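Multiplexing/demultiplexing: examples
Port use, telnet:
    host A -> server B: source port: x, dest. port: 23
    server B -> host A: source port: 23, dest. port: x
Port use, Web server: Web server B receives
    from Web client host C: Source IP: C, Dest IP: B, source port: y, dest. port: 80
    from Web client host C: Source IP: C, Dest IP: B, source port: x, dest. port: 80
    from Web client host A: Source IP: A, Dest IP: B, source port: x, dest. port: 80
B tells these connections apart by the whole (source IP, source port, dest IP, dest port) combination, even when two clients happen to use the same source port x.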
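Connection demultiplexing in the Web-server case can be sketched as a map keyed on the full four-tuple. This is a minimal Java sketch. The IP placeholders "A", "B", "C", the port values chosen for x and y, and the process labels P1 to P3 are all hypothetical. It models TCP-style connection lookup; unconnected UDP sockets are usually selected by destination IP and port alone.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DemuxSketch {
    // Key = (source IP, source port, dest IP, dest port); List.of gives value-based equals/hashCode.
    private final Map<List<Object>, String> connections = new HashMap<>();

    static List<Object> key(String srcIp, int srcPort, String dstIp, int dstPort) {
        return List.of(srcIp, srcPort, dstIp, dstPort);
    }

    // Called when a connection is set up: remember which process owns it.
    void register(String srcIp, int srcPort, String dstIp, int dstPort, String process) {
        connections.put(key(srcIp, srcPort, dstIp, dstPort), process);
    }

    // Demultiplexing: choose the receiving process from the header fields of an arriving segment.
    String demultiplex(String srcIp, int srcPort, String dstIp, int dstPort) {
        return connections.get(key(srcIp, srcPort, dstIp, dstPort));
    }

    public static void main(String[] args) {
        int x = 5000, y = 6000; // hypothetical values for the ports x and y
        DemuxSketch serverB = new DemuxSketch();
        serverB.register("C", y, "B", 80, "P1");
        serverB.register("C", x, "B", 80, "P2");
        serverB.register("A", x, "B", 80, "P3");
        System.out.println(serverB.demultiplex("A", x, "B", 80)); // P3
    }
}
```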
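UDP: User Datagram Protocol [RFC 768]
- UDP adds little to IP: it offers only best-effort service, so segments may be lost or delivered out of order.
- Connectionless: there is no handshaking between sender and receiver, and each UDP segment is handled independently.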
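UDP segment format (32 bits wide):
    source port #    | dest port #
    length           | checksum
    application data (message)
- length: length of the UDP segment in bytes, including the header.
- Applications that use UDP include DNS, SNMP and NFS.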
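UDP checksum
Sender: treats the segment contents as a sequence of 16-bit integers, adds them using one's-complement addition, and puts the one's complement of the sum in the checksum field. A checksum field of 0 means that no checksum was computed.
Receiver: adds all 16-bit words of the received segment, including the checksum field.
- Result is 11111...1 (all 1s): no error detected.
- Result contains a 0 bit: error detected.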
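The following is a minimal Java sketch of the one's-complement checksum computation. It assumes the segment has already been split into 16-bit words stored in ints. The sample words are arbitrary, and the sketch omits the pseudo-header of IP addresses that the real UDP checksum also covers.

```java
class InternetChecksum {
    // One's-complement sum of 16-bit words: any carry out of bit 15 is added back in (end-around carry).
    static int onesComplementSum(int[] words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            sum = (sum & 0xFFFF) + (sum >>> 16);
        }
        return sum;
    }

    // Sender: the checksum field holds the one's complement of the sum.
    static int checksum(int[] words) {
        return ~onesComplementSum(words) & 0xFFFF;
    }

    // Receiver: summing all words including the checksum must give 0xFFFF (all 1s).
    static boolean verify(int[] wordsWithChecksum) {
        return onesComplementSum(wordsWithChecksum) == 0xFFFF;
    }

    public static void main(String[] args) {
        int[] data = {0x6660, 0x5555, 0x8F0C};
        int c = checksum(data);
        System.out.printf("checksum = 0x%04X%n", c);                      // checksum = 0xB53D
        System.out.println(verify(new int[] {0x6660, 0x5555, 0x8F0C, c})); // true
    }
}
```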
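Reliable Transport
The network layer (IP) is unreliable; TCP builds a reliable transport service on top of it. Reliable transfer can also be provided at the link layer.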
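TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
- point-to-point: one sender, one receiver.
- reliable, in-order byte stream.
- MSS: maximum segment size.
- connection-oriented: handshaking initializes the sender and receiver state before data is exchanged.
- TCP congestion control and flow control set the window size.
- send & receive buffers.
[Figure: the application writes data through the socket door into the TCP send buffer; TCP carries it in segments to the TCP receive buffer, from which the application reads data through the socket door.]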
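Pipelined protocols
If the sender waited for the ack of each segment before sending the next, every segment would cost one RTT (round trip time) and throughput would be severely limited. A pipelined sender therefore keeps several unacknowledged segments in flight.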
TCP segment structure
    32 bits
    source port #             | dest port #
    sequence number
    acknowledgement number
    head len | not used | U A P R S F | rcvr window size
    checksum                  | ptr urgent data
    Options (variable length)
    application data (variable length)
- URG: urgent data (generally not used)
- ACK: ACK # valid
- PSH: push data now (generally not used)
- RST, SYN, FIN: connection estab (setup, teardown commands)
- checksum: Internet checksum (as in UDP)
- sequence and acknowledgement numbers: counting by bytes of data (not segments!)
- rcvr window size: # bytes rcvr willing to accept
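TCP sequence numbers and ACKs
- Seq. #: byte-stream number of the first byte in the segment's data.
- ACK: seq. # of the next byte expected from the other side (cumulative ACK).
Q: how does the receiver handle out-of-order segments?
A: the RFC does not say; it is left to the implementor.
Simple telnet scenario:
    Host A -> Host B: Seq=42, ACK=79, data = 'C'   (user types 'C')
    Host B -> Host A: Seq=79, ACK=43, data = 'C'   (host B ACKs receipt of 'C', echoes back 'C')
    Host A -> Host B: Seq=43, ACK=80               (host A ACKs receipt of echoed 'C')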
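TCP: reliable data transfer (simplified sender)
The sender waits for one of three events:
- data received from the application: create and send a segment.
- timer timeout for the segment with seq. # y: retransmit segment y.
- ACK received with ACK # y: process the ACK.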
TCP: reliable data transfer
Simplified TCP sender:
    sendbase = initial_sequence_number
    nextseqnum = initial_sequence_number

    loop (forever) {
      switch(event)
      event: data received from application above
        create TCP segment with sequence number nextseqnum
        start timer for segment nextseqnum
        pass segment to IP
        nextseqnum = nextseqnum + length(data)
      event: timer timeout for segment with sequence number y
        retransmit segment with sequence number y
        compute new timeout interval for segment y
        restart timer for sequence number y
      event: ACK received, with ACK field value of y
        if (y > sendbase) { /* cumulative ACK of all data up to y */
          cancel all timers for segments with sequence numbers < y
          sendbase = y
        }
        else { /* a duplicate ACK for already ACKed segment */
          increment number of duplicate ACKs received for y
          if (number of duplicate ACKs received for y == 3) {
            /* TCP fast retransmit */
            resend segment with sequence number y
            restart timer for segment y
          }
        }
    } /* end of loop forever */
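Here is a runnable Java rendering of the data and ACK handling in the pseudocode, under simplifying assumptions. Timers are not modelled, a "retransmission" just records the sequence number in a list, and sequence-number wraparound is ignored. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TcpSenderSketch {
    int sendBase;
    int nextSeqNum;
    private final Map<Integer, Integer> duplicateAcks = new HashMap<>();
    final List<Integer> resent = new ArrayList<>(); // seq #s retransmitted (stands in for "pass segment to IP")

    TcpSenderSketch(int initialSequenceNumber) {
        sendBase = initialSequenceNumber;
        nextSeqNum = initialSequenceNumber;
    }

    // event: data received from application above; returns the new segment's seq #
    int dataFromApplication(int length) {
        int seq = nextSeqNum;  // segment gets sequence number nextseqnum (timer not modelled)
        nextSeqNum += length;
        return seq;
    }

    // event: timer timeout for segment with sequence number y
    void timeout(int y) {
        resent.add(y);         // retransmit segment y (timeout recomputation not modelled)
    }

    // event: ACK received, with ACK field value of y
    void ackReceived(int y) {
        if (y > sendBase) {    // cumulative ACK of all data up to y
            sendBase = y;
            duplicateAcks.clear();
        } else {               // duplicate ACK for already ACKed data
            int count = duplicateAcks.merge(y, 1, Integer::sum);
            if (count == 3) {
                resent.add(y); // TCP fast retransmit
            }
        }
    }
}
```

For example, starting from sequence number 92 and sending segments of 8 and 20 bytes, the first ACK=100 advances sendBase. Only the third duplicate ACK=100 after that triggers the fast retransmit of segment 100.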
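TCP ACK generation [RFC 1122, RFC 2581]
Event -> TCP Receiver action
- in-order segment arrival, no gaps, everything else already ACKed -> delayed ACK: wait up to 500ms for the next segment; if none arrives, send an ACK.
- in-order segment arrival, no gaps, one delayed ACK pending -> immediately send a single cumulative ACK.
- out-of-order segment arrival (higher-than-expected seq. #, gap detected) -> send a duplicate ACK indicating the seq. # of the next expected byte.
- arrival of a segment that partially or completely fills the gap -> immediately send an ACK if the segment starts at the lower end of the gap.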
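TCP retransmission scenarios
Lost ACK scenario:
    Host A -> Host B: Seq=92, 8 bytes data
    Host B -> Host A: ACK=100   (lost, X)
    Seq=92 timeout at Host A
    Host A -> Host B: Seq=92, 8 bytes data   (retransmission)
    Host B -> Host A: ACK=100
Premature timeout, cumulative ACKs:
    Host A -> Host B: Seq=92, 8 bytes data
    Host A -> Host B: Seq=100, 20 bytes data
    Host B -> Host A: ACK=100, ACK=120   (still in flight when the Seq=92 timer expires)
    Seq=92 timeout at Host A
    Host A -> Host B: Seq=92, 8 bytes data   (unnecessary retransmission)
    Host B -> Host A: ACK=120
The cumulative ACK=120 covers the Seq=100 segment, so that segment is not retransmitted.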
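TCP flow control
The receiver tells the sender how much spare room its buffer has, so that the sender does not overflow it.
- RcvBuffer = size of TCP Receive Buffer
- RcvWindow = amount of spare room in Buffer
- The receiver advertises RcvWindow in every ACK. The sender keeps the amount of transmitted but unACKed data below the most recently received RcvWindow.
[Figure: receiver buffering.]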
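TCP round trip time and timeout
Q: how should the TCP timeout value be set?
- It must be longer than the RTT, but the RTT varies.
- Too short: premature timeout and unnecessary retransmissions.
- Too long: slow reaction to segment loss.
Q: how can the RTT be estimated?
- SampleRTT: measured time from a segment's transmission until receipt of its ACK.
- Ignore retransmitted segments and cumulatively ACKed segments when measuring SampleRTT.
- SampleRTT varies, so average several recent measurements instead of using only the current SampleRTT.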
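TCP round trip time and timeout
    EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
- Exponentially weighted moving average: the influence of a given sample on EstimatedRTT decreases exponentially fast.
- Typical value: x = 0.1.
Setting the timeout: use EstimatedRTT plus a safety margin that grows with the variation of SampleRTT around EstimatedRTT.
    Timeout = EstimatedRTT + 4*Deviation
    Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|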
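The two update rules map directly onto a small Java class. This is only a sketch. The caller supplies the initial EstimatedRTT and Deviation. Computing Deviation from the EstimatedRTT value *before* the new sample is folded in is an assumption about update order, and the numbers in main are hypothetical.

```java
class RttEstimator {
    private final double x; // weight of a new sample (typically 0.1)
    double estimatedRtt;
    double deviation;

    RttEstimator(double x, double initialRtt, double initialDeviation) {
        this.x = x;
        this.estimatedRtt = initialRtt;
        this.deviation = initialDeviation;
    }

    void addSample(double sampleRtt) {
        // Deviation uses EstimatedRTT from before this sample (update order is an assumption).
        deviation = (1 - x) * deviation + x * Math.abs(sampleRtt - estimatedRtt);
        estimatedRtt = (1 - x) * estimatedRtt + x * sampleRtt;
    }

    double timeout() {
        return estimatedRtt + 4 * deviation;
    }

    public static void main(String[] args) {
        RttEstimator est = new RttEstimator(0.1, 100.0, 10.0); // milliseconds, hypothetical start values
        est.addSample(200.0);
        System.out.println(est.estimatedRtt + " " + est.deviation + " " + est.timeout());
    }
}
```

After one 200 ms sample, EstimatedRTT moves from 100 to 110, Deviation moves from 10 to 19, and Timeout becomes 110 + 4*19 = 186.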
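TCP connection management
The sender and receiver establish a connection before exchanging data segments, initializing TCP variables: sequence numbers, buffers and flow control info (RcvWindow).
- client: the connection initiator
    Socket clientsocket = new Socket("hostname", portNumber);
- server: contacted by the client
    Socket connectionsocket = welcomesocket.accept();
Three way handshake:
Step 1: the client sends a TCP SYN control segment to the server, specifying its initial sequence number.
Step 2: the server receives the SYN and replies with a SYNACK control segment. The SYNACK acknowledges the SYN, and the server allocates buffers and specifies its own initial sequence number.
Step 3: the client receives the SYNACK and replies with an ACK.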
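TCP connection management (cont.): closing a connection
The client closes its socket: clientsocket.close();
Step 1: the client end system sends a TCP FIN control segment to the server.
Step 2: the server receives the FIN and replies with an ACK. It then closes the connection and sends its own FIN.
[Figure: client and server timelines; close -> FIN, ACK; close -> FIN, ACK; the client passes through "timed wait" before reaching closed.]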
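TCP connection management (cont.)
Step 3: the client receives the FIN and replies with an ACK. It then enters "timed wait", during which it answers any further FIN with an ACK.
Step 4: the server receives the ACK, and the connection is closed.
[Figure: client states closing -> timed wait -> closed; server states closing -> closed; FIN and ACK exchanged in each direction.]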
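The socket calls above can be exercised end to end on the loopback interface with the standard java.net classes. In this sketch the operating system carries out the SYN/SYNACK/ACK and FIN/ACK exchanges, and the program only sees the connect, accept and close calls. The class name, the upper-casing "service" and the use of an ephemeral port are illustrative choices.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Locale;

class SocketLifecycleDemo {
    // Server and client on the loopback interface: one line in, the same line upper-cased back.
    static String roundTrip(String message) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket welcomeSocket = new ServerSocket(0, 1, loopback)) { // port 0: any free port
            Thread server = new Thread(() -> {
                // accept() returns a connection whose three-way handshake has completed
                try (Socket connectionSocket = welcomeSocket.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(connectionSocket.getInputStream()));
                     PrintWriter out = new PrintWriter(connectionSocket.getOutputStream(), true)) {
                    out.println(in.readLine().toUpperCase(Locale.ROOT));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                } // closing connectionSocket makes the server side send its FIN
            });
            server.start();

            String reply;
            // The Socket constructor performs the three-way handshake before returning
            try (Socket clientSocket = new Socket(loopback, welcomeSocket.getLocalPort());
                 PrintWriter out = new PrintWriter(clientSocket.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(clientSocket.getInputStream()))) {
                out.println(message);
                reply = in.readLine();
            } // clientSocket.close() runs here and starts the client's side of the FIN/ACK exchange
            server.join();
            return reply;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello")); // HELLO
    }
}
```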
TCP
TCP Connection State Diagram (transitions; "x" = no action)
    from state    event / action                       to state
    CLOSED        passive OPEN / create TCB            LISTEN
    CLOSED        active OPEN / create TCB, snd SYN    SYN SENT
    LISTEN        CLOSE / delete TCB                   CLOSED
    LISTEN        rcv SYN / snd SYN,ACK                SYN RCVD
    LISTEN        SEND / snd SYN                       SYN SENT
    SYN SENT      CLOSE / delete TCB                   CLOSED
    SYN SENT      rcv SYN / snd ACK                    SYN RCVD
    SYN SENT      rcv SYN,ACK / snd ACK                ESTAB
    SYN RCVD      rcv ACK of SYN / x                   ESTAB
    SYN RCVD      CLOSE / snd FIN                      FIN WAIT-1
    ESTAB         CLOSE / snd FIN                      FIN WAIT-1
    ESTAB         rcv FIN / snd ACK                    CLOSE WAIT
    FIN WAIT-1    rcv ACK of FIN / x                   FIN WAIT-2
    FIN WAIT-1    rcv FIN / snd ACK                    CLOSING
    FIN WAIT-2    rcv FIN / snd ACK                    TIME WAIT
    CLOSING       rcv ACK of FIN / x                   TIME WAIT
    TIME WAIT     Timeout=2MSL / delete TCB            CLOSED
    CLOSE WAIT    CLOSE / snd FIN                      LAST-ACK
    LAST-ACK      rcv ACK of FIN / x                   CLOSED
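The connection state diagram can be encoded as a lookup table. This is a minimal Java sketch. The event strings mirror the diagram's labels, the sending actions (snd SYN, snd ACK, ...) are omitted, and any event the diagram does not show (such as RST) is rejected.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

class TcpStateMachine {
    enum State { CLOSED, LISTEN, SYN_RCVD, SYN_SENT, ESTAB, FIN_WAIT_1, FIN_WAIT_2,
                 CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK }

    private static final Map<State, Map<String, State>> TRANSITIONS = new EnumMap<>(State.class);

    private static void on(State from, String event, State to) {
        TRANSITIONS.computeIfAbsent(from, s -> new HashMap<>()).put(event, to);
    }

    static {
        // Transitions of the connection state diagram; sending actions are omitted.
        on(State.CLOSED, "passive OPEN", State.LISTEN);
        on(State.CLOSED, "active OPEN", State.SYN_SENT);
        on(State.LISTEN, "CLOSE", State.CLOSED);
        on(State.LISTEN, "rcv SYN", State.SYN_RCVD);
        on(State.LISTEN, "SEND", State.SYN_SENT);
        on(State.SYN_SENT, "CLOSE", State.CLOSED);
        on(State.SYN_SENT, "rcv SYN", State.SYN_RCVD);
        on(State.SYN_SENT, "rcv SYN,ACK", State.ESTAB);
        on(State.SYN_RCVD, "rcv ACK of SYN", State.ESTAB);
        on(State.SYN_RCVD, "CLOSE", State.FIN_WAIT_1);
        on(State.ESTAB, "CLOSE", State.FIN_WAIT_1);
        on(State.ESTAB, "rcv FIN", State.CLOSE_WAIT);
        on(State.FIN_WAIT_1, "rcv ACK of FIN", State.FIN_WAIT_2);
        on(State.FIN_WAIT_1, "rcv FIN", State.CLOSING);
        on(State.FIN_WAIT_2, "rcv FIN", State.TIME_WAIT);
        on(State.CLOSING, "rcv ACK of FIN", State.TIME_WAIT);
        on(State.TIME_WAIT, "Timeout=2MSL", State.CLOSED);
        on(State.CLOSE_WAIT, "CLOSE", State.LAST_ACK);
        on(State.LAST_ACK, "rcv ACK of FIN", State.CLOSED);
    }

    static State next(State current, String event) {
        State to = TRANSITIONS.getOrDefault(current, Map.of()).get(event);
        if (to == null) {
            throw new IllegalStateException("event '" + event + "' not handled in " + current);
        }
        return to;
    }

    public static void main(String[] args) {
        State s = State.ESTAB; // active close
        for (String event : new String[] {"CLOSE", "rcv ACK of FIN", "rcv FIN", "Timeout=2MSL"}) {
            s = next(s, event);
            System.out.println(event + " -> " + s);
        }
    }
}
```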
(congestion) 3 (?)
[Figure: congestion example; link and flow rates in Mbps, including two values of 80 Mbps.]
[Figure: congestion example; link and flow rates in Mbps, including two values of 50 Mbps.]
... packet-loss sensitive.
TCP: throughput depends on the RTT (RTT-sensitive).
IP
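Approaches to congestion control
- End-end congestion control: the network gives no explicit feedback; end systems infer congestion from observed loss and delay. This is the approach of (original) TCP.
- Network-assisted congestion control: routers provide feedback to end systems, e.g. a single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM). TCP can use this through ECN.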
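TCP congestion control
- End-end control: no network assistance.
- The transmission rate is limited by the congestion window size, Congwin, counted in segments of MSS bytes.
- If w segments, each with MSS bytes, are sent in one RTT:
    throughput = w * MSS / RTT  Bytes/sec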
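TCP congestion control: two phases
- slow start
- congestion avoidance
Important variables:
- Congwin: the congestion window
- threshold: the value of Congwin that separates the slow start phase from the congestion avoidance phase
TCP probes for usable bandwidth. It increases Congwin until a loss occurs (congestion), then decreases Congwin and begins probing (increasing) again.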
TCP Slowstart
Slowstart algorithm:
    initialize: Congwin = 1
    for (each segment ACKed)
        Congwin++
    until (loss event OR Congwin > threshold)
- Congwin doubles every RTT, giving exponential increase in window size.
- Loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP).
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart.]
TCP Congestion Avoidance
Congestion avoidance:
    /* slowstart is over */
    /* Congwin > threshold */
    Until (loss event) {
        every w segments ACKed:
            Congwin++
    }
    threshold = Congwin/2
    Congwin = 1
    perform slowstart (1)
(1) TCP Reno skips slowstart (fast recovery) after three duplicate ACKs.
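Slow start and congestion avoidance together can be traced round by round with a short Java sketch. It works per RTT rather than per ACK, which is an approximation. During slow start, Congwin doubles each round but is capped at threshold. Losses are assumed to be detected by timeout (Tahoe behaviour, Congwin back to 1), and the caller supplies the rounds in which they occur. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.Set;

class TahoeWindowTrace {
    // Congwin (in segments) at the start of each RTT round; losses occur in the rounds in lossRounds.
    static int[] trace(int threshold, int rounds, Set<Integer> lossRounds) {
        int[] congwinPerRound = new int[rounds];
        int congwin = 1;                                      // initialize: Congwin = 1
        for (int r = 0; r < rounds; r++) {
            congwinPerRound[r] = congwin;
            if (lossRounds.contains(r)) {
                threshold = Math.max(1, congwin / 2);         // threshold = Congwin/2 (kept >= 1)
                congwin = 1;                                  // Congwin = 1, then slowstart again
            } else if (congwin < threshold) {
                congwin = Math.min(2 * congwin, threshold);   // slowstart: doubles per RTT, capped
            } else {
                congwin += 1;                                 // congestion avoidance: +1 per RTT
            }
        }
        return congwinPerRound;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(trace(8, 12, Set.of(7))));
        // [1, 2, 4, 8, 9, 10, 11, 12, 1, 2, 4, 6]
    }
}
```

With threshold 8 and a loss in round 7, the window grows 1, 2, 4, 8, then linearly to 12. It then drops to 1 and climbs back toward the new threshold of 6.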
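AIMD and TCP Fairness
TCP congestion avoidance is AIMD: additive increase, multiplicative decrease.
- increase the window by 1 segment every RTT;
- decrease the window by a factor of 2 on a loss event.
Fairness goal: if N TCP sessions share the same bottleneck link, each should get 1/N of the link capacity.
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R.]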
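Why is TCP fair?
Two competing sessions:
- Additive increase raises both throughputs by the same amount, moving the operating point along a line of slope 1.
- Multiplicative decrease (loss: decrease window by factor of 2) cuts each throughput in proportion to its size.
Repeated cycles of congestion avoidance (additive increase) and loss (decrease window by factor of 2) move the operating point toward the equal bandwidth share line.
[Figure: Connection 1 throughput (horizontal axis, up to R) vs. Connection 2 throughput (vertical axis, up to R), with the equal bandwidth share line.]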
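The fairness argument can be simulated in Java under idealised assumptions. Both connections have the same RTT, rates are measured in segments per RTT, and both connections see a loss in the same round whenever their combined rate exceeds R. The class name and starting rates are hypothetical.

```java
class AimdFairness {
    // Two flows share a bottleneck of capacity r. Each RTT round: if the combined rate exceeds r,
    // both see a loss and halve (multiplicative decrease); otherwise both add 1 (additive increase).
    static double[] run(double rate1, double rate2, double r, int rounds) {
        for (int i = 0; i < rounds; i++) {
            if (rate1 + rate2 > r) {
                rate1 /= 2;
                rate2 /= 2;
            } else {
                rate1 += 1;
                rate2 += 1;
            }
        }
        return new double[] {rate1, rate2};
    }

    public static void main(String[] args) {
        double[] end = run(10, 40, 50, 200); // start far from the equal share
        System.out.printf("connection 1: %.2f, connection 2: %.2f%n", end[0], end[1]);
    }
}
```

Additive increase leaves the gap between the two rates unchanged, while every halving cuts it in half. The rates therefore converge geometrically toward the equal share.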
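TCP latency modeling
Q: how long does it take to receive an object (e.g. an HTML file on the WWW) from a server after sending a request?
Notation:
- R: link rate (bits/sec)
- W: congestion window (segments), assumed fixed
- S: MSS (bits)
- O: object size (bits)
Two cases:
- WS/R > RTT + S/R: the ACK for the first segment of a window returns before the window's worth of data has been sent.
- WS/R < RTT + S/R: the sender has to wait for an ACK after sending a window's worth of data.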
TCP latency modeling (static window)
K := O/WS, the number of windows that cover the object.
    Case 1: latency = 2RTT + O/R
    Case 2: latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
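The two static-window cases can be written as a single Java function. In this sketch, units are seconds and bits, K is rounded up when O is not a multiple of WS (an assumption), and the example numbers in main are hypothetical.

```java
class StaticWindowLatency {
    // rtt in seconds, r in bits/sec, s and o in bits, w in segments.
    static double latency(double rtt, double r, double s, double o, int w) {
        double k = Math.ceil(o / (w * s)); // K = O/WS windows, rounded up (assumption)
        double windowTime = w * s / r;     // WS/R
        if (windowTime >= rtt + s / r) {
            return 2 * rtt + o / r;        // Case 1: the sender never stalls
        }
        return 2 * rtt + o / r + (k - 1) * (s / r + rtt - windowTime); // Case 2
    }

    public static void main(String[] args) {
        // Hypothetical numbers: RTT = 100 ms, R = 1 Mbps, S = 8000 bits, O = 80000 bits.
        System.out.println(latency(0.1, 1e6, 8000, 80000, 2));
        System.out.println(latency(0.1, 1e6, 8000, 80000, 20));
    }
}
```

With these numbers, W = 2 falls into Case 2 (about 0.648 s). W = 20 satisfies WS/R > RTT + S/R and gives 2RTT + O/R = 0.28 s.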
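TCP latency modeling: slow start
Example: O/S = 15 segments, K = 4 windows, Q = 2, P = min{K-1, Q} = 2 (Q: the number of times the server would stall if the object were of unlimited size).
The server stalls P = 2 times.
[Figure: the client initiates the TCP connection and requests the object (one RTT). The server then sends the first window in S/R, the second in 2S/R, the third in 4S/R and the fourth in 8S/R, and the object is delivered when transmission completes. Time at client vs. time at server.]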
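The slow-start example can be checked by adding up the server's idle time window by window. In this Java sketch, window i takes 2^(i-1) S/R to send. Afterwards the server idles for S/R + RTT - 2^(i-1) S/R whenever that value is positive. RTT = 2 and S/R = 1 are hypothetical values, chosen so that Q = 2 as in the example. For these numbers the result, 22 time units, agrees with the closed form 2RTT + O/R + P[RTT + S/R] - (2^P - 1)S/R = 4 + 15 + 6 - 3.

```java
class SlowStartLatency {
    // K: number of windows of 1, 2, 4, ... segments needed to send all segments.
    static int windows(int segments) {
        int k = 0;
        int sent = 0;
        while (sent < segments) {
            sent += 1 << k;
            k++;
        }
        return k;
    }

    // Idle time after window i (1-based): S/R + RTT - 2^(i-1) S/R, or 0 if the ACK is already back.
    static double stall(int i, double rtt, double sOverR) {
        return Math.max(0.0, sOverR + rtt - (1 << (i - 1)) * sOverR);
    }

    // P: number of windows after which the server actually stalls (only windows 1..K-1 can stall).
    static int stalls(int segments, double rtt, double sOverR) {
        int p = 0;
        for (int i = 1; i < windows(segments); i++) {
            if (stall(i, rtt, sOverR) > 0) p++;
        }
        return p;
    }

    // latency = 2RTT + O/R + total stall time
    static double latency(int segments, double rtt, double sOverR) {
        double total = 0.0;
        for (int i = 1; i < windows(segments); i++) {
            total += stall(i, rtt, sOverR);
        }
        return 2 * rtt + segments * sOverR + total;
    }

    public static void main(String[] args) {
        // O/S = 15 segments as in the example; RTT = 2 and S/R = 1 are hypothetical time units.
        System.out.println(windows(15));           // 4
        System.out.println(stalls(15, 2.0, 1.0));  // 2
        System.out.println(latency(15, 2.0, 1.0)); // 22.0
    }
}
```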
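DNS: Domain Name System
Internet hosts are addressed by 32-bit IP addresses, while people use names such as purple.ms.u-tokyo.ac.jp.
Q: how is a name mapped to its IP address?
Domain Name System: name servers translate (resolve) names into IP addresses.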
URL: http://www.ms.u-tokyo.ac.jp/~ichii/hiroshima2002/
ichii@ms.u-tokyo.ac.jp
    $ ssh as301.ecc.u-tokyo.ac.jp
RFC1035, DOMAIN NAMES - IMPLEMENTATION AND SPECIFICATION, as amended by RFC1123, Requirements for Internet Hosts - Application and Support, Section 2.1.
TLD: Top Level Domain
- gTLD: generic TLD: .com, .org, .net, .int, .edu, .mil, .gov
- ccTLD: country code TLD: .us, .jp, etc.; .uk (not .gb)
- New TLDs: .biz, .info, .aero, .coop, .museum, .name, .pro
.jp
- Organizational type: AC, AD, CO, ED, GO, GR, NE, OR
- Geographic type: tokyo.jp, chiba.jp, etc.
- General-use JP domain names: < >.JP
ICANN etc.
- ICANN: Internet Corporation for Assigned Names and Numbers
- Registry for .org, .net, .com: VeriSign, Inc.
- Registrars: ICANN accredited registrars
- JP: JPNIC (2002.4.1)
idn: Internationalized Domain Name (Multilingual Domain Name)
- Multilingual Internet Names Consortium
- IETF idn WG
- implementations: IE, .jp DNS, RealNames (2002.5/e)
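DNS
- A single, centralized name server would be a single point of failure...
- authoritative name server: holds the IP addresses of the hosts it is responsible for and translates their names into addresses.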
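DNS: root name servers
- A local name server that cannot resolve a name contacts a root name server.
- If the root name server does not know the mapping, it contacts the authoritative name server, obtains the mapping and returns it to the local name server.
- The root name servers are named A to M.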
Simple DNS example
Host surf.eurecom.fr wants the IP address of gaia.cs.umass.edu:
1. it contacts its local DNS server, dns.eurecom.fr;
2. dns.eurecom.fr contacts a root name server, if necessary;
3. the root name server contacts the authoritative name server, dns.umass.edu, if necessary.
[Figure: requesting host surf.eurecom.fr, local name server dns.eurecom.fr, root name server and authoritative name server dns.umass.edu, with messages numbered 1 to 6.]
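DNS example
The root name server may not know the authoritative name server. It may know only an intermediate name server, which in turn knows whom to contact to find the authoritative name server.
[Figure: requesting host surf.eurecom.fr, local name server dns.eurecom.fr, root name server, intermediate name server dns.umass.edu and authoritative name server dns.cs.umass.edu, with messages numbered 1 to 8; the name being resolved is gaia.cs.umass.edu.]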
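DNS: iterated queries
- recursive query: the contacted name server resolves the name itself, which puts the burden of resolution on it (a heavy load on the root server?).
- iterated query: the contacted server replies with the name of the server to contact next: "I don't know this name, but ask this server".
[Figure: messages 1 to 8 between requesting host surf.eurecom.fr, local name server dns.eurecom.fr, root name server, intermediate name server dns.umass.edu and authoritative name server dns.cs.umass.edu for gaia.cs.umass.edu, with iterated queries from the local name server.]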
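Iterated resolution can be sketched in Java, with an in-memory table standing in for the name servers. Everything in the table is hypothetical: the server entries, the label "root", and the documentation address 192.0.2.1 used for gaia.cs.umass.edu. Unlike the figure, the intermediate server here also answers with a referral instead of recursing, so every query is iterated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class IterativeResolverSketch {
    // A reply carries either an answer (an address) or a referral to another name server.
    static final class Reply {
        final String address;
        final String referral;

        Reply(String address, String referral) {
            this.address = address;
            this.referral = referral;
        }
    }

    // Hypothetical name servers and their replies for gaia.cs.umass.edu.
    static final Map<String, Map<String, Reply>> SERVERS = Map.of(
            "root", Map.of("gaia.cs.umass.edu", new Reply(null, "dns.umass.edu")),
            "dns.umass.edu", Map.of("gaia.cs.umass.edu", new Reply(null, "dns.cs.umass.edu")),
            "dns.cs.umass.edu", Map.of("gaia.cs.umass.edu", new Reply("192.0.2.1", null)));

    // The local name server follows each referral
    // ("I don't know this name, but ask this server") until some server answers.
    static String resolve(String name, List<String> contacted) {
        String server = "root";
        while (server != null) {
            contacted.add(server);
            Reply reply = SERVERS.getOrDefault(server, Map.of()).get(name);
            if (reply == null) {
                return null;          // neither an answer nor a referral
            }
            if (reply.address != null) {
                return reply.address; // answer found
            }
            server = reply.referral;
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> contacted = new ArrayList<>();
        System.out.println(resolve("gaia.cs.umass.edu", contacted)); // 192.0.2.1
        System.out.println(contacted); // [root, dns.umass.edu, dns.cs.umass.edu]
    }
}
```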
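DNS: caching
Once a name server learns a mapping, it caches it. Cached entries disappear (expire) after some time, and only the authoritative name server holds the original records.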
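DNS resource records (RR)
RR format: (name, value, type, ttl)
- Type=A: name is a hostname, value is its IP address.
- Type=NS: name is a domain, value identifies the authoritative name server for that domain.
- Type=CNAME: name is an alias for some "canonical" (real) name; value is the canonical name.
- Type=MX: value is the name of the mail server associated with name.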
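Caching with the ttl field can be sketched as a small Java class. The sketch makes these assumptions: the caller supplies the current time in seconds, the cache keeps one record per (name, type) pair rather than a full set of records, and the address used in the test is a hypothetical documentation address.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class DnsCacheSketch {
    // One resource record: (name, value, type, ttl), with ttl in seconds.
    static final class ResourceRecord {
        final String name, value, type;
        final long ttl;

        ResourceRecord(String name, String value, String type, long ttl) {
            this.name = name;
            this.value = value;
            this.type = type;
            this.ttl = ttl;
        }
    }

    private final Map<String, ResourceRecord> records = new HashMap<>();
    private final Map<String, Long> expiresAt = new HashMap<>();

    // DNS names compare case-insensitively, so the key is lower-cased.
    private static String key(String name, String type) {
        return name.toLowerCase(Locale.ROOT) + " " + type;
    }

    void put(ResourceRecord rr, long nowSeconds) {
        String k = key(rr.name, rr.type);
        records.put(k, rr);
        expiresAt.put(k, nowSeconds + rr.ttl);
    }

    // Returns the cached record, or null if it is absent or its ttl has run out.
    ResourceRecord get(String name, String type, long nowSeconds) {
        String k = key(name, type);
        Long expiry = expiresAt.get(k);
        if (expiry == null) {
            return null;
        }
        if (nowSeconds >= expiry) { // expired: drop the entry
            records.remove(k);
            expiresAt.remove(k);
            return null;
        }
        return records.get(k);
    }
}
```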
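DNS protocol messages
Query and reply messages share the same message format.
Message header:
- identification: a 16-bit # for the query; the reply to a query uses the same #.
- flags: query or reply; recursion desired; recursion available; reply is authoritative.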
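The header fields above can be decoded with bit masks. In this Java sketch, the mask values follow the header layout in RFC 1035 section 4.1.1, and the sample header bytes are made up.

```java
class DnsHeaderSketch {
    // The first 4 bytes of a DNS message: identification (16 bits), then flags (16 bits), big-endian.
    static int identification(byte[] msg) {
        return ((msg[0] & 0xFF) << 8) | (msg[1] & 0xFF);
    }

    static int flags(byte[] msg) {
        return ((msg[2] & 0xFF) << 8) | (msg[3] & 0xFF);
    }

    // Bit masks within the flags field.
    static boolean isReply(int flags)            { return (flags & 0x8000) != 0; } // QR
    static boolean isAuthoritative(int flags)    { return (flags & 0x0400) != 0; } // AA
    static boolean recursionDesired(int flags)   { return (flags & 0x0100) != 0; } // RD
    static boolean recursionAvailable(int flags) { return (flags & 0x0080) != 0; } // RA

    public static void main(String[] args) {
        byte[] reply = {0x12, 0x34, (byte) 0x85, (byte) 0x80}; // made-up header bytes
        int f = flags(reply);
        System.out.printf("id=0x%04X reply=%b aa=%b rd=%b ra=%b%n", identification(reply),
                isReply(f), isAuthoritative(f), recursionDesired(f), recursionAvailable(f));
        // id=0x1234 reply=true aa=true rd=true ra=true
    }
}
```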
RFC
TCP
- RFC793, Transmission Control Protocol, 1981.
- RFC1122, Requirements for Internet Hosts: Communication Layers, 1989.
- RFC1323, TCP Extensions for High Performance, 1992.
- RFC2018, TCP Selective Acknowledgment Options, 1996.
- RFC2581, TCP Congestion Control, 1999.
UDP
- RFC768, User Datagram Protocol, 1980.
DNS
- RFC1034, Domain Names: Concepts and Facilities, 1987.
- RFC1035, Domain Names: Implementation and Specification, 1987.