TCP Yoshifumi Nishida nishida@csl.sony.co.jp
Contents Part1: TCP Part2: TCP Part3: TCP Part4: Part5: TCP Part6:
TCP TCP TCP
Transmission Control Protocol
Protocol stack (top to bottom): application / TCP, UDP / IP / DataLink
Packet layout: DataLink header | IP header | TCP header | TCP Data
TCP header
16-bit source port number | 16-bit destination port number
32-bit sequence number
32-bit acknowledgement number
4-bit header length | reserved (6 bits) | URG ACK PSH RST SYN FIN | 16-bit window size
16-bit TCP checksum | 16-bit urgent pointer
options (if any)
data (if any)
TCP: RFC793, RFC2581, ...
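The field layout on this slide can be sketched as a small parser. This is a minimal illustration, assuming a raw segment that starts with the fixed 20-byte header; the names `TcpHeader` and `parse_tcp_header` are hypothetical and not part of the slides.

```python
import struct
from typing import NamedTuple

# Flag names in bit order, from the least significant bit (FIN) upward.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    header_len: int      # in bytes (the 4-bit field counts 32-bit words)
    flags: frozenset
    window: int
    checksum: int
    urgent: int


def parse_tcp_header(segment: bytes) -> TcpHeader:
    """Decode the fixed 20-byte part of a TCP header (options are ignored)."""
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (off_flags >> 12) * 4
    flags = frozenset(name for bit, name in enumerate(FLAG_NAMES)
                      if off_flags & (1 << bit))
    return TcpHeader(src, dst, seq, ack, header_len, flags,
                     window, checksum, urgent)
```

For example, a SYN built with `struct.pack("!HHIIHHHH", 1234, 80, 10000, 0, (5 << 12) | 0x02, 65535, 0, 0)` parses to a header whose `flags` contain only `SYN` and whose `header_len` is 20.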
TCP TCP UNIX Application TCP
Piggyback A feedback information B shutdown
(1) negotiate MSS (Max Segment Size) A B
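(2) A connection is identified by its pair of (IP address, port) endpoints:
(127.0.0.1, 21) - (192.168.0.1, 20)
(127.0.0.1, 21) - (192.168.0.1, 21)
These are two different TCP connections.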
(3) 3way handshake: request (SYN) and acknowledgement (ACK), 3 segments
client -> server: SYN
server -> client: SYN, ACK
client -> server: ACK
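(4) Connection termination
active close: sends FIN first; after returning the final ACK it waits (2MSL)
passive close: returns ACK for the received FIN, then sends its own FIN
client (Active Close) -> server: FIN
server (Passive Close) -> client: ACK
server -> client: FIN
client -> server: ACK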
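(5) TCP state transitions
CLOSED -> LISTEN: passive OPEN (create TCB)
CLOSED -> SYN SENT: active OPEN (create TCB, send SYN)
LISTEN -> SYN RECEIVED: receive SYN (send SYN, ACK)
LISTEN -> SYN SENT: SEND (send SYN)
LISTEN -> CLOSED: CLOSE (delete TCB)
SYN SENT -> SYN RECEIVED: receive SYN (send ACK)
SYN SENT -> ESTABLISHED: receive SYN, ACK (send ACK)
SYN RECEIVED -> ESTABLISHED: receive ACK of SYN (no action)
SYN RECEIVED -> FIN WAIT 1: CLOSE (send FIN)
ESTABLISHED -> FIN WAIT 1: CLOSE (send FIN)
ESTABLISHED -> CLOSE WAIT: receive FIN (send ACK)
FIN WAIT 1 -> FIN WAIT 2: receive ACK of FIN (no action)
FIN WAIT 1 -> CLOSING: receive FIN (send ACK)
FIN WAIT 2 -> TIME WAIT: receive FIN (send ACK)
CLOSING -> TIME WAIT: receive ACK of FIN (no action)
CLOSE WAIT -> LAST ACK: CLOSE (send FIN)
LAST ACK -> CLOSED: receive ACK of FIN (no action)
TIME WAIT -> CLOSED: timeout at 2MSL (delete TCB)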
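(6) State changes on client and server
client: SYN_SENT, sends SYN
server: SYN_RCVD, sends SYN,ACK
client: ESTABLISHED, sends ACK
server: ESTABLISHED
client: FIN_WAIT_1, sends FIN
server: CLOSE_WAIT, sends ACK
client: FIN_WAIT_2
server: LAST_ACK, sends FIN
client: TIME_WAIT, sends ACK
server: CLOSED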
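The state names and events on the two slides above can be written as a lookup table. The following is a rough sketch, assuming the RFC793 transitions; the event strings and the helper `run` are hypothetical names chosen for illustration, and error handling for illegal events is left out.

```python
# (current state, event) -> next state, following the RFC793 diagram.
TRANSITIONS = {
    ("CLOSED", "passive OPEN"): "LISTEN",
    ("CLOSED", "active OPEN"): "SYN_SENT",
    ("LISTEN", "receive SYN"): "SYN_RCVD",
    ("LISTEN", "SEND"): "SYN_SENT",
    ("LISTEN", "CLOSE"): "CLOSED",
    ("SYN_SENT", "receive SYN"): "SYN_RCVD",
    ("SYN_SENT", "receive SYN,ACK"): "ESTABLISHED",
    ("SYN_RCVD", "receive ACK of SYN"): "ESTABLISHED",
    ("SYN_RCVD", "CLOSE"): "FIN_WAIT_1",
    ("ESTABLISHED", "CLOSE"): "FIN_WAIT_1",
    ("ESTABLISHED", "receive FIN"): "CLOSE_WAIT",
    ("FIN_WAIT_1", "receive ACK of FIN"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "receive FIN"): "CLOSING",
    ("FIN_WAIT_2", "receive FIN"): "TIME_WAIT",
    ("CLOSING", "receive ACK of FIN"): "TIME_WAIT",
    ("CLOSE_WAIT", "CLOSE"): "LAST_ACK",
    ("LAST_ACK", "receive ACK of FIN"): "CLOSED",
    ("TIME_WAIT", "timeout at 2MSL"): "CLOSED",
}


def run(state, events):
    """Apply events in order and return the list of states visited."""
    visited = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]  # KeyError on an illegal event
        visited.append(state)
    return visited


# Client side of a normal connection: active open, then active close.
client = run("CLOSED", ["active OPEN", "receive SYN,ACK", "CLOSE",
                        "receive ACK of FIN", "receive FIN",
                        "timeout at 2MSL"])
# Server side: passive open, then passive close.
server = run("CLOSED", ["passive OPEN", "receive SYN", "receive ACK of SYN",
                        "receive FIN", "CLOSE", "receive ACK of FIN"])
```

Walking the client and server event lists reproduces the state sequence of slide (6).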
(1) Sequence numbers
The Application passes data (2500 byte) to TCP (Initial seqno 10000)
Segment boundaries: 10001, 10500, 11000, 11500, 12000
Acknowledgement (ACK) (1)
The ACK number is the received seqno + 1
[Figure: Data segments 1000, 1500 and 2000 with the returned ACK 1000 and ACK 2000]
(2) IPv4 / IPv6
IP pseudo header fields covered by the checksum:
32-bit sender IP address | 32-bit receiver IP address | 0 | proto number | TCP segment length
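The fields listed on this slide feed the TCP checksum. Below is a minimal sketch for IPv4 only, assuming the standard 16-bit ones' complement Internet checksum; the function names are hypothetical, and the segment passed in is assumed to have its checksum field set to zero.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole 16-bit word
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum_ipv4(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over pseudo header + TCP header + data (IPv4 addresses)."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment)))
    return internet_checksum(pseudo + segment)
```

A receiver can verify a segment by running the same computation over the segment with the checksum field filled in; a correct segment yields 0.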
(3)
(3) ACK
[Figure: Sender and Receiver exchanging data and ack with window size = 1 and with window size = 4]
TCP Nagle TCP TCP
TCP (Berkeley 500 msec) 2MSL (Berkeley 200 msec) Nagle
expire expire RTT(Round Trip Time) RTT Data ACK
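RTT measurement
Timer: measure the RTT from sending Data to receiving its ACK
Measured RTT: rtt; smoothed RTT: srtt
$srtt = \alpha \cdot srtt + (1 - \alpha) \cdot rtt$, with $\alpha = 0.9$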
$rto = 2 \cdot srtt$
$rto = srtt + 4 \cdot D$, where $D$ is the mean deviation of rtt
Upper limit: 64 seconds
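The smoothing and timeout rules above can be sketched in a few lines. This is an illustration only: the weight 0.9, the factor 2 and the 64 second cap are read from the slides, while the gains 1/8 and 1/4 in `JacobsonEstimator` are the commonly used values from Jacobson's algorithm and are an assumption here, as are all the names.

```python
ALPHA = 0.9      # smoothing weight from the slide
BETA = 2.0       # rto = BETA * srtt
RTO_MAX = 64.0   # assumed upper limit in seconds


def update_srtt(srtt, rtt, alpha=ALPHA):
    """Exponentially weighted moving average of the measured rtt."""
    return alpha * srtt + (1 - alpha) * rtt


def rto_simple(srtt, beta=BETA):
    """Original rule: a fixed multiple of the smoothed RTT."""
    return min(beta * srtt, RTO_MAX)


class JacobsonEstimator:
    """Variance-aware rule: rto = srtt + 4 * mean deviation."""

    def __init__(self, first_rtt):
        self.srtt = first_rtt
        self.rttvar = first_rtt / 2

    def update(self, rtt):
        err = rtt - self.srtt
        self.srtt += err / 8                         # assumed gain 1/8
        self.rttvar += (abs(err) - self.rttvar) / 4  # assumed gain 1/4
        return min(self.srtt + 4 * self.rttvar, RTO_MAX)
```

With a stable RTT the deviation term shrinks and the second rule gives a tighter timeout than the fixed multiple; with a jittery RTT it grows, which avoids spurious retransmissions.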
( ) 0 ACK expire 1 0 1 60 Sender Receiver ACK, window = 0 ACK, window = 1000 1 byte Data
probe Web
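2MSL timer
MSL (Max Segment Lifetime): the longest time a segment can survive in the network
After sending the final ACK, the connection is held for 2MSL
RFC793 specifies an MSL of 2 minutes; implementations commonly use 30 seconds or 1 minute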
ACK ACK PiggyBack ACK ACK Timer Timer Data Data RTT RTT
Nagle (1) telnet rlogin ACK
Nagle (2)
[Figure: two timelines between the Sender (application, kernel) and the Receiver, showing the Data and ACK exchanges]
(RFC813) 1/2 1/2
TCP TCP SYN : ACK : ACK FIN : Push RST
Push
[Figure: Sender (application, kernel) to Receiver (kernel, application), Without Push and With Push]
RST TCP listen port RST RST Close RST Close Listen
(1)
(2) TCP header TCP Data
TCP TCP
: 1980 : 1980 : 1990 :
(1) IP
(2)
(3)
TCP
1988 TCP Tahoe: Fast Retransmit
1990 Reno: Fast Recovery
1996 NewReno: Fast Recovery
TCP (1) TCP
TCP (2)
Congestion window (cwnd)
Amount that may be sent = min(cwnd, advertised window)
cwnd is increased as ACKs arrive
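TCP (3)
On packet loss: ssthresh is set to 1/2 of the window in use at that time
cwnd < ssthresh: slow start (cwnd grows by 1 segment per ACK)
cwnd > ssthresh: congestion avoidance (cwnd grows by about 1 segment per RTT)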
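The interplay of cwnd and ssthresh can be illustrated with a small simulation. This is a simplified sketch, assuming cwnd is counted in segments, one ACK per segment, and a Tahoe-style reset to 1 segment on loss; the function names are hypothetical.

```python
def on_ack(cwnd, ssthresh):
    """Grow cwnd (in segments) for one received ACK."""
    if cwnd < ssthresh:
        return cwnd + 1          # slow start: +1 segment per ACK
    return cwnd + 1 / cwnd       # congestion avoidance: about +1 per RTT


def on_loss(cwnd):
    """Tahoe-style reaction: halve ssthresh, restart from 1 segment."""
    ssthresh = max(cwnd / 2, 2)
    return 1, ssthresh


def simulate(rtts, ssthresh, cwnd=1):
    """Return cwnd at the start of each RTT, with no losses."""
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        for _ in range(int(cwnd)):   # one ACK per segment sent this RTT
            cwnd = on_ack(cwnd, ssthresh)
    return trace
```

With `ssthresh = 8`, `simulate(5, 8)` starts 1, 2, 4, 8 (doubling per RTT) and then grows by a little under one segment in the following RTT.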
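Fast Retransmit
Tahoe, Reno
Fast Recovery
[Figure: the sender transmits Data 1000, Data 1500, Data 2000, Data 2500, Data 3000; the receiver repeatedly returns ACK 1500; the sender retransmits Data 1500 without waiting for the timer]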
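Duplicate ACK counting, which drives Fast Retransmit, can be sketched as follows. The threshold of three duplicate ACKs is the value from RFC2581 and is an assumption here (the figure on the slide may show fewer ACKs); `fast_retransmits` is a hypothetical helper name.

```python
DUP_ACK_THRESHOLD = 3  # assumed: RFC2581 retransmits on the 3rd duplicate


def fast_retransmits(acks, threshold=DUP_ACK_THRESHOLD):
    """Return the sequence numbers a sender would fast-retransmit.

    `acks` is the stream of ACK numbers in arrival order. An ACK counts
    as a duplicate when it repeats the previous ACK number.
    """
    retransmit = []
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                retransmit.append(ack)   # resend the segment starting here
        else:
            last_ack, duplicates = ack, 0
    return retransmit
```

In the slide's scenario the segment starting at 1500 is lost, so every later segment makes the receiver repeat ACK 1500; the ACK stream `[1500, 1500, 1500, 1500]` (one original plus three duplicates) triggers a retransmission of 1500.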
Fast Recovery (1)
Tahoe: after Fast Retransmit, cwnd = 1
Fast Recovery: after Fast Retransmit, the window is 50%
Fast Recovery (2) cwnd ACK cwnd ssthresh ( 1/2)
[Figure: Window Size over Time for Tahoe and for Reno, NewReno; levels marked on each plot: Limit, Optimal, ssthresh]
TCP ssthresh < windowsize < limit
sender receiver data path sender receiver ack path
Path MTU discovery SACK( )
TCP TCP / RTT RFC793 65535 2Mbps, RTT 0.5 512000 12% RTT
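Window scale option
Window size = value of the 16-bit window field shifted left by X bits
X is at most 14: 65535 * 2^14 = 1,073,725,440
Negotiated in the 3way handshake:
client -> server: SYN, scale=x
server -> client: SYN, ACK, scale=y
client -> server: ACK
scale=x applies to windows advertised by the client, scale=y to windows advertised by the server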
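The arithmetic behind the window scale can be checked with a short sketch. The helper names are hypothetical; the throughput bound used is simply window / RTT, a standard approximation that ignores losses and header overhead.

```python
MAX_SHIFT = 14  # largest shift count allowed for the window scale option


def effective_window(window_field, shift):
    """Window in bytes for a 16-bit window field and a scale factor."""
    if not 0 <= window_field <= 0xFFFF:
        raise ValueError("window field must fit in 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift must be between 0 and 14")
    return window_field << shift


def min_shift_for(window_bytes):
    """Smallest shift whose maximum window covers `window_bytes`."""
    for shift in range(MAX_SHIFT + 1):
        if (0xFFFF << shift) >= window_bytes:
            return shift
    raise ValueError("window too large even with shift 14")


def max_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound on throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds
```

Without scaling, a 65535 byte window over a 0.5 second RTT caps throughput at about 1.05 Mbps, regardless of the link speed.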
RTT RTT
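Path MTU discovery
Packets are sent with DF (Don't Fragment) set
When an ICMP error is returned because of DF, the MSS is reduced
A larger MSS is tried again after 10 minutes (RFC1191)
The MSS is derived from the path MTU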
SACK (Selective Acknowledgement)
RFC2018
Two TCP options: SACK Permitted Option, SACK option
Use of SACK is negotiated in the 3way handshake: the SYN carries the SACK Permitted Option
SACK option format (up to 4 blocks)
KIND | LEN
Left Edge of First Block
Right Edge of First Block
...
Left Edge of nth Block
Right Edge of nth Block
SACK example: segments 5000 to 8500, MSS 500; 5500, 6500, 7500 lost

Trigger    ACK      1st block      2nd block      3rd block
Segment             Left   Right   Left   Right   Left   Right
5000       5500
5500       (lost)
6000       5500     6000   6500
6500       (lost)
7000       5500     7000   7500    6000   6500
7500       (lost)
8000       5500     8000   8500    7000   7500    6000   6500
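The example on this slide (segments 5000 to 8500, MSS 500, with 5500, 6500 and 7500 lost) can be reproduced with a small receiver model. This is a sketch under simplifying assumptions: fixed-size segments, the most recently updated block reported first as RFC2018 requires, and at most three blocks per ACK; `sack_receiver` is a hypothetical name.

```python
def sack_receiver(arrivals, first_seq, mss=500, max_blocks=3):
    """Model the ACK and SACK blocks sent for each arriving segment.

    `arrivals` lists the starting sequence number of each segment that
    reaches the receiver, in arrival order. Returns a list of
    (trigger segment, cumulative ACK, [(left, right), ...]) tuples.
    """
    cum_ack = first_seq
    blocks = []                  # most recently updated block first
    reports = []
    for seq in arrivals:
        left, right = seq, seq + mss
        rest = []
        for b_left, b_right in blocks:
            if b_left <= right and left <= b_right:
                # Adjacent or overlapping: merge into the new block.
                left, right = min(left, b_left), max(right, b_right)
            else:
                rest.append((b_left, b_right))
        if left <= cum_ack:
            cum_ack = max(cum_ack, right)   # in-order data: advance the ACK
            blocks = rest
        else:
            blocks = [(left, right)] + rest
        reports.append((seq, cum_ack, blocks[:max_blocks]))
    return reports


# Only segments 5000, 6000, 7000 and 8000 arrive.
reports = sack_receiver([5000, 6000, 7000, 8000], first_seq=5000)
```

The last report is `(8000, 5500, [(8000, 8500), (7000, 7500), (6000, 6500)])`, matching the final row of the example. If the retransmitted segment 5500 then arrives, the cumulative ACK jumps to 6500 and the first hole disappears from the block list.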
Pittsburgh Supercomputing Center
http://www.psc.edu/networking/perf_tune.html

OS              PMTU   RFC1323   SACK
Win95           YES    NO        NO
Win98           YES    YES       YES
WinNT3.5/4.0    YES    NO        NO
FreeBSD3.3      YES    YES       NO
SunOS4.1        NO     NO        NO
Solaris2.6      YES    YES       YES
Solaris7        YES    YES       YES
TCP Sequence number attack SYN flood Attack TCP IPsec filtering
Sequence number attack (1) TCP Sequence TCP X SYN A 1 A ACK B 3 SYN B,ACK A 2 B
Sequence number attack (2) Sequence Sequence 1 1 Sequence 4 src adr, src port, dst adr, dst port
(DoS) target SYN SYN flood attack (1) half open TCP connection cracker target SYN SYN, ACK Half open
SYN flood attack (2) 3way handshake timeout queue half open connection SYN
Explicit Congestion Notification
Initial Large Window
TCPVegas
NewReno
Rate-Halving
TCPfriendly
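ECN (Explicit Congestion Notification)
ECN signals congestion using 2 bits of the IP TOS field (RFC2481)
ECT (ECN Capable Transport): the Transport supports ECN
CE (Congestion Experienced): congestion was encountered on the path
[Figure: Sender -> Router -> Receiver; the Router sets the CE bit]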
ECN TCP (1)
2 bits of the TCP header are used (RFC2481), taken from the Reserved field:
ECN echo
CWR (Congestion Window Reduced)
[Figure: the Router sets the CE bit; the Receiver returns ECN echo; the Sender answers with CWR]
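ECN TCP (2)
ECN is negotiated in the Handshake
Sender: sets ECN echo and CWR in the SYN
Receiver: sets ECN echo in the SYN + ACK
Note: some TCP implementations echo back the sender's reserved field
client -> server: SYN (ECN echo, CWR)
server -> client: SYN, ACK (ECN echo)
client -> server: ACK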
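ECN TCP (3)
Receiver: on receiving CE, sets ECN echo in the ACK
Sender: reacts as for a Packet Loss and sends CWR
Receiver: until CWR arrives, or when CE is received again, sets ECN echo in the ACK
[Figure: Sender (CWR), Router (CE bit), Receiver]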
Large Initial Window (1) RFC2414 1 MSS 1 RTT HTTP
Large Initial Window (2)
Initial window = min(4 * MSS, max(2 * MSS, 4380))
MSS <= 1095: 4 * MSS
1095 < MSS < 2190: 4380
MSS >= 2190: 2 * MSS
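The three MSS ranges follow directly from the min/max expression, which a one-line function makes easy to check. A minimal sketch, with `initial_window` as a hypothetical name and all values in bytes:

```python
def initial_window(mss):
    """Initial window in bytes per the RFC2414 formula."""
    return min(4 * mss, max(2 * mss, 4380))


# One sample MSS from each of the three ranges.
samples = {mss: initial_window(mss) for mss in (536, 1460, 4000)}
```

An MSS of 536 gives 4 segments (2144 bytes), a typical Ethernet MSS of 1460 gives 4380 bytes (3 segments), and an MSS of 4000 gives 2 segments (8000 bytes).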
TCPVegas TCPVegas (1) Brakmo TCP TCPVegas
TCPVegas (2)
TCPVegas compares the Actual Throughput with the Expected Throughput and adjusts the window in three cases:
Actual < expected
Actual > expected
Actual = expected
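The comparison of actual and expected throughput can be sketched along the lines of the published Vegas algorithm. This is an assumption-laden illustration rather than the slide's exact rule: expected = cwnd / base RTT, actual = cwnd / current RTT, and the thresholds `alpha` and `beta` (1 and 3 segments) are commonly cited defaults, not values taken from the slides.

```python
def vegas_adjust(cwnd, base_rtt, rtt, alpha=1.0, beta=3.0):
    """Return the next cwnd (in segments) after one RTT of measurement.

    base_rtt is the smallest RTT seen so far (assumed uncongested path);
    rtt is the RTT measured in the current round.
    """
    expected = cwnd / base_rtt            # throughput with empty queues
    actual = cwnd / rtt                   # throughput actually achieved
    diff = (expected - actual) * base_rtt  # segments queued in the network
    if diff < alpha:
        return cwnd + 1                   # spare capacity: increase
    if diff > beta:
        return cwnd - 1                   # queues building: decrease
    return cwnd                           # in the target band: hold
```

Unlike Tahoe and Reno, this reacts to rising delay before any packet is lost.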
NewReno RFC2582 MIT Hoe TCP Fast Retransmit,Fast Recovery 1RTT Reno Fast Retransmit NewReno Fast Retransmit Reno aggressive
Rate Halving MIT Hoe, PSC Mathis Fast recovery 1/2 Rate halving adjust 2ACK 1 self-clocking 50% burst
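TCPfriendly
ACIRI, S.Floyd
A UDP flow should use no more bandwidth than a TCP flow would under the same conditions
TCP bandwidth: $bandwidth = \frac{1.3 \cdot MTU}{RTT \cdot \sqrt{Loss}}$
UDP flows that exceed this are not TCP friendly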
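The bandwidth bound on this slide is easy to evaluate numerically. The sketch below assumes the formula reads bandwidth = 1.3 * MTU / (RTT * sqrt(Loss)), with MTU in bytes, RTT in seconds and Loss as a packet loss rate between 0 and 1; the function name is hypothetical.

```python
import math


def tcp_friendly_bandwidth(mtu_bytes, rtt_seconds, loss_rate):
    """Approximate bandwidth (bytes/second) a TCP flow would obtain."""
    if rtt_seconds <= 0:
        raise ValueError("RTT must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss rate must be in (0, 1]")
    return 1.3 * mtu_bytes / (rtt_seconds * math.sqrt(loss_rate))


# 1500-byte MTU, 100 ms RTT, 1% loss.
limit = tcp_friendly_bandwidth(1500, 0.1, 0.01)
```

For a 1500 byte MTU, 100 ms RTT and 1% loss the bound is about 195000 bytes per second (roughly 1.56 Mbps); a non-TCP flow sending faster than this under the same conditions would take more than a TCP flow's share.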
TCP SACK, ECN, (NewReno) Rate halving, Vegas CBQ, Diffserv TCP friendly Congestion Manager TCP
For More Information
TCP RFCs
RFC793: TCP
RFC813: Silly Window Syndrome
RFC1122: Host Requirements
RFC1323: Extensions for high performance
RFC2414: Large Initial Window
RFC2481: ECN
RFC2581: Congestion Control
RFC2582: NewReno algorithm
IETF WG
TCP Implementation (tcpimpl)
TCP Over Satellite (tcpsat)
Performance Implications of Link Characteristics (pilc)