TCP

1. Introduction

At the Transport Layer (equivalent to Layer 4 in the OSI model), two protocols exist:

TCP (Transmission Control Protocol) - breaks information into segments and sends them, resending them if required, and reassembles the segments it receives. It gives 'reliable' delivery, a connection-oriented service between applications.
UDP (User Datagram Protocol) - also carries data between applications, but it does not acknowledge or resend datagrams, so it is described as 'unreliable', a connectionless service (See UDP).

IP Datagrams are 'connectionless', however the TCP segment is 'connection-oriented'.

2. TCP Header

TCP allows clients to run concurrent applications using different port numbers and at full-duplex, thereby giving a multiplexing ability. TCP labels each octet of data with a Sequence Number and a series of octets forms a Segment; the sequence number of the first octet in the segment is called the Segment Sequence Number. TCP provides reliability with ACK packets and Flow Control using the technique of a Sliding Window. During the setup of a TCP connection each end announces the maximum segment size that it can accept, which is normally derived from the MTU of its own interface, so that segments do not exceed the lowest MTU that the two ends know about.

The TCP header looks like this:

It is worth noting the following fields:

Source and Destination ports - this identifies the upper layer applications using the connection.
Sequence Number - this 32-bit number ensures that data is correctly sequenced. Each byte of data is assigned a sequence number. The first byte of data sent by a station in a particular TCP segment will have its sequence number in this field, say 58000. If this packet has 700 bytes of data in it (sequence numbers 58000 to 58699) then the next packet sent by this station will have the sequence number of 58000 + 700 = 58700.
Acknowledgment Number - this 32-bit number indicates the next sequence number that the sending device is expecting from the other station.
HLEN - gives the number of 32 bit words in the header. Sometimes called the Data Offset field.
Reserved - always set to 0.
Code bits - these are flags that indicate the nature of the header. They are:
- URG - Urgent Pointer
- ACK - Acknowledgement
- PSH - Push function, causes the TCP sender to push all unsent data to the receiver straight away rather than sending segments when it gets around to them, i.e. when the buffer is full.
- RST - Reset the connection
- SYN - Synchronise sequence numbers
- FIN - End of data
Window - indicates the range of acceptable sequence numbers beyond the last segment that was successfully received. It is the allowed number of octets that the sender of the ACK is willing to accept before an acknowledgement.
Urgent Pointer - shows the end of the urgent data so that interrupted data streams can continue. When the URG bit is set, the data is given priority over other data streams.
Option - mainly only the TCP Maximum Segment Size (MSS), sometimes called Maximum Window Size or Send Maximum Segment Size (SMSS). A segment is a series of data bytes preceded by a TCP header.
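
The fields listed above can be picked out of the raw bytes of a header. The following is a minimal Python sketch, assuming the standard fixed 20-byte layout with no options; the function name parse_tcp_header and the sample values are illustrative only, and the 16-bit checksum field (not discussed above) is simply read and returned.

```python
import struct

# Code bits from least significant bit upwards.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]

def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are ignored)."""
    if len(raw) < 20:
        raise ValueError("a TCP header is at least 20 bytes long")
    # Network byte order: two 16-bit ports, two 32-bit numbers, four 16-bit fields.
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", raw[:20])
    hlen_words = offset_flags >> 12      # HLEN / Data Offset, in 32-bit words
    code_bits = offset_flags & 0x3F      # the six code bits
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_bytes": hlen_words * 4,
        "flags": [name for bit, name in enumerate(FLAG_NAMES) if code_bits & (1 << bit)],
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }

# Hypothetical SYN from port 1025 to a telnet server, HLEN = 5 words, window 8192.
sample = struct.pack("!HHIIHHHH", 1025, 23, 17768656, 0, (5 << 12) | 0x02, 8192, 0, 0)
header = parse_tcp_header(sample)
print(header["flags"], header["header_bytes"], header["window"])
```

An HLEN greater than 5 would mean that options such as the MSS follow the fixed part; this sketch does not decode them.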

3. Port Numbers

Applications open Port numbers, used by TCP and UDP to keep tabs of different communications occurring around the network. Generally, port numbers below 255 were originally for public applications (Assigned Internet protocol numbers); 255 is reserved.

Port numbers 256 to 1023 are for saleable applications by various manufacturers and are considered as 'Privileged', 'Well-Known' or Extended Assigned port numbers.

Port numbers above 1024 (1024 is reserved) are not regulated, are considered as Unprivileged, or Registered, and these ports are commonly free to be used by clients talking to Well-Known port numbers.

Applications open port numbers (the TCP/IP model differs from the OSI model in that the Application layer sits straight on top of layer 4) and communicate with each other via these port numbers. A telnet server with IP address 10.1.1.1 uses port number 23, however if two clients operating from IP address 10.1.1.2 attach themselves to the server then the server needs to distinguish between the two conversations. This is achieved by the clients randomly picking two port numbers above 1023, say 1024 and 1025. The client connection is referenced as a Socket and is defined as the IP address plus the port number, e.g. 10.1.1.2.TCP.1024 and 10.1.1.2.TCP.1025. The server socket is 10.1.1.1.TCP.23. This is how TCP multiplexes different connections.
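
The way a server tells the two telnet conversations apart can be sketched in Python as a table keyed by the pair of sockets. This is an illustration only: the Socket and ConnectionTable names are hypothetical, and a real stack keeps far more state per connection.

```python
from typing import NamedTuple

class Socket(NamedTuple):
    """An endpoint: IP address plus TCP port number."""
    ip: str
    port: int

class ConnectionTable:
    """Maps (client socket, server socket) pairs to per-connection state."""

    def __init__(self):
        self._sessions = {}

    def open(self, client, server, session):
        self._sessions[(client, server)] = session

    def lookup(self, client, server):
        return self._sessions[(client, server)]

server = Socket("10.1.1.1", 23)
table = ConnectionTable()
# Two clients on the same host, distinguished only by their port numbers.
table.open(Socket("10.1.1.2", 1024), server, "first telnet session")
table.open(Socket("10.1.1.2", 1025), server, "second telnet session")

print(table.lookup(Socket("10.1.1.2", 1025), server))
```

Because the key includes the client port, both sessions can share the same server socket without being confused.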

The following table lists some commonly used port numbers:

Protocol	Application	Port Number
TCP	FTP	20 (Data), 21 (Control, or Program)
TCP	Telnet	23
TCP	SMTP	25
TCP	HTTP	80
UDP	DNS	53
UDP	Bootp	67/68
UDP	TFTP	69
UDP	NTP	123
UDP	SNMP	161

RFC 793 (TCP) and RFC 1323 (TCP Extensions) describe TCP in detail whilst both RFC 1500 and RFC 1700 define the Well-Known port numbers for both TCP and UDP. A good list can also be found at https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.xhtml.

4. Sequence Numbers

Each octet has its own sequence number so that each one can be acknowledged if necessary. In practice octets are acknowledged in batches, the size of which is determined by the window size (see below). The sequence number is a 32-bit binary number, so although the range is very large it is finite (0 to 2³²-1), after which the number cycles back to zero. In order to track the sequence numbers required for checking, the arithmetic has to be performed modulo 2³².
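
A short Python sketch of this modulo arithmetic follows. The helper names seq_add and seq_before are made up for illustration; the comparison assumes the two numbers are less than half the sequence space (2³¹) apart, which is the usual way of deciding which of two wrapped numbers comes first.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers run from 0 to 2**32 - 1

def seq_add(seq, nbytes):
    """Advance a sequence number by nbytes, wrapping back to zero."""
    return (seq + nbytes) % SEQ_SPACE

def seq_before(a, b):
    """True if a comes before b, assuming they are under 2**31 apart."""
    return a != b and (b - a) % SEQ_SPACE < 2 ** 31

start = SEQ_SPACE - 10          # ten octets short of the wrap-around point
end = seq_add(start, 20)        # sending 20 octets wraps past zero
print(start, end, seq_before(start, end))
```

Here the later sequence number is numerically smaller than the earlier one, which is why a plain "less than" comparison cannot be used.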

5. TCP Operation

5.1 Three-way Handshake

If a source host wishes to use an IP application such as active FTP for instance, it selects a port number which is greater than 1023 and connects to the destination station on port 21. The TCP connection is set up via three-way handshaking:

This begins with a SYN (Synchronise) segment (as indicated by the code bit) containing a 32-bit Sequence number A, called the Initial Send Sequence (ISS) number, which is chosen by and sent from host 1. The SYN itself uses up one sequence number, so the first octet of data that host 1 sends will carry the sequence number A+1, and the number then increments by 1 for every octet of data sent, i.e. there is a sequence number for each octet sent.
Host 2 receives the SYN with the Sequence number A and sends a SYN segment with its own totally independent ISS number B in the Sequence number field. In addition, it places A+1 in its Acknowledgment field, because the SYN carried no data but used up one sequence number (in general the Acknowledgment number is A+x, where x is the amount of sequence space used by the segment being acknowledged). This Acknowledgment number informs the recipient that its segment was received at the other end and that the next octet expected is the one with sequence number A+1. This stage is often called the SYN-ACK. Each host announces its MSS in the segment that carries its SYN.
Host 1 receives this SYN-ACK segment and sends an ACK segment containing the next expected sequence number B+1 in its Acknowledgment field; this is called Forward Acknowledgement and is received by Host 2. The ACK segment is identified by the fact that the ACK bit is set. Segments that are not acknowledged within a certain time span are retransmitted.
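
The numbers exchanged in the three steps above can be sketched in Python. A SYN carries no data but uses up one sequence number, so the acknowledgment values work out as A+1 and B+1, as in the trace in section 7. The function name and the dictionary layout are illustrative assumptions, not part of any real stack.

```python
SEQ_SPACE = 2 ** 32

def three_way_handshake(iss_a, iss_b):
    """Return the SYN, SYN-ACK and ACK segments for initial numbers A and B."""
    syn = {"flags": "SYN", "seq": iss_a, "ack": 0}
    syn_ack = {"flags": "SYN,ACK", "seq": iss_b, "ack": (iss_a + 1) % SEQ_SPACE}
    ack = {"flags": "ACK", "seq": (iss_a + 1) % SEQ_SPACE, "ack": (iss_b + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]

# Initial sequence numbers taken from the example in section 7.
for segment in three_way_handshake(17768656, 82980009):
    print(segment)
```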

TCP peers must not only keep track of their own initiated Sequence numbers but also those Acknowledgment numbers of their peers.

Closing a TCP connection is achieved by the initiator sending a FIN segment. The other end acknowledges it and sends a FIN of its own; the connection only closes completely when each FIN has been acknowledged by the station that received it.

Maintaining a TCP connection requires the stations to remember a number of different parameters such as port numbers and sequence numbers. Each connection has this set of variables located in a Transmission Control Block (TCB).

5.2 Piggybacking ACKs

When the receiver receives a data segment, it checks the sequence number and if it matches the next sequence number that the receiver expected, then the data has been received in order. Often the Acknowledgement can be piggy-backed on to normal traffic rather than being sent in a segment of its own every time; in fact TCP may be set up to wait 200ms just to see if any data is required to be sent, so that it can piggyback ACKs. If the receiver does not receive a data segment in order, e.g. a packet was dropped, then the receiver sends an ACK that repeats the sequence number it is still expecting, which prompts the sender to retransmit the missing segment.

5.3 Transmission Timeout

Because every TCP network has its own characteristics, the delay between sending a segment and receiving an acknowledgement varies. Different methods are available for calculating this Transmission Timeout and will depend on the stack. TCP maintains a retransmission timer for each connection. This retransmission timer is used when TCP expects to receive an acknowledgment from the other end. Once data is sent, TCP monitors this Retransmission Time-Out (RTO) and also a Round Trip Time (RTT). If an ACK is not received by the time the RTO expires, TCP retransmits the data using an exponentially increasing value for the RTO. This doubling is called an Exponential Back-Off. The RTO is calculated as a linear function of the RTT and its value changes over time with changes in routing and traffic load. Typically it is the smoothed RTT plus 4 times the mean deviation of the RTT.
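
One way of doing this calculation is sketched below in Python. It follows the smoothed RTT plus 4 times the deviation rule, with smoothing gains of 1/8 and 1/4 and a 1 second floor taken, as an assumption, from RFC 6298; actual stacks differ in their limits and clock granularity, and the class name RtoEstimator is hypothetical.

```python
class RtoEstimator:
    """Tracks a smoothed RTT and its deviation, and derives the RTO."""

    ALPHA = 1 / 8   # gain for the smoothed RTT (assumed from RFC 6298)
    BETA = 1 / 4    # gain for the RTT deviation (assumed from RFC 6298)

    def __init__(self, min_rto=1.0, max_rto=60.0):
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.srtt = None        # smoothed round trip time, seconds
        self.rttvar = None      # smoothed mean deviation, seconds
        self.rto = min_rto

    def on_rtt_sample(self, rtt):
        """Update the estimate from a measured round trip time in seconds."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        rto = self.srtt + 4 * self.rttvar
        self.rto = min(self.max_rto, max(self.min_rto, rto))
        return self.rto

    def on_timeout(self):
        """Exponential back-off: double the RTO after a retransmission."""
        self.rto = min(self.max_rto, self.rto * 2)
        return self.rto

estimator = RtoEstimator(min_rto=0.0)   # floor removed so small values are visible
print(estimator.on_rtt_sample(0.100))   # first sample
print(estimator.on_rtt_sample(0.120))   # later sample nudges the estimate
print(estimator.on_timeout())           # value after one unanswered segment
```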

6. Sliding Window

6.1 Buffers

Buffers are used at each end of the TCP connection to speed up data flow when the network is busy. Flow Control is managed using the concept of a Sliding Window. A Window is the maximum number of unacknowledged bytes that are allowed in any one transmission sequence, or to put it another way, it is the range of sequence numbers across the whole chunk of data that the receiver (the sender of the window size) is prepared to accept in its buffer. The receiver specifies the current Receive Window size in every packet sent to the sender. The sender can send up to this amount of data before it has to wait for an update on the Receive Window size from the receiver. The sender has to buffer all its own sent data until it receives ACKs for that data. The Send Window size is the smaller of the Receive Window and the sender's buffer. When TCP transmits a segment, it places a copy of the data in a retransmission queue and starts a timer. If an acknowledgment is not received for that segment (or a part of that segment) before the timer runs out, then the segment (or the part of the segment that was not acknowledged) is retransmitted.

6.2 Sliding Window Operation

The current sequence number of the TCP sender is y.
The TCP receiver specifies the current negotiated window size x in every packet. This is often specified by the operating system or the application, otherwise it starts at 536 bytes.
The TCP sender sends a datagram with the number of data bytes equal to the receiver's window size x and waits for an ACK from the receiver. The window size can be many thousands of bytes!
The receiver sends an ACK with the value y + x i.e. acknowledging that the last x bytes have been received OK and the receiver is expecting another transmission of bytes starting at byte y + x.
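After a successful receipt, the amount of data that the sender is prepared to transmit increases by an additional x; this ramp-up on new connections is called Slow Start.
The sender sends another datagram with 2x bytes, then 3x bytes and so on, up to the limit set by the window size that the receiver advertises, with no single segment being larger than the MSS indicated in the TCP Options.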
If the receiver has a full buffer, then the window size is reduced to zero. In this state, the window is said to be Frozen and the sender cannot send any more bytes until it receives a datagram from the receiver with a window size greater than zero.
If the data fails to be received, as determined by the timer which runs from the moment the data is sent until receipt of an ACK, then the window size is cut by half, e.g. from 4x to 2x. Failure could be due to congestion, e.g. a full buffer on the receiver, or faults on the media.
On the next successful transmission, the slow ramp up starts again.
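
The receiver's side of the steps above, where the advertised window shrinks as the buffer fills and freezes at zero, can be sketched in Python. The ReceiveBuffer class is a hypothetical model; it assumes that the advertised window is simply the free space left in a fixed-size buffer.

```python
class ReceiveBuffer:
    """Models how the advertised window follows free buffer space."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.unread = 0          # bytes received but not yet taken by the application

    def window(self):
        return self.capacity - self.unread

    def receive(self, nbytes):
        """Accept nbytes from the network and return the new window size."""
        if nbytes > self.window():
            raise ValueError("sender exceeded the advertised window")
        self.unread += nbytes
        return self.window()

    def application_read(self, nbytes):
        """The application takes data off the stack, reopening the window."""
        self.unread -= min(nbytes, self.unread)
        return self.window()

buf = ReceiveBuffer(8760)
print(buf.receive(72))              # window shrinks by the data received
print(buf.receive(buf.window()))    # buffer now full: the window is frozen at 0
print(buf.application_read(1000))   # window reopens once the application reads
```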

RFC 813 describes strategies for TCP windows.

6.3 Window Size

The window size could be used up in one go if a segment was large enough, however normally the window is used up by several segments of hundreds of bytes each. A Window size of one means that each byte of data is required to be acknowledged before the next one is sent. This is inefficient and therefore the window size is often much larger and is normally a Sliding Window (as described earlier) which is dynamically negotiated during a TCP session depending on the number of errors that occur in a connection. The 'sliding' element describes the octets that are allowed to be transmitted from a stream of octets that form a chunk of data. As the transmission of this chunk of data progresses, the window slides along the octets as octets are transmitted and acknowledged i.e. as data is acknowledged the window advances along the data octets. When the sender receives an ACK, this determines where the trailing edge of the window sits. The Receive Window size determines where the leading edge of the window sits. As the window slides along, any unsent data can be sent immediately as this implies that there is room in the receiver buffer. If the window size is slowly decreasing then it shows that the application is slow to take the data off the TCP stack. If the receiver indicates a window size of 0, then the sender cannot send any more bytes until the receiver sends a packet with a window size greater than 0.

Take the scenario where the sender has a sequence of bytes to send, say numbered 1 to 20, to a receiver who has a window size of ten. The sender then would place a window around the first ten bytes and transmit them in one go. It would then wait for an acknowledgment. The receiver then sends an ACK of 11 meaning that it successfully received the first 10 bytes, and is now expecting byte 11. At this point, the sender moves the sliding window (of size 10) 10 bytes along to cover bytes 11 to 20. The sender then transmits these 10 bytes in one go.
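
This scenario can be traced with a few lines of Python. The sketch assumes a loss-free link and a receiver that acknowledges each full window at once; the function name send_with_window is illustrative.

```python
def send_with_window(first, last, window):
    """Return (first byte sent, last byte sent, ACK received) for each round."""
    rounds = []
    next_byte = first
    while next_byte <= last:
        end = min(next_byte + window - 1, last)   # leading edge of the window
        ack = end + 1                             # receiver asks for the next byte
        rounds.append((next_byte, end, ack))
        next_byte = ack                           # trailing edge slides up to the ACK
    return rounds

for start, end, ack in send_with_window(1, 20, 10):
    print(f"sent bytes {start} to {end}, received ACK {ack}")
```

With 20 bytes and a window of 10 this gives two rounds, the first acknowledged with 11 and the second with 21.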

Applications determine the initial window size and you can see this size for each device at the initial synchronisation (the three-way handshake). Windows uses 8760 bytes for Ethernet by default, although this can be changed in the registry. The number 8760 is 6 x 1460 which is the amount of data a full Ethernet frame can carry and is the MSS for Ethernet by default, which is shared during the synchronisation. When sizing a window, 6-8 times the packet size is considered the most efficient. In the old days of the Internet (early 1980s) when protocols such as X.25 were prevalent, users were often advised to assume a much smaller datagram size of 576 (from RFC 791). Although this is no longer necessary, you may come across smaller MSS and window size settings as a result.

The fewer errors that occur on the network, the larger the window is allowed to get and the more bandwidth is used for data. The only problem with a large window size is that if there is a transmission failure at any point, a whole window of data may have to be resent, thereby taking up bandwidth anyway.

One thing to be aware of with TCP protocols is the slow ramping up of the window size. For instance, if you are sending a 10MB file using FTP, it may take 1MB of transfer before the transfer occurs at optimum speed. This is because the window size starts off small so that much of the initial traffic is header rather than data. Downloading small files using FTP does not reach the optimum data download speed; downloading large files is more efficient. This mechanism is called Slow Start and is outlined in RFC 2001.

The window size is the maximum number of bytes of data that can be transmitted before an acknowledgement is required. Another way of looking at this is that the window size decides the amount of data that can be sent within the RTT. Here are some examples:

An 8KB window size would take 32ms to be transmitted on a 2Mbps serial link ((8192 * 8)/2048000 = 0.032s). The RTT is therefore 64ms. So for every 64ms, 8KB is transmitted because packets can only be sent for 32ms of that time as we are having to wait for ACKs, i.e. we are not able to use the full capability of the bandwidth. Multiply this up and we find that an 8KB window gives us a maximum data throughput of 8192 * 8 * 1000/64 = 1024000bps (1Mbps), irrespective of the potential speed of the link.
An 8KB window size would take 400ms on a satellite link one way. The RTT is therefore 800ms. So for every 800ms, 8KB is transmitted because packets can only be sent for 400ms of that time as we are having to wait for ACKs, i.e. again we are not able to use the full capability of the bandwidth. Multiply this up and we find that an 8KB window gives us a maximum data throughput of 8192 * 8 * 1000/800 = 81920bps or about 82kbps, irrespective of the potential speed of the link. This is because of the enormous delay.
An 8KB window size would take 7s to be transmitted on a 9600bps serial link ((8192 * 8)/9600 = 6.83s). Most of the 8KB window will be buffered because of the serialisation delay as bits are sent much more slowly.

The TCP 16-bit window size field allows a maximum size of 65535 bytes for the window size, so 64KB can be sent every RTT. For a satellite link with 800ms RTT the maximum throughput with this maximum sized window is given by 65535 * 8 * 1000/800 = 655350bps or about 655Kbps. An expensive 2Mbps satellite link would not be fully utilised. One of the TCP Options allows you to scale the window size up to a 30-bit field; this is the Window Scale Option described in RFC 1323.
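
The arithmetic in the examples above can be reproduced with a small Python sketch. It assumes the simplified model used in the text, in which one full window is sent per round trip; the function names are illustrative.

```python
def max_throughput_bps(window_bytes, rtt_ms):
    """Upper limit on throughput when one window is sent per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms

def serialisation_time_s(window_bytes, link_bps):
    """Time taken to clock a whole window on to the link."""
    return window_bytes * 8 / link_bps

print(max_throughput_bps(8192, 64))      # 8KB window, 64ms RTT
print(max_throughput_bps(8192, 800))     # 8KB window, satellite RTT of 800ms
print(max_throughput_bps(65535, 800))    # largest unscaled window, 800ms RTT
print(round(serialisation_time_s(8192, 9600), 2))   # 8KB on a 9600bps line
```

Note that the link speed does not appear in the throughput formula at all, which is the point the examples make: once the window is the limiting factor, a faster link does not help.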

It would be preferable to have a window size appropriate to the size of the link. There would be less buffering, the ACKs would return more quickly and more of the bandwidth would be used. Ideally you are looking for a Window Size >= Bandwidth * RTT. So a 128Kbps serial line with a RTT of 40ms would require a Window size of at least 128000/8 * 0.04 = 640 bytes. Similarly, a 2Mbps link with a 20ms RTT would need a window size of at least 2000000/8 * 0.02 = 5000 bytes. So a 128Kbps satellite link with a RTT of 800ms would require a Window size of at least 128000/8 * 0.8 = 12800 bytes. A technique such as this (although more complex) is used by the Packeteer product that spoofs the TCP connections between client and server and modifies the window sizes according to the characteristics of the links between them.
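
The Window Size >= Bandwidth * RTT rule can be sketched in Python as follows. The second function estimates the shift a Window Scale Option would need, assuming that each shift step doubles the 16-bit window and that the shift is capped at 14, which is what gives the 30-bit limit; both function names are hypothetical.

```python
def min_window_bytes(bandwidth_bps, rtt_ms):
    """Smallest window that keeps the link full: bandwidth * RTT, in bytes."""
    return bandwidth_bps * rtt_ms / 8000

def window_scale_shift(window_bytes, max_shift=14):
    """Smallest shift for which a scaled 16-bit window covers window_bytes."""
    shift = 0
    while (65535 << shift) < window_bytes and shift < max_shift:
        shift += 1
    return shift

print(min_window_bytes(128000, 40))      # 128Kbps serial line, 40ms RTT
print(min_window_bytes(2000000, 20))     # 2Mbps link, 20ms RTT
print(min_window_bytes(128000, 800))     # 128Kbps satellite link, 800ms RTT

# A 2Mbps satellite link with 800ms RTT needs more than 65535 bytes.
needed = min_window_bytes(2000000, 800)
print(needed, window_scale_shift(needed))
```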

7. TCP Segment Transfer Example

Consider the following TCP segment transfer. This has been laid out in a similar format to that which you would see from a Network trace displayed in two-station format. We are just concentrating on the TCP sequence numbers and window sizes:

Type of segment	160.221.172.250	160.221.73.26
SYN	Seq.no. 17768656
	(next seq.no. 17768657)
	Ack.no. 0
	Window 8192
	LEN = 0 bytes
SYN-ACK		Seq.no. 82980009
		(next seq.no. 82980010)
		Ack.no. 17768657
		Window 8760
		LEN = 0 bytes
ACK	Seq.no. 17768657
	(next seq.no. 17768657)
	Ack.no. 82980010
	Window 8760
	LEN = 0 bytes
ACK	Seq.no. 17768657
	(next seq.no. 17768729)
	Ack.no. 82980010
	Window 8760
	LEN = 72 bytes of data
		Seq.no. 82980010
		(next seq.no. 82980070)
		Ack.no. 17768729
		Window 8688
		LEN = 60 bytes of data
	Seq.no. 17768729
	(next seq.no. 17768885)
	Ack.no. 82980070
	Window 8700
	LEN = 156 bytes of data
		Seq.no. 82980070
		(next seq.no. 82980222)
		Ack.no. 17768885
		Window 8532
		LEN = 152 bytes of data
FIN	Seq.no. 17768885
	(next seq.no. 17768886)
	Ack.no. 82980222
	Window 8548
	LEN = 0 bytes
FIN-ACK		Seq.no. 82980222
		(next seq.no. 82980223)
		Ack.no. 17768886
		Window 8532
		LEN = 0 bytes
ACK	Seq.no. 17768886
	(next seq.no. 17768886)
	Ack.no. 82980223
	Window 8548
	LEN = 0 bytes

The value of LEN is the length of the TCP data which is calculated by subtracting the IP and TCP header sizes from the IP datagram size.

The session begins with station 160.221.172.250 initiating a SYN containing the sequence number 17768656 which is the ISS. The SYN uses up one sequence number, so the first octet of data will carry the next sequence number 17768657. There are only zeros in the Acknowledgement number field as this is not used in the SYN segment. The window size of the sender starts off as 8192 octets, which it assumes to be acceptable to the receiver.
The receiving station sends both its own ISS (82980009) in the sequence number field and acknowledges the sender's sequence number by incrementing it by 1 (17768657) expecting this to be the starting sequence number of the data bytes that will be sent next by the sender. This is called the SYN-ACK segment. The receiver's window size starts off as 8760.
Once the SYN-ACK has been received, the sender issues an ACK that acknowledges the receiver's ISS by incrementing it by 1 and placing it in the acknowledgement field (82980010). The sender also sends the same sequence number that it sent previously (17768657). This segment is empty of data and we don't want the session just to keep ramping up the sequence numbers unnecessarily. The window size of 8760 is acknowledged by the sender.
From now on ACKs are used until just before the end of the session. The sender now starts sending data by stating the sequence number 17768657 again since this is the sequence number of the first byte of the data that it is sending. Again the acknowledgement number 82980010 is sent which is the expected sequence number of the first byte of data that the receiver will send. In the above scenario, the sender is initially sending 72 bytes of data in one segment. The network analyser may indicate the next expected sequence number in the trace; in this case this will be 17768657 + 72 = 17768729. The sender has now agreed the window size of 8760 and uses it itself.
The receiver acknowledges the receipt of the data by sending back the number 17768729 in the acknowledgement number field thereby acknowledging that the next byte of data to be sent will begin with sequence number 17768729 (implicit in this is the understanding that sequence numbers up to and including 17768728 have been successfully received). Notice that not every byte needs to be acknowledged. The receiver also sends back the sequence number of the first byte of data in its own segment (82980010) that is to be sent. The receiver is sending 60 bytes of data. The receiver subtracts 72 bytes from its previous window size of 8760 and sends 8688 as its new window size.
The sender acknowledges the receipt of the data with the number 82980070 (82980010 + 60) in the acknowledgement number field, this being the sequence number of the next data byte expected to be received from the receiver. The sender sends 156 bytes of data starting at sequence number 17768729. The sender subtracts 60 bytes from its previous window size of 8760 and sends the new size of 8700.
The receiver acknowledges receipt of this data with the number 17768885 (17768729 + 156) since it was expecting it, and sends 152 bytes of data beginning with the sequence number 82980070. The receiver subtracts 156 bytes from the previous window size of 8688 and sends the new window size of 8532.
The sender acknowledges this with the next expected sequence number 82980070 + 152 = 82980222 and sends the expected sequence number 17768885 in a FIN because at this point the application wants to close the session. The sender subtracts 152 bytes from its previous window size of 8700 and sends the new size of 8548.
The receiver sends an FIN-ACK acknowledging the FIN and increments the acknowledgement sequence number by 1 to 17768886 which is the number it will expect on the final ACK. In addition the receiver sends the expected sequence number 82980223. The window size remains at 8532 as no data was received from the sender's FIN.
The final ACK is sent by the sender confirming the sequence number 17768886 and acknowledging the receiver's FIN, which counts as 1 byte of sequence space, with the acknowledgement number 82980223. The window size finishes at 8548 and the TCP connection is now closed.
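
The bookkeeping in the walk-through above can be checked mechanically. The Python sketch below replays the ten segments of the trace and confirms that every sequence and acknowledgement number follows from the previous ones. It assumes, as in this trace, that each segment acknowledges everything the peer has sent so far; the host labels "A" and "B" and the function name check_trace are illustrative.

```python
# (host, type, sequence number, acknowledgement number, data length)
# A = 160.221.172.250, B = 160.221.73.26
TRACE = [
    ("A", "SYN",     17768656, 0,        0),
    ("B", "SYN-ACK", 82980009, 17768657, 0),
    ("A", "ACK",     17768657, 82980010, 0),
    ("A", "ACK",     17768657, 82980010, 72),
    ("B", "ACK",     82980010, 17768729, 60),
    ("A", "ACK",     17768729, 82980070, 156),
    ("B", "ACK",     82980070, 17768885, 152),
    ("A", "FIN",     17768885, 82980222, 0),
    ("B", "FIN-ACK", 82980222, 17768886, 0),
    ("A", "ACK",     17768886, 82980223, 0),
]

def check_trace(trace):
    """Return a list of inconsistencies found in the trace (empty if none)."""
    next_seq = {}   # next sequence number each host is due to send
    problems = []
    for number, (host, kind, seq, ack, length) in enumerate(trace, start=1):
        peer = "B" if host == "A" else "A"
        if host in next_seq and seq != next_seq[host]:
            problems.append(f"segment {number}: unexpected sequence number {seq}")
        if peer in next_seq and ack != next_seq[peer]:
            problems.append(f"segment {number}: unexpected acknowledgement {ack}")
        # SYN and FIN each use up one sequence number in addition to any data.
        used = length + (1 if "SYN" in kind or "FIN" in kind else 0)
        next_seq[host] = (seq + used) % 2 ** 32
    return problems

print(check_trace(TRACE) or "trace is consistent")
```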

From the above you can see that if you have applications where data flow is largely unidirectional, you can have a scenario where there could be a long series of ACKs where the sequence numbers are the same as far as the data receiver is concerned. Also, you may have a frozen window whilst the application catches up which means that in the meantime acknowledgements are sent by the receiver with a window size of 0 until buffer space is freed up and an acknowledgement is sent with the window size ramped up again, thereby allowing the sender to send data again and the sequence numbers start increasing again.

The above example is a clean straightforward bi-directional data transfer session, however you often have multiple TCP sessions to sort through using different ports and sequence numbers, plus in any one session segments could be resent, sent in a row or the window is frozen due to the stack buffer being full all of which can make it interesting tracking sequence numbers. Be aware that the ACK only has to acknowledge the last sequence number received, so if four segments have been sent in a row, only one ACK is required. If sequence numbers do not arrive then the whole segment is lost with all the bytes of data within it, plus any segments that may have been sent in a row before the lost segment.

You will notice in the above example that the window size steadily decreased. This indicates that no data had been processed off the TCP stack by the time the session had finished. On a longer session you should see the window size creep up again as the buffer is emptied by the application. In the example the window sizes could easily be followed because the segments followed each other, however acknowledgements often do not follow immediately and may be acknowledging more than one segment, which makes the trace more tricky to follow.

The above description details the simplest case of TCP connections, however you can get more complex scenarios where simultaneous connections are set up, or segments get lost or resent. The judicious use of RST (Reset) helps clean these connections up. You can follow step by step these different scenarios in RFC 793.

8. TCP Header Compression

TCP header compression reduces the combined TCP/IP header from 40 bytes to about 5 bytes. This compression was devised by Van Jacobson and is described in RFC 1144. Only a few bytes in the header change from one packet to another so the VJ algorithm only transfers the bytes that have changed. It is most useful for protocols where large numbers of small packets are used (e.g. the keystrokes of a Telnet session, or button clicks over HTTP), and therefore many headers. Protocols such as FTP normally use large packet sizes and so TCP header compression is not going to have a significant benefit. Whenever TCP header compression is used make sure that it is configured at both ends, otherwise protocols that use TCP, such as Telnet, will not operate.
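
The difference between small and large packets can be put into numbers with a short Python sketch. The payload sizes are illustrative assumptions (1 byte for a keystroke, 1460 bytes for a full Ethernet segment), and the headers are taken as 40 bytes uncompressed and 5 bytes compressed.

```python
def payload_efficiency(payload_bytes, header_bytes):
    """Fraction of each packet that is user data rather than header."""
    return payload_bytes / (payload_bytes + header_bytes)

for payload in (1, 1460):
    plain = payload_efficiency(payload, 40)
    compressed = payload_efficiency(payload, 5)
    print(f"{payload} byte payload: {plain:.1%} uncompressed, {compressed:.1%} compressed")
```

For the single keystroke the proportion of useful data rises several-fold, whereas for the full-sized segment it was already close to 100 percent.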
