Socket Buffers, Fragmentation and Segmentation - Linux

The shared info structure, skb_shared_info, is used to support IP fragmentation and TCP segmentation. A discussion of the socket buffers is not complete without discussing this structure. The shared info structure, also known as skb_shinfo, is defined in the file include/linux/skbuff.h.

struct skb_shared_info {

This field contains the reference count for this skb. It is incremented each time the buffer is cloned.

atomic_t dataref;

The nr_frags field is the number of fragments in this packet. This field is used by TCP segmentation.

unsigned int nr_frags;

The next two fields are used for devices that are capable of doing TCP segment processing in hardware. This capability is indicated by the network device feature flag, NETIF_F_TSO.

unsigned short tso_size;
unsigned short tso_segs;

This field points to the list of fragments for this packet if it is fragmented.

struct sk_buff *frag_list;

This is the array of page table entries. Each entry is actually a TCP segment.

skb_frag_t frags[MAX_SKB_FRAGS];
};

This structure is placed at the end of the attached data buffer. The end field in the socket buffer structure points to the end of the data portion of the packet, and because skb_shinfo immediately follows the data portion, end also marks the beginning of skb_shinfo in the attached data buffer.

Skb_shared_info has several purposes, including IP fragmentation, TCP segmentation, and keeping track of cloned socket buffers. When used for IP fragmentation, skb_shared_info points to a list of sk_buffs containing IP fragments. When used for TCP segmentation, it contains an array of pointers to memory-mapped pages containing the segment data. Handling TCP segments this way is more efficient than handling IP fragments.

IP fragmentation is not quite as common as it was in the earlier days of the Internet. It is necessary when the MTU of the outgoing device, or of a network segment along the path, is smaller than the packet size. TCP segmentation, however, is far more common because it is the underlying mechanism for the transport of streaming data that makes up most network traffic. TCP provides a streaming service that makes the data look like an uninterrupted sequence of bytes even though the data must be split up to fit into IP packets. See “Sending the Data from the Socket through UDP and TCP” for more information about TCP segmentation.

When a socket buffer is cloned, the clone shares the attached data buffer, and with it skb_shared_info, with the original.
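
The placement of the shared info at the end of the data buffer can be sketched in user space. This is a minimal illustration, not kernel code: mock_skb, mock_shared_info, mock_shinfo and mock_skb_init are hypothetical names, the 16-byte rounding is an assumed alignment choice, and only two fields are modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; NOT the kernel definitions. */
struct mock_shared_info {
    int dataref;            /* users of the data buffer */
    unsigned int nr_frags;  /* page fragments in use */
};

struct mock_skb {
    unsigned char *head;    /* start of the allocated data buffer */
    unsigned char *end;     /* end of the packet data area */
};

/* The shared info starts exactly where the packet data area ends,
 * so the end pointer is all that is needed to find it. */
static struct mock_shared_info *mock_shinfo(const struct mock_skb *skb)
{
    return (struct mock_shared_info *)skb->end;
}

/* Allocate one block: packet data area followed by the shared info. */
static int mock_skb_init(struct mock_skb *skb, size_t size)
{
    assert(skb != NULL);

    /* Assumed alignment: round the data area up to 16 bytes so the
     * trailing structure is suitably aligned. */
    size = (size + 15) & ~(size_t)15;

    skb->head = malloc(size + sizeof(struct mock_shared_info));
    if (skb->head == NULL)
        return -1;

    skb->end = skb->head + size;
    mock_shinfo(skb)->dataref = 1;
    mock_shinfo(skb)->nr_frags = 0;
    return 0;
}
```

A caller would run mock_skb_init(&skb, 100), use mock_shinfo(&skb) to reach the trailer, and release everything with a single free(skb.head), since the data area and the shared info live in one allocation.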

The first field in the shared info structure, dataref, counts the users of the attached data buffer. It is set to one when the buffer is allocated and incremented each time the socket buffer is cloned, so a value greater than one indicates that the buffer is shared with a clone. (The cloned field in the socket buffer is also set to one when a socket buffer is cloned.)

Another field in the shared info structure, frag_list, is used by the IP fragment reassembly facility. This is how each fragment on the list can share the same IP header. The IP headers for each fragment are almost identical. They differ only in the fragment offset and flags, the total length, and the header checksum. When the input processing in the IP protocol discovers that an incoming skb is actually an IP fragment, it places the packet on a special list containing the fragments. IP then assembles this list (without copying the actual packet data) into a single datagram consisting of a head socket buffer followed by a list of socket buffers, each of which points to a single fragment. The frag_list field in the shared info area points to the list of socket buffers containing the fragments. Although each IP fragment occupies a separate socket buffer, every socket buffer gets its own skb_shinfo structure when it is created. The following figure illustrates a socket buffer that points to a list of IP fragments.
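
The head-plus-list layout of a reassembled datagram can be sketched as a simple linked-list walk. This is a simplified user-space model under stated assumptions: mock_skb, mock_shared_info and mock_datagram_len are hypothetical names, the shared info is embedded directly in the mock buffer instead of sitting at the end of a data area, and len here counts only the bytes held by each individual buffer.

```c
#include <assert.h>
#include <stddef.h>

struct mock_skb;

/* Hypothetical stand-in for the shared info; only frag_list is modeled. */
struct mock_shared_info {
    struct mock_skb *frag_list;     /* first fragment after the head, or NULL */
};

struct mock_skb {
    struct mock_skb *next;          /* next fragment on the list */
    unsigned int len;               /* payload bytes held by this buffer only */
    struct mock_shared_info shinfo; /* embedded here for simplicity */
};

/* Total payload of a datagram: the head buffer plus every buffer
 * reachable through the head's frag_list. No data is copied. */
static unsigned int mock_datagram_len(const struct mock_skb *head)
{
    const struct mock_skb *frag;
    unsigned int total;

    assert(head != NULL);

    total = head->len;
    for (frag = head->shinfo.frag_list; frag != NULL; frag = frag->next)
        total += frag->len;
    return total;
}
```

For a head buffer of 1480 bytes whose frag_list holds two more buffers of 1480 and 520 bytes, the walk yields 3480 bytes. Only the head's frag_list is consulted; the fragments are chained to each other through next.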

Sk_buff with fragments.


The second field in the shared info structure, nr_frags, is not used for IP fragmentation; instead, it is used for TCP segmentation. A nonzero value in this field indicates a socket buffer containing segments. The value of nr_frags corresponds to the number of segment pages attached to the socket buffer, and the skb_shared_info structure contains pointers to the segment pages in the field frags. The frags array is the last member of the structure. It can contain as many as six pages in the array. The actual number of locations in the array will depend on the hardware architecture and the configured page size, PAGE_SIZE. Each of the elements in the frags array points to a memory-mapped page in the Linux virtual page table array.

When a socket buffer created by TCP contains a chain of segments, each sequential segment’s data is in a separate memory-mapped page pointed to by a location in the frags array. This is considerably more efficient than maintaining a redundant sk_buff structure for each TCP segment. Once TCP is in the ESTABLISHED state, the packet headers for subsequent segments are nearly identical, so the packet header can be shared among the segments. Processing time is saved because Linux does not have to copy a complete IP header for each segment.

Each location in the frags array consists of the skb_frag_t structure defined in the file include/linux/skbuff.h. The size of the frags array is calculated to hold a total of 64 Kbytes of data.

In skb_frag_t, page is a pointer to a page table entry. The next field, offset, is the offset from the start of the page to where the data begins, and size is the length of the data in the page.
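
The three fields described above, and the way nr_frags bounds the array, can be sketched as follows. This is a hedged user-space model: mock_frag, mock_shared_info and mock_paged_len are hypothetical names, the 4096-byte page size is an assumption, and the sizing rule of 64 Kbytes divided by the page size plus two spare slots is an assumption modeled on 2.6-era kernels, not something stated in the text.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_PAGE_SIZE 4096u                           /* assumed page size */
#define MOCK_MAX_FRAGS (65536u / MOCK_PAGE_SIZE + 2u)  /* assumed sizing rule */

/* Hypothetical stand-in for skb_frag_t. */
struct mock_frag {
    void *page;             /* stands in for the kernel's page pointer */
    unsigned short offset;  /* where the data starts within the page */
    unsigned short size;    /* bytes of data in the page */
};

/* Hypothetical stand-in for the shared info; only the paged part is modeled. */
struct mock_shared_info {
    unsigned int nr_frags;                   /* entries of frags[] in use */
    struct mock_frag frags[MOCK_MAX_FRAGS];
};

/* Sum the data held in attached pages; only the first nr_frags
 * entries of the array are meaningful. */
static unsigned int mock_paged_len(const struct mock_shared_info *shinfo)
{
    unsigned int i, total = 0;

    assert(shinfo != NULL && shinfo->nr_frags <= MOCK_MAX_FRAGS);

    for (i = 0; i < shinfo->nr_frags; i++)
        total += shinfo->frags[i].size;
    return total;
}
```

Under these assumptions the array has 18 slots, and a buffer with two full pages plus a 1000-byte partial page reports 9192 bytes of paged data.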

The following figure illustrates a socket buffer containing an array of TCP segments.

Sk_buff with segments.

