1. Introduction

The author had always assumed that a socket stays in the TIME_WAIT state on Linux for about 60 seconds. In production, however, there are sockets whose TIME_WAIT lasts more than 100 s. Because this mattered for the analysis of a complex bug that occurred recently, the author went to the Linux source code to find out.

2. The Linux environment

TIME_WAIT behavior is closely tied to five-tuple reuse, so the author first lists the kernel parameter settings of the machine to avoid confusion with other issues.

[Figure: kernel parameter settings of the machine]
As the settings show, tcp_tw_recycle is set to 0, which avoids the problem caused by enabling tcp_tw_recycle and tcp_timestamps at the same time behind NAT.

3. TIME_WAIT state transition diagram

Any discussion of the TIME_WAIT state of a socket has to start from the TCP state transition diagram:

[Figure: TCP state transition diagram]

The diagram gives the duration as 2MSL but does not say how long 2MSL is. The author found the following macro definition in the Linux source code:

    #define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT
                                      * state, about 60 seconds */

Taken literally, the comment says the TIME_WAIT state is destroyed after about 60 seconds, so 2MSL must be 60 seconds, right?

4. Is the duration really as defined by TCP_TIMEWAIT_LEN?

The author had always believed that a socket in TIME_WAIT is reclaimed by the kernel after 60 seconds. He even ran an experiment: telnet to a port to create a TIME_WAIT socket artificially, then time it. The socket was reclaimed in about 60 seconds. While tracking down a problem, however, the author found that TIME_WAIT must sometimes last as long as 111 s; otherwise the observed behavior could not be explained at all. This forced the author to overturn his earlier conclusion and re-read the kernel source code that handles the TIME_WAIT state. That investigation will also be written up as a separate blog post, so stay tuned.

5. TIME_WAIT timer source code

To know when a TIME_WAIT socket can be reclaimed, we have to look at the TIME_WAIT timer, which exists specifically to destroy expired TIME_WAIT sockets. Every socket that enters TIME_WAIT goes through the following code branch:

[Figure: code branch that schedules the TIME_WAIT timer]
Since our kernel does not enable tcp_tw_recycle, the final call is:

    /* TCP_TIMEWAIT_LEN is 60 * HZ here */
    inet_twsk_schedule(tw, &tcp_death_row, TCP_TIMEWAIT_LEN, TCP_TIMEWAIT_LEN);

Let's dig into this core function.

5.1. inet_twsk_schedule

Before reading the source code, look at the overall processing flow. The Linux kernel uses a timing wheel to handle expired TIME_WAIT sockets:

[Figure: TIME_WAIT timing wheel]

The kernel divides the 60 s into 8 slots (INET_TWDR_TWKILL_SLOTS, the constant used in the slow-timer branch below), and each slot covers a 7.5 s (60/8) range of sockets in the TIME_WAIT state.

    void inet_twsk_schedule(struct inet_timewait_sock *tw,
                            struct inet_timewait_death_row *twdr,
                            const int timeo, const int timewait_len)
    {
        ......
        /* Calculate the slot on the timing wheel */
        slot = (timeo + (1 << INET_TWDR_RECYCLE_TICK) - 1) >> INET_TWDR_RECYCLE_TICK;
        ......
        /* Slow timer wheel logic. Since tcp_tw_recycle is not enabled,
         * timeo is always 60*HZ (60s), so everything follows the slow_timer path */
        if (slot >= INET_TWDR_RECYCLE_SLOTS) {
            /* Schedule to slow timer */
            if (timeo >= timewait_len) {
                slot = INET_TWDR_TWKILL_SLOTS - 1;
            } else {
                slot = DIV_ROUND_UP(timeo, twdr->period);
                if (slot >= INET_TWDR_TWKILL_SLOTS)
                    slot = INET_TWDR_TWKILL_SLOTS - 1;
            }
            tw->tw_ttd = jiffies + timeo;
            /* twdr->slot is the slot currently being processed.
             * With timeo = TCP_TIMEWAIT_LEN, the offset added here is 7 */
            slot = (twdr->slot + slot) & (INET_TWDR_TWKILL_SLOTS - 1);
            list = &twdr->cells[slot];
        } else {
            /* Short timer path; not covered here for space reasons */
            ......
        }
        ......
        /* twdr->period is 60/8 = 7.5s */
        if (twdr->tw_count++ == 0)
            mod_timer(&twdr->tw_timer, jiffies + twdr->period);
        spin_unlock(&twdr->death_lock);
    }

As the source code shows, the timeout we pass in is TCP_TIMEWAIT_LEN. Therefore, every socket that has just entered the TIME_WAIT state is linked into the slot farthest from the slot currently being processed (current slot + 7).
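To make the slot arithmetic concrete, here is a minimal C sketch of the slow-wheel branch quoted above. It is not the kernel code. The constants assume HZ = 1000 (with a tick shift of 7 and 32 short-wheel slots, both assumed values), and slow_wheel_slot is a hypothetical helper name introduced only for illustration.

```c
/* Assumed values for illustration only (HZ = 1000). */
#define TW_HZ            1000
#define TW_TIMEWAIT_LEN  (60 * TW_HZ)   /* 60 s in jiffies */
#define TW_KILL_SLOTS    8              /* slow wheel size */
#define TW_RECYCLE_SLOTS 32             /* assumed short wheel size */
#define TW_RECYCLE_TICK  7              /* assumed shift for HZ = 1000 */
#define TW_PERIOD        (TW_TIMEWAIT_LEN / TW_KILL_SLOTS) /* 7.5 s */

/* Hypothetical helper mirroring the quoted branch.
 * Returns the slow-wheel slot the socket is linked into,
 * or -1 if the timeout is short enough for the short wheel. */
int slow_wheel_slot(int cur_slot, int timeo, int timewait_len)
{
    int slot = (timeo + (1 << TW_RECYCLE_TICK) - 1) >> TW_RECYCLE_TICK;

    if (slot < TW_RECYCLE_SLOTS)
        return -1;                      /* short timer path, not modeled */

    if (timeo >= timewait_len) {
        slot = TW_KILL_SLOTS - 1;       /* farthest slot: offset 7 */
    } else {
        slot = (timeo + TW_PERIOD - 1) / TW_PERIOD;   /* DIV_ROUND_UP */
        if (slot >= TW_KILL_SLOTS)
            slot = TW_KILL_SLOTS - 1;
    }
    /* Offset is relative to the slot currently being processed. */
    return (cur_slot + slot) & (TW_KILL_SLOTS - 1);
}
```

Under these assumptions, a full 60 s timeout scheduled while slot 0 is current lands in slot 7, and one scheduled while slot 3 is current lands in slot 2, that is, always the slot just behind the current one.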
[Figure: a new TIME_WAIT socket linked into the slot at current slot + 7]

If the kernel keeps generating TIME_WAIT sockets, the whole slow timer wheel ends up with every slot filled with sockets in the TIME_WAIT state:

[Figure: slow timer wheel with all slots filled]

5.2. The cleanup function

The handlers are carried by the structure passed in on every call to inet_twsk_schedule:

    /* The tcp_death_row argument is the structure that carries the timing-wheel handlers */
    inet_twsk_schedule(tw, &tcp_death_row, TCP_TIMEWAIT_LEN, TCP_TIMEWAIT_LEN)

    /* The handler structure itself */
    struct inet_timewait_death_row tcp_death_row = {
        ......
        /* slow_timer wheel handler */
        .tw_timer = TIMER_INITIALIZER(inet_twdr_hangman, 0,
                                      (unsigned long)&tcp_death_row),
        /* slow_timer wheel auxiliary handler */
        .twkill_work = __WORK_INITIALIZER(tcp_death_row.twkill_work,
                                          inet_twdr_twkill_work),
        /* short timer wheel handler */
        .twcal_timer = TIMER_INITIALIZER(inet_twdr_twcal_tick, 0,
                                         (unsigned long)&tcp_death_row),
    };

Since we are mainly concerned with a timeout of TCP_TIMEWAIT_LEN (60 s), we go straight to the slow_timer wheel handler, inet_twdr_hangman. The function is relatively short:

    void inet_twdr_hangman(unsigned long data)
    {
        struct inet_timewait_death_row *twdr;
        unsigned int need_timer;

        twdr = (struct inet_timewait_death_row *)data;
        spin_lock(&twdr->death_lock);

        if (twdr->tw_count == 0)
            goto out;

        need_timer = 0;
        /* Non-zero return: this slot hit the quota of 100 TIME_WAIT sockets
         * and still has sockets left */
        if (inet_twdr_do_twkill_work(twdr, twdr->slot)) {
            twdr->thread_slots |= (1 << twdr->slot);
            /* Hand the remaining sockets to the work queue */
            schedule_work(&twdr->twkill_work);
            need_timer = 1;
        } else {
            /* We purged the entire slot, anything left? */
            /* Decide whether the timer has to keep running */
            if (twdr->tw_count)
                need_timer = 1;
            /* The current slot is done, so move on to the next slot */
            twdr->slot = ((twdr->slot + 1) & (INET_TWDR_TWKILL_SLOTS - 1));
        }
        /* If more work remains, run this function again after 7.5s */
        if (need_timer)
            mod_timer(&twdr->tw_timer, jiffies + twdr->period);
    out:
        spin_unlock(&twdr->death_lock);
    }

Although short, this function contains several important details. The first is in inet_twdr_do_twkill_work: to keep a slot with too many TIME_WAIT sockets from stalling the timer handler, it returns after processing 100 TIME_WAIT sockets. The rest of the slot is handled by the kernel's work queue mechanism.

The second detail is that the slow_timer wheel does not check exact expiry times: everything in the slot is deleted. So when it is the turn of a slot, for example the 52.5-60 s slot, all TIME_WAIT sockets in that slot are cleared, even those that have not yet reached 60 s. The short timer wheel (twcal) does check the time precisely; due to space constraints it is not covered here.
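The advance-or-stay rule in inet_twdr_hangman can be sketched as a tiny C model. This is an illustrative simplification, not kernel code: struct wheel and hangman_tick are hypothetical names, each cell holds only a socket count, and the work queue is assumed to finish the leftover sockets before the next tick.

```c
#define WHEEL_SLOTS  8     /* slow wheel size, as in the article */
#define WHEEL_QUOTA  100   /* sockets purged per tick before deferring */

/* Hypothetical model of the slow wheel: only counts, no real sockets. */
struct wheel {
    int slot;                  /* slot currently being processed */
    int cells[WHEEL_SLOTS];    /* TIME_WAIT sockets linked into each slot */
};

/* One timer tick.
 * Returns 1 if the wheel advanced to the next slot,
 * 0 if the quota was hit and the same slot will be visited again. */
int hangman_tick(struct wheel *w)
{
    int pending = w->cells[w->slot];

    /* Assumption: whatever exceeds the quota is purged by the
     * work queue before the next tick, so the cell ends up empty. */
    w->cells[w->slot] = 0;

    if (pending > WHEEL_QUOTA)
        return 0;              /* slot index unchanged */

    w->slot = (w->slot + 1) & (WHEEL_SLOTS - 1);
    return 1;
}
```

In this model a slot holding more than 100 sockets costs two ticks: one that hits the quota and one that finds the cell empty and advances.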
5.3. An assumption first

We assume that the contents of one slot can always be fully processed within one slot interval, that is, 60/8 = 7.5 s. Since the system has a tcp_max_tw_buckets setting, this assumption is fairly reliable as long as that setting is reasonable.
5.4. If a slot holds <= 100 TIME_WAIT sockets

If a slot holds no more than 100 TIME_WAIT sockets, the handler does not use the work queue. It also advances the slot index by 1, so that the next slot is processed in the next period:

[Figure: wheel advancing one slot per period]

5.5. If a slot holds > 100 TIME_WAIT sockets

If a slot holds more than 100 TIME_WAIT sockets, the kernel hands the remaining ones to the work queue. At the same time, the slot index remains unchanged! That is, when the next period arrives (7.5 s later), the same slot is processed again. By our assumption the slot has been emptied by then, so only at this second tick does the slot index move forward. In other words, starting at slot 0, it takes 15 seconds before slot 1 is actually processed. If every slot holds more than 100 TIME_WAIT sockets, each slot takes 15 seconds.

The author wrote a program to simulate this situation:

    public class TimeWaitSimulator {
        public static void main(String[] args) {
            double delta = 60 * 1.0 / 8;
            // 0 means a purge is about to start, 1 means the purge has completed.
            // After the purge completes, the slot moves forward.
            int startPurge = 0;
            double sum = 0;
            int slot = 0;
            while (slot < 8) {
                if (startPurge == 0) {
                    sum += delta;
                    startPurge = 1;
                    if (slot == 7) {
                        // The work queue is assumed to clean up quickly, so for
                        // slot 7 there is no need to wait another 7.5s purge round
                        System.out.println("slot " + slot + " has reached the last " + sum);
                        break;
                    }
                }
                if (startPurge == 1) {
                    sum += delta;
                    startPurge = 0;
                    System.out.println("slot " + slot + " move to next at time " + sum);
                    // After cleaning, the slot moves forward
                    slot++;
                }
            }
        }
    }

Running it shows the slot index advancing at 15, 30, 45, 60, 75, 90 and 105 s, with slot 7 reached at 112.5 s.
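The same reasoning reduces to simple arithmetic: under the assumption from section 5.3, each of the first seven slots costs two ticks if it exceeds the quota and one tick otherwise, plus one final tick for the last slot. The C sketch below expresses that. seconds_until_last_slot is a hypothetical helper, not something from the kernel or from the author's program.

```c
#define SIM_SLOTS   8
#define SIM_PERIOD  7.5   /* seconds per tick: 60 s / 8 slots */

/* over_quota[i] is non-zero if slot i holds more than 100 sockets.
 * Returns the seconds elapsed when the last slot (index 7) gets its
 * first purge, counting from the start of slot 0's period. */
double seconds_until_last_slot(const int over_quota[SIM_SLOTS])
{
    int ticks = 1;   /* the tick that first purges the last slot */
    int i;

    for (i = 0; i < SIM_SLOTS - 1; i++)
        ticks += over_quota[i] ? 2 : 1;   /* busy slots are visited twice */

    return ticks * SIM_PERIOD;
}
```

With every slot over the quota this gives 15 ticks, or 112.5 s; with no slot over the quota it gives 8 ticks, or 60 s.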
That is to say, by the time the wheel reaches the 52.5-60 s slot, 112.5 s have actually elapsed, so processing is badly delayed. Because a socket in the TIME_WAIT state (inet_timewait_sock) occupies very little memory, this has little impact on the system's available resources. It does, however, create a pitfall in NAT environments, which is the bug the author mentioned earlier.

In short, even when each slot can be fully processed within one period (7.5 s), a socket in TIME_WAIT can exist for up to 112.5 s! If a slot cannot be processed within 7.5 s, the corresponding wheel position has to wait for one or more additional periods. Given the tcp_max_tw_buckets limit, though, such an extreme condition should be practically impossible.

5.6. PAWS (Protection Against Wrapped Sequences) extends TIME_WAIT

In fact, the above conclusion is not rigorous enough. TIME_WAIT can be extended even further! Look at this source code:

    enum tcp_tw_status
    tcp_timewait_state_process(struct inet_timewait_sock *tw, struct sk_buff *skb,
                               const struct tcphdr *th)
    {
        ......
        if (paws_reject)
            NET_INC_STATS_BH(twsk_net(tw), LINUX_MIB_PAWSESTABREJECTED);

        if (!th->rst) {
            /* In this case we must reset the TIMEWAIT timer.
             *
             * If it is ACKless SYN it may be both old duplicate
             * and new good SYN with random sequence number <rcv_nxt.
             * Do not reschedule in the last case.
             */
            /* If a segment fails the PAWS (wraparound) check, or the segment
             * is an ACK, the timer is reset to a fresh 60 seconds */
            if (paws_reject || th->ack)
                inet_twsk_schedule(tw, &tcp_death_row, TCP_TIMEWAIT_LEN,
                                   TCP_TIMEWAIT_LEN);

            /* Send ACK. Note, we do not put the bucket,
             * it will be released by caller.
             */
            /* Send the ACK appropriate for the TIME_WAIT state back to the peer */
            return TCP_TW_ACK;
        }
        inet_twsk_put(tw);
        /* In this excerpt, only segments with RST set reach this point */
        return TCP_TW_SUCCESS;
    }

[Figure: flow of the TIME_WAIT timer reset logic]

So every ACK, and every segment rejected by the PAWS check, that arrives for a socket in TIME_WAIT re-schedules it for another TCP_TIMEWAIT_LEN, which pushes its lifetime beyond the 112.5 s derived above.

Note the return TCP_TW_SUCCESS at the end of the code. In the excerpt as quoted, it is reached only when the RST flag is set, because every other segment returns TCP_TW_ACK first. A new SYN that passes the PAWS check is handled in the elided part of the function, and that is the path by which the five-tuple of a socket in the TIME_WAIT state can still be reused successfully after a three-way handshake.
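The timer re-arm in the branch quoted in section 5.6 can be modeled in a few lines of C. This is a sketch of the quoted tail of the function only: the enum, struct tw_model and tw_process_tail are hypothetical stand-ins, time is counted in whole seconds, and the timing-wheel granularity discussed earlier is ignored.

```c
#define MODEL_TIMEWAIT_LEN 60   /* seconds, stands in for TCP_TIMEWAIT_LEN */

/* Hypothetical stand-ins for the kernel's status codes. */
enum tw_model_status { TW_MODEL_SUCCESS, TW_MODEL_ACK };

/* Hypothetical model of a TIME_WAIT socket: just its expiry time. */
struct tw_model {
    long expires;   /* absolute time in seconds */
};

/* Mirrors only the quoted tail: any non-RST segment is answered with
 * an ACK, and the timer is re-armed when the segment carries ACK or
 * was rejected by the PAWS check. */
enum tw_model_status tw_process_tail(struct tw_model *tw, long now,
                                     int rst, int ack, int paws_reject)
{
    if (!rst) {
        if (paws_reject || ack)
            tw->expires = now + MODEL_TIMEWAIT_LEN;   /* fresh 60 s */
        return TW_MODEL_ACK;
    }
    return TW_MODEL_SUCCESS;   /* reached only for RST segments */
}
```

For example, a socket due to expire at t = 60 s that receives a stray ACK at t = 50 s is pushed out to t = 110 s in this model, while an ACK-less SYN that passes the PAWS check leaves the expiry untouched.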