Detailed explanation of the solution to the nginx panic problem

To understand the nginx thundering herd problem, first recall how nginx starts up: the master process binds and listens on the ports specified in the configuration file, then calls fork() to create each worker process. By the semantics of fork(), a child process inherits all of the parent's memory and open file descriptors, including the listening sockets, so every worker process is in principle able to listen on every port. The thundering herd problem is this: when a client initiates a new connection, the connection event wakes up every worker process, but only one of them can actually accept the connection; the others find the event already stale and go back to waiting. Waking every worker process for an event that only one can handle wastes CPU and scheduling resources. This article explains how nginx avoids this.

1. Solution

In the previous article, we mentioned that when each worker process is created, the ngx_worker_process_init() method is called to initialize it. One very important step there is that each worker calls epoll_create() to create its own epoll handle. Each listening port has a corresponding file descriptor, and a worker process is woken by connection events on a port only if it has added that file descriptor to its epoll handle via epoll_ctl() and subscribed to the accept event. Conversely, if a worker never adds the listening file descriptor to its epoll handle, it can never be triggered by that port's events. nginx exploits exactly this: a shared (cross-process) lock controls whether the current process is allowed to add the listening file descriptors to its epoll handle. In other words, only the process holding the lock listens on the target ports, which guarantees that each connection event triggers exactly one worker process. The following figure shows a schematic diagram of the worker process's working cycle:

One point in the figure deserves explanation: after entering the loop, each worker process tries to acquire the shared lock, and if it fails, it removes the listening file descriptors from its own epoll handle (even if they are not currently registered there). The purpose is to avoid losing client connection events. Suppose instead that a process removed the listening file descriptors at the moment it released the lock. Then, until the next worker acquires the lock, no epoll handle would be monitoring the listening ports at all, and events arriving in that window would be lost. By removing the descriptors only upon a failed lock acquisition, as in the figure, the removal is safe: the acquisition failed precisely because some other process holds the lock and is therefore already listening on those descriptors.

This design does have one side effect. As the figure shows, when the current process finishes a round of the loop, it releases the lock and then goes on to handle its other events, without deregistering the listening file descriptors. If another process acquires the lock in the meantime and registers the same descriptors, two processes are briefly listening on them, and a client connection event arriving then will wake both workers. This residual thundering herd is tolerable for two main reasons:

  1. The thundering herd that occurs at this point wakes only a few worker processes, which is far better than waking every worker process for every event;
  2. The window exists only because a process releases the lock before deregistering the listening file descriptors. After releasing the lock, a worker merely handles the read and write events of its existing connections and checks its flags, which is very quick; it then re-attempts the lock and, on failure, deregisters the descriptors. By comparison, the lock holder spends much longer waiting for client connection events, so the probability of overlap, and hence of a herd wake-up, is relatively small.

2. Source code explanation

The worker process's event handling is driven by the ngx_process_events_and_timers() method. Let's look at how this method handles the whole flow. The following is its (abridged) source:

void ngx_process_events_and_timers(ngx_cycle_t *cycle) {
 ngx_uint_t flags;
 ngx_msec_t timer, delta;

 // Try to acquire the shared accept lock; on error, give up this iteration
 if (ngx_trylock_accept_mutex(cycle) == NGX_ERROR) {
  return;
 }

 // Process events here. For the kqueue model this points to the
 // ngx_kqueue_process_events() method; for the epoll model, to the
 // ngx_epoll_process_events() method. Its main job is to fetch the event list
 // from the underlying event model and append each event either to the
 // ngx_posted_accept_events queue or to the ngx_posted_events queue.
 (void) ngx_process_events(cycle, timer, flags);

 // Process the accept events, handing them to the ngx_event_accept() method
 // of ngx_event_accept.c
 ngx_event_process_posted(cycle, &ngx_posted_accept_events);

 // Release the lock if the current process holds it
 if (ngx_accept_mutex_held) {
  ngx_shmtx_unlock(&ngx_accept_mutex);
 }

 // Process the remaining (non-accept) events. Accept events are handled by the
 // ngx_event_accept() method of ngx_event_accept.c; read events by the
 // ngx_http_wait_request_handler() method of ngx_http_request.c; and
 // keep-alive connections by the ngx_http_keepalive_handler() method of
 // ngx_http_request.c.
 ngx_event_process_posted(cycle, &ngx_posted_events);
}

In the above code, we omitted most of the checking work and kept only the skeleton. First, the worker process calls the ngx_trylock_accept_mutex() method to try to acquire the lock; if the lock is acquired, it registers the file descriptors of the listening ports. The ngx_process_events() method is then called to process the events reported by the epoll handle. After that, the shared lock is released, and finally the read and write events of the connected clients are processed. Let's look at how the ngx_trylock_accept_mutex() method acquires the shared lock:

ngx_int_t ngx_trylock_accept_mutex(ngx_cycle_t *cycle) {
 // Try to acquire the shared lock with a CAS operation
 if (ngx_shmtx_trylock(&ngx_accept_mutex)) {

  // ngx_accept_mutex_held being 1 indicates that the current process already
  // holds the lock
  if (ngx_accept_mutex_held && ngx_accept_events == 0) {
   return NGX_OK;
  }

  // Here, the listening file descriptors are registered with the event model
  // of the current process, such as the change_list array of the kqueue model.
  // When nginx starts its worker processes, each worker by default inherits
  // the sockets the master process listens on. This leads to a problem: when
  // a port has a client event, every process listening on that port is woken
  // up, yet only one worker process can successfully handle the event; the
  // other processes find the event stale after waking and go back to waiting.
  // This is the "thundering herd" phenomenon. nginx solves it with the shared
  // lock here: only the worker process that obtains the lock handles client
  // events, because in the course of obtaining the lock the worker re-adds
  // the listening events of each port for itself, while the other workers do
  // not listen. That is, only one worker process listens on each port at any
  // moment, which avoids the thundering herd problem.
  // The ngx_enable_accept_events() method performs that registration for the
  // current process.
  if (ngx_enable_accept_events(cycle) == NGX_ERROR) {
   ngx_shmtx_unlock(&ngx_accept_mutex);
   return NGX_ERROR;
  }

  // Mark the lock as successfully acquired
  ngx_accept_events = 0;
  ngx_accept_mutex_held = 1;

  return NGX_OK;
 }

 // The lock acquisition failed, so reset ngx_accept_mutex_held and clear the
 // listening events of the current process
 if (ngx_accept_mutex_held) {
  // The current process held the lock previously: reset the flag to 0 and
  // delete this process's listening events on each port
  if (ngx_disable_accept_events(cycle, 0) == NGX_ERROR) {
   return NGX_ERROR;
  }

  ngx_accept_mutex_held = 0;
 }

 return NGX_OK;
}

The above code essentially does three things:

  1. Try to acquire the shared lock via the ngx_shmtx_trylock() method, which uses a CAS operation;
  2. If the lock is acquired, call the ngx_enable_accept_events() method to register the file descriptors of the listening ports;
  3. If the lock is not acquired (but was held previously), call the ngx_disable_accept_events() method to deregister the listening file descriptors.

3. Summary

This article first explained the cause of the thundering herd phenomenon, then introduced how nginx solves it, and finally walked through nginx's handling at the source-code level.
