How to dynamically bring upstream servers online and offline without reloading nginx

There are plenty of nginx introductions online, so this article focuses on one thing: how to bring upstream services (such as the Java1 service referenced in the figure) online and offline dynamically through nginx, without introducing a separate gateway.

The traditional approach is to edit nginx's upstream file by hand, comment out the Java1 entry or mark it as down, and then reload nginx to apply the change. This can of course be scripted. However, on a busy nginx, a hasty reload causes slow responses at best and an avalanche of dropped traffic at worst.

So how can nginx load upstream configuration dynamically? Three solutions are commonly discussed online:

  • Combine Lua scripts with nginx, i.e. the OpenResty approach;
  • Expose an extra port on each nginx server and modify the upstream by calling that port;
  • Back nginx with a database, store the upstream data in it, and change the upstream configuration by changing the data.

For an nginx already running in production, the third option is by far the cheapest. Let's take a closer look.

Technical solution: nginx 1.16 + nginx_upstream_check_module + nginx-upsync-module + consul

Notes:

  • consul is the database mentioned above. It is not just a key/value store; it also ships a clean web UI that makes managing key-value data easy;
  • nginx_upstream_check_module is Alibaba's open-source health-check module for upstream services;
  • nginx-upsync-module is Weibo's open-source module, which works with consul or etcd.

The implementation details are covered below in three parts: deploying the consul cluster, modifying nginx, and creating the upstream data.

1. Deploy consul cluster

Official website: https://www.consul.io/

Assume the Consul cluster consists of the following three machines, fronted by a proxy:

192.168.21.11
192.168.21.12
192.168.21.13
192.168.21.14 # proxy IP that fronts the three machines above

1. Preparation

Download the consul archive from the official website and upload it to each of the three servers. The consul version used here is 1.8.4:

unzip consul_1.8.4_linux_amd64.zip
mv consul /usr/local/bin/
[root@nginx-11 tmp]# consul
Usage: consul [--version] [--help] <command> [<args>]

Available commands are:
    acl        Interact with Consul's ACLs
    agent      Runs a Consul agent
    catalog    Interact with the catalog
    ....

Create the consul data, log, and configuration-file directories on each of the three machines:

mkdir -p /data/consul/{data,log}
mkdir /etc/consul

2. Generate consul configuration file

The following takes the configuration file of 192.168.21.11 as an example:

[root@nginx-11 tmp]# cat /etc/consul/config.json
{
  "datacenter": "dc1",
  "primary_datacenter": "dc1",
  "bootstrap_expect": 3,
  "start_join": [
    "192.168.21.11",
    "192.168.21.12",
    "192.168.21.13"
  ],
  "retry_join": [
    "192.168.21.11",
    "192.168.21.12",
    "192.168.21.13"
  ],
  "advertise_addr": "192.168.21.11",
  "bind_addr": "192.168.21.11",
  "client_addr": "0.0.0.0",
  "server": true,
  "connect": {
    "enabled": true
  },
  "node_name": "192.168.21.11",
  "ui": true,
  "data_dir": "/data/consul/data",
  "enable_script_checks": false,
  "enable_local_script_checks": true,
  "log_file": "/data/consul/log/",
  "log_level": "info",
  "log_rotate_bytes": 100000000,
  "log_rotate_duration": "24h",
  "encrypt": "a2zC4ItisuFdpl7IqwoYz3GqwA5W1w2CxjNmyVbuhZ4=",
  "acl": {
    "enabled": true,
    "default_policy": "deny",
    "enable_token_persistence": true,
    "enable_key_list_policy": true,
    "tokens": {
      "master": "6c95012f-d086-4ef3-b6b9-35b60f529bd0"
    }
  }
}

Notes:

  • In the configuration files of the other two servers, change the advertise_addr, bind_addr, and node_name values to the corresponding IP addresses; nothing else needs to change.
  • "bootstrap_expect": 3 declares a cluster of 3 server nodes; set it to match your actual deployment.
  • The encrypt and tokens values must be identical on all three machines. The encrypt value can be generated with the consul keygen command and the token value with the uuidgen command;
  • For an explanation of the related parameters, see: https://juejin.im/post/6844903860717240334
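If consul keygen or uuidgen happens to be unavailable on the host, equivalent values can be produced with standard Linux tools. This is a sketch under the assumption that a gossip encryption key is simply 32 random bytes, base64-encoded, and that any UUID works as a token:

```shell
# Hypothetical stand-ins for `consul keygen` and `uuidgen` (Linux only):
# a Consul gossip key is 32 random bytes, base64-encoded; a token is any UUID.
ENCRYPT_KEY=$(head -c 32 /dev/urandom | base64)
MASTER_TOKEN=$(cat /proc/sys/kernel/random/uuid)
echo "encrypt: $ENCRYPT_KEY"
echo "token:   $MASTER_TOKEN"
```

Paste the two values into the encrypt and tokens fields of all three config.json files.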

3. Create a consul cluster

Simply start consul on each of the three machines:

consul agent -config-file=/etc/consul/config.json &

You can then open the Consul web UI at http://192.168.21.14:8500 (or any node's IP:8500) in a browser; log in with the master token configured above to see the contents.
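An agent started with a trailing & will not survive a reboot. A minimal systemd unit is a common alternative; this is a sketch, and the unit name and file path are assumptions based on the layout used in this article:

```ini
# /etc/systemd/system/consul.service (hypothetical unit; adjust paths as needed)
[Unit]
Description=Consul agent
After=network-online.target

[Service]
ExecStart=/usr/local/bin/consul agent -config-file=/etc/consul/config.json
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl enable --now consul on each node.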

Notice:

  • In the acl block of the configuration above, "enable_key_list_policy" must be present and set to "true"; otherwise anonymous users may be unable to read the consul configuration.

4. Create consul access permissions for non-administrators

1) Create an access policy

Open consul in a browser and click ACL -> Access Controls -> Policies -> Create (upper right) to create a read-only policy for the "upstreams" KV prefix, named readonlykv, with the following Rules:

key_prefix "upstreams/" {
 policy = "list"
}

Then create a policy that can write the "upstreams" KV prefix, named writekv, with the following Rules:

key_prefix "upstreams/" {
 policy = "write"
}

Screenshots of the two created policies are as follows:

2) Create an access token

Attach the read-only "upstreams" KV policy to the anonymous-user token so the nginx module can read the consul configuration anonymously:
click token 00000002 and select readonlykv under Policies.

Then create a token that can write the "upstreams" KV, to be used by scripts that modify the consul configuration:
open consul in a browser, click ACL -> Access Controls -> Tokens -> Create (upper right), and select writekv under Policies.
Screenshots of the two modified/created tokens are as follows:
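The same policies can also be created from the command line instead of the web UI. The following is a sketch using Consul's acl subcommands (available since 1.4; the exact flags are my assumption and should be checked against your version). Since they need a running cluster and the master token, the consul calls are left commented out:

```shell
# Write the two policy rule files shown above.
cat > /tmp/readonlykv.hcl <<'EOF'
key_prefix "upstreams/" {
 policy = "list"
}
EOF
cat > /tmp/writekv.hcl <<'EOF'
key_prefix "upstreams/" {
 policy = "write"
}
EOF
# Hypothetical CLI equivalents of the web-UI steps (run against a live cluster):
# consul acl policy create -name readonlykv -rules @/tmp/readonlykv.hcl -token "$MASTER_TOKEN"
# consul acl policy create -name writekv   -rules @/tmp/writekv.hcl   -token "$MASTER_TOKEN"
# consul acl token create -description "write upstreams" -policy-name writekv -token "$MASTER_TOKEN"
```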

At this point, the Consul cluster deployment is complete.

2. Modify nginx

1. Upgrade nginx

Download nginx related modules:

nginx-upsync-module: https://github.com/weibocom/nginx-upsync-module

nginx_upstream_check_module: https://github.com/xiaokai-wang/nginx_upstream_check_module

Notice:

  • When downloading nginx_upstream_check_module, be sure to use xiaokai-wang's GitHub repository, not Alibaba's official one; the latter is version-incompatible and will not compile;
  • Back up your data before upgrading nginx.

1) Patch nginx_upstream_check_module

cd nginx-1.16.0
patch -p1 < /usr/local/src/nginx-1.16/nginx_upstream_check_module-master/check_1.12.1+.patch

Note: I placed the two downloaded nginx module source packages under /usr/local/src/nginx-1.16/.

2) Compile nginx

./configure --prefix=/usr/local/nginx --add-module=/usr/local/src/nginx-1.16/nginx_upstream_check_module-master --add-module=/usr/local/src/nginx-1.16/nginx-upsync-module-master ...

Notes:

I installed nginx under /usr/local/.

The ellipsis after the command stands for whatever other modules you need; add them according to your situation. Run nginx -V to see which modules the current binary was built with, then add those to the configure line.

3) Install nginx

make
# For an in-place (smooth) upgrade, do NOT run make install at this step

4) Upgrade nginx

# Back up the old nginx binary
mv /usr/local/nginx/sbin/nginx /usr/local/nginx/sbin/nginx16.old
# Replace the old nginx binary with the new one
cp objs/nginx /usr/local/nginx/sbin/
# View the compiled-in nginx modules
/usr/local/nginx/sbin/nginx -V

Reminder: in testing I found that after this upgrade, reloading nginx 1.16 or sending kill -USR2 did not make the old nginx process exit; nginx had to be restarted for the new binary to take effect. I am not sure whether this is a bug.

/usr/local/nginx/sbin/nginx -s stop
# If the old nginx process still has not exited, find it and kill -9 it
ps -ef | grep nginx
# Start the new nginx
/usr/local/nginx/sbin/nginx
# For reference, this is how USR2 is sent to the running master:
# kill -USR2 `cat /usr/local/nginx/logs/nginx.pid`

At this point, the nginx upgrade is complete.

2. Configure nginx

1) First configure nginx's status pages, so the running state of nginx can be checked at a glance

cat nginx.conf
server {
    listen 80;
    server_name localhost;

    # Show the upstreams in server 80; this acts as a global view, so other
    # configuration files need nothing extra. Visit http://nginx-ip:80/upstream_show
    # to see the current upstream configuration.
    location = /upstream_show {
        upstream_show;
    }

    # Show health-check details; also effectively global. Visit
    # http://nginx-ip:80/status: red rows are failing, white rows are healthy.
    location /status {
        check_status;
    }

    # Native nginx status page.
    location /NginxStatus {
        stub_status on;
        access_log off;
        allow 192.168.0.0/16;
        deny all;
    }
}

# Pull in the per-service server blocks; each one carries its own
# nginx-upsync-module configuration.
include /usr/local/nginx/conf/vhosts/*.conf;

2) Server configuration

HTTP health check

upstream rs1 {
 server 127.0.0.1:11111;
 upsync 192.168.21.14:8500/v1/kv/upstreams/rs1/ upsync_timeout=6m upsync_interval=500ms upsync_type=consul strong_dependency=off;
 upsync_dump_path /usr/local/nginx/conf/servers/servers_rs1.conf;

 check interval=1000 rise=2 fall=2 timeout=3000 type=http default_down=false;
 check_http_send "HEAD /health.htm HTTP/1.0\r\n\r\n";
 check_http_expect_alive http_2xx http_3xx;
}

server {
 listen 80;
...

TCP health check (TCP is the default check type)

upstream rs2 {
 server 127.0.0.1:11111;
 upsync 192.168.21.14:8500/v1/kv/upstreams/rs2/ upsync_timeout=6m upsync_interval=500ms upsync_type=consul strong_dependency=off;
 upsync_dump_path /usr/local/nginx/conf/servers/servers_rs2.conf;

 check interval=1000 rise=2 fall=2 timeout=3000 type=tcp default_down=false;
}

server {
 listen 80;
...

Notes:

  • HTTP checks are recommended; they are more accurate than TCP. This check is provided by nginx_upstream_check_module. Parameters in brief: a health check runs every 1 second (interval=1000) with a 3-second timeout (timeout=3000); after two consecutive successes (rise=2) the upstream server is considered healthy and is brought or kept online, and after two consecutive failures (fall=2) it is considered unhealthy and is taken out of rotation. "/health.htm" is the upstream service's health-check endpoint. For the full parameter reference, see: http://tengine.taobao.org/document_cn/http_upstream_check_cn.html
  • upsync parameters in brief: nginx-upsync-module pulls the configuration from the consul database every 0.5 seconds (upsync_interval=500ms), with a 6-minute timeout (upsync_timeout=6m). For the full parameter reference, see: https://github.com/weibocom/nginx-upsync-module
  • nginx creates a servers subdirectory under /usr/local/nginx/conf and automatically writes the per-upstream server configuration files into it.

At this point, the nginx configuration modification is complete.

3. Create upstream data (consul key-value pairs)

You can create the upstream data either through the web UI or from the command line, as follows:

1. Web page operation

In "Key/Value", first create the "upstreams" directory (note the trailing s), then create the corresponding upstream group name under it. To create a directory in the UI, end the key with a trailing "/", e.g. upstreams/. Screenshot:

2. Command line operation

From the command line there is no need to create the "upstreams/" directory first; the command creates both the directory and the server entry automatically.

The following uses the upstream service Java1 (IP 192.168.20.100, port 8080, upstream group rs1) as an example:

Add a record

curl -X PUT http://192.168.21.14:8500/v1/kv/upstreams/rs1/192.168.20.100:8080?token=$token

The command above registers the server with nginx's default upstream settings, i.e.:

server 192.168.20.100:8080 weight=1 max_fails=2 fail_timeout=10s;

You can customize the weight and other values with the following command:

curl -X PUT -d "{\"weight\":100, \"max_fails\":2, \"fail_timeout\":10}" http://192.168.21.14:8500/v1/kv/upstreams/rs1/192.168.20.100:8080?token=$token
# or
curl -X PUT -d '{"weight":100, "max_fails":2, "fail_timeout":10}' http://192.168.21.14:8500/v1/kv/upstreams/rs1/192.168.20.100:8080?token=$token

Delete a record

curl -X DELETE http://192.168.21.14:8500/v1/kv/upstreams/rs1/192.168.20.100:8080?token=$token

Update weights

curl -X PUT -d "{\"weight\":100, \"max_fails\":2, \"fail_timeout\":10}" http://192.168.21.14:8500/v1/kv/upstreams/rs1/192.168.20.100:8080?token=$token
# or
curl -X PUT -d '{"weight":100, "max_fails":2, "fail_timeout":10}' http://192.168.21.14:8500/v1/kv/upstreams/rs1/192.168.20.100:8080?token=$token

Take a server offline

curl -X PUT -d "{\"weight\":2, \"max_fails\":2, \"fail_timeout\":10, \"down\":1}" http://192.168.21.14:8500/v1/kv/upstreams/rs1/192.168.20.100:8080?token=$token
# or
curl -X PUT -d '{"weight":2, "max_fails":2, "fail_timeout":10, "down":1}' http://192.168.21.14:8500/v1/kv/upstreams/rs1/192.168.20.100:8080?token=$token

List the servers under upstream rs1

curl http://192.168.21.14:8500/v1/kv/upstreams/rs1?recurse
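Note that the KV API returns each server's settings base64-encoded in the "Value" field of the JSON response. A quick way to decode one (the sample value below is my own illustration) is:

```shell
# ?recurse returns JSON whose "Value" fields are base64-encoded.
# Decode one value taken from such a response with base64 -d:
VALUE="eyJ3ZWlnaHQiOjF9"            # base64 of a stored member definition
DECODED=$(echo "$VALUE" | base64 -d)
echo "$DECODED"                      # prints {"weight":1}
```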

Command-line operation is recommended; assemble these commands into scripts to automate day-to-day online/offline operations.
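Such a script could look like the following sketch. The endpoints and JSON fields are the ones used throughout this article; the function names, the CONSUL/TOKEN variables, and the default weights are my own assumptions:

```shell
#!/bin/sh
# Sketch of a wrapper around the Consul KV calls shown above.
# CONSUL and TOKEN are assumptions; set them for your environment.
CONSUL="${CONSUL:-192.168.21.14:8500}"

kv_url() {
    # kv_url <upstream-group> <ip:port> -> full KV URL for that member
    printf 'http://%s/v1/kv/upstreams/%s/%s?token=%s' "$CONSUL" "$1" "$2" "$TOKEN"
}

member_json() {
    # member_json <weight> <down: 0|1> -> JSON value stored in Consul
    printf '{"weight":%s, "max_fails":2, "fail_timeout":10, "down":%s}' "$1" "$2"
}

# online  <group> <ip:port> [weight]  - add or re-enable a server
online()  { curl -s -X PUT -d "$(member_json "${3:-1}" 0)" "$(kv_url "$1" "$2")"; }
# offline <group> <ip:port>           - mark a server down without removing it
offline() { curl -s -X PUT -d "$(member_json 2 1)" "$(kv_url "$1" "$2")"; }
# remove  <group> <ip:port>           - delete the server from the group
remove()  { curl -s -X DELETE "$(kv_url "$1" "$2")"; }
```

For example, after sourcing the script with TOKEN set, `offline rs1 192.168.20.100:8080` takes Java1 out of rotation and `online rs1 192.168.20.100:8080` brings it back.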

4. Some insights

While implementing this dynamic-discovery solution I ran into many problems. The hardest was that nginx in the test environment kept reporting errors and could never download the upstream data completely. Extensive investigation found nothing; at one point I suspected consul and switched to etcd, only to hit the same error. Finally, packet capture and tracing showed that misconfigured Linux kernel parameters were causing a queue overflow and failed TCP three-way handshakes, which broke the communication between nginx and consul.

Many solutions are fine in theory, and some people have even used them successfully. But when you implement one yourself you will still hit all kinds of problems, some of them fatal, and you have to work through them patiently. I hope readers of this article give it a try, and calmly, patiently troubleshoot whatever comes up.

One more thing: many people say operations work creates no value. I think that is wrong; there is plenty of value for operations to deliver, and SRE is one form of it.

This concludes this article on dynamically bringing upstream servers online and offline without reloading nginx.
