GNU Parallel is a shell tool for executing computational tasks in parallel on one or more computers. This article is a brief introduction to using it.

[Images omitted: the old joke about how, on a dual-core, quad-core, or even 16-core CPU, one core does all the work while the rest look on.] Okay, enough mocking; if I keep dissing Intel I will get beaten up.

One bored weekend morning I spent half a day going through the GNU Parallel man page and tutorial. I have to say that half day was well worth it, because I expect it to save me far more than half a day in the future. This article does not attempt to translate the man page or the tutorial; there are ready-made translations, which you can see here or here. When I first ran into the weird ::: and the strange {} {#} {.} {/} placeholders a while ago, I backed off: such ugly syntax is unattractive. Fortunately a few examples calmed me down, I tried it myself, and found that it really is a magical tool. The main purpose of this article is to lure you into using it, and to tell you why and how.

why

There is only one reason to use GNU Parallel: to be fast!

Fast to install:

(wget -O - pi.dk/3 || curl pi.dk/3/) | bash

The author says it installs in 10 seconds. From inside China it may take a little longer, but not much. It is in fact a single-file Perl script of more than 10,000 lines (yes, you read that right: all the modules are in this one file, and that is a feature~). Afterwards I wrote a fabric script to copy it directly to each node machine and chmod it executable.

Fast to run: grep a 1 GB log with parallel versus plain grep. The result is obvious: a 20x difference, which is far more effective than switching to ack or ag for optimization. Note: this was measured on a 48-core server.

how

The easiest warm-up is xargs.
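Before getting to xargs, here is a runnable sketch of the grep speedup claimed above. The original benchmark details are not given, so the file and pattern here are made up; a tiny demo file shows only the mechanics, not the 20x speedup (--pipepart splits the file into chunks, one grep per chunk, and the per-chunk counts sum to the single-grep total):

```shell
# Sketch only: stand-in file and pattern, assuming GNU Parallel is installed.
seq 100000 > demo.log                      # stand-in for the 1 GB log
single=$(grep -c 9999 demo.log)            # ordinary single-process grep
chunked=$(parallel -a demo.log --pipepart --block 100k grep -c 9999 \
          | awk '{s += $1} END {print s}') # per-chunk counts, summed
echo "$single $chunked"                    # the two counts match
rm -f demo.log
```

On a real multi-gigabyte file this is where the extra cores pay off, since each chunk is scanned by its own grep process.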
xargs has a -P parameter that can take advantage of multiple cores. For example:

$ time echo {1..5} | xargs -n 1 sleep

real 0m15.005s
user 0m0.000s
sys 0m0.000s

Here xargs passes each echoed number as an argument to sleep, so the total sleep time is 1+2+3+4+5 = 15 seconds. With -P 5 the work is spread over 5 cores, the sleeps of 1, 2, 3, 4 and 5 seconds overlap, and the whole thing finishes in 5 seconds:

$ time echo {1..5} | xargs -n 1 -P 5 sleep

real 0m5.003s
user 0m0.000s
sys 0m0.000s

Warm-up over. Parallel's first mode is essentially a replacement for xargs -P. For example, to compress all HTML files:

find . -name '*.html' | parallel gzip --best

Parameter-passing mode

In the first mode, parallel passes arguments: whatever comes in from the front of the pipeline is substituted into the command that follows, and the resulting commands run in parallel. For example:

huang$ seq 5 | parallel echo pre_placeholder_{}
pre_placeholder_1
pre_placeholder_2
pre_placeholder_3
pre_placeholder_4
pre_placeholder_5

{} is a placeholder that holds the incoming argument. In cloud-computing operations you often work in batches, for example creating 10 cloud disks:

seq 10 | parallel cinder create 10 --display-name test_{}

creating 50 cloud hosts:

seq 50 | parallel nova boot --image image_id --flavor 1 --availability-zone az_id --nic vnetwork=private --vnc-password 000000 vm-test_{}

or deleting cloud hosts in batches:

nova list | grep some_pattern | awk '{print $2}' | parallel nova delete

Rewriting for loops

As you can see, I have replaced many of the places where I used to write loops with parallel, and enjoyed the convenience it brings.
The universal abstraction: a shell loop

(for x in `cat list`; do
  do_something $x
done) | process_output

can be written directly as

cat list | parallel do_something | process_output

If the loop body is long

(for x in `cat list`; do
  do_something $x
  [... 100 lines that do something with $x ...]
done) | process_output

it is better to write a function:

doit() {
  x=$1
  do_something $x
  [... 100 lines that do something with $x ...]
}
export -f doit
cat list | parallel doit

This also avoids a lot of troublesome escaping.

--pipe mode

The other mode is parallel --pipe. Here the output of the command in front of the pipe is not used as arguments but is fed as standard input to the command that follows. For example:

cat my_large_log | parallel --pipe grep pattern

Without --pipe, each line of my_large_log would be expanded into its own "grep pattern line" command. With --pipe, the result is no different from cat my_large_log | grep pattern, except that the input is split into chunks that are distributed to different cores.

Okay, those are the basic concepts! The rest is just the specific usage of the various parameters: how many cores to use, placeholder substitution, the various ways of passing arguments, running in parallel while keeping the output in order (-k), and the magical cross-node parallel computing. The man page has it all.

bonus

Besides making your everyday commands faster, having a small convert-to-parallel tool at hand is also good for testing concurrency. Many interfaces have bugs that only appear under concurrent access. For example, a limit may be checked in application code without locking the database: each concurrent request passes the check when it reaches the server, and when the writes land together the limit is exceeded. A serial for loop never triggers such problems, and to really test concurrency you would otherwise have to write a script or wrap something with Python's multiprocessing.
But I do have parallel at hand, and I added these two aliases to my bashrc:

alias p='parallel'
alias pp='parallel --pipe -k'

Creating concurrency this way is very convenient: I just add a p after the pipeline and can observe the response under load at any time. For example:

seq 50 | p -n0 -q curl 'example.com'

makes concurrent requests, as many at a time as you have cores. -n0 means the seq output is not passed as an argument to the command that follows.

Gossip time: the Xianglin Sao of GNU

As a lover of free-software gossip, whenever I discover a new and interesting piece of software I google around about it. That is how I found a complaint on Hacker News, which basically said that every time you run parallel, a notice pops up telling you that if you use the tool for academic work (many people in the life sciences use it) you must cite the author's paper, or else pay him 10,000 euros. I learned a new word from this: Nagware, which refers to software that nags at you like Tang Seng until you pay. That said, I do think the paper should be cited if the tool is really used, as this student said:
In addition, the author really does like having his software cited, so much so that I even saw it in the NEWS file:

Principle time

Quoting the author's own answer on Stack Overflow:
If you have 32 different jobs you want to run on 4 CPUs, the straightforward way to parallelize is to pre-assign 8 jobs to each CPU. GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time.

in conclusion

This article introduced a truly parallel tool, explained its two main modes, gave a small tip, and gossiped about a lesser-known side of the GNU world. I hope it is useful to you and helpful for everyone's study. I also hope that everyone will support 123WORDPRESS.COM.