Specific use of GNU Parallel

What is it?

GNU Parallel is a shell tool for executing jobs in parallel on one or more computers. A job can be a single shell command or a small script that is run once for each line of input. Typical inputs are lists of files, hosts, users, URLs, or tables. A job can also be a command that reads from a pipe, in which case GNU Parallel splits the input into blocks and pipes one block to each command in parallel.

If you already use xargs and tee, you will find GNU Parallel easy to use, because it is designed to accept the same options as xargs. GNU Parallel can replace most shell loops and finish the work faster by running the iterations in parallel.

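GNU Parallel makes sure that the output of each job is the same as if the job had been run sequentially: output from different jobs is never mixed together, so it can safely be used as input for other programs. The order in which jobs finish may vary; use -k to keep the output in input order.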

For each line of input, GNU Parallel will run the specified command using that line as an argument. If no command is given, the line is executed as a command. Multiple input lines will be run in parallel. GNU Parallel is often used as a replacement for xargs or cat | bash.
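As a rough mental model (an illustration, not a description of how GNU Parallel is implemented), running a command once per input line resembles the sequential loop below. The "item" prefix and the variable name seq_out are made up for this sketch, and the GNU Parallel call is only attempted when GNU Parallel is found on the PATH.

```shell
# Sequential loop: run "echo item LINE" once for every input line.
seq_out=$(printf '%s\n' A B C | while IFS= read -r line; do
  echo item "$line"
done)
echo "$seq_out"

# The same work with GNU Parallel, if it is installed.
# -k keeps the output in input order so the two results can be compared.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  printf '%s\n' A B C | parallel -k echo item
fi
```

The loop handles one line at a time, while GNU Parallel runs several of these commands at once.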

Tutorial

This tutorial demonstrates most of GNU Parallel's capabilities. It is meant to introduce the options of GNU Parallel rather than to show realistic examples. Spend an hour working through it and you will fall in love with the command line.

Preparation

To follow the examples in this tutorial, you first need the following:

GNU Parallel version 20130814 or later

Install the latest version:

(wget -O - pi.dk/3 || curl pi.dk/3/) | bash

This command also installs the latest version of the tutorial:

man parallel_tutorial

Most of this tutorial is also compatible with older versions.

abc-file

Create the file:

parallel -k echo ::: A B C > abc-file

def-file

Create the file:

parallel -k echo ::: D E F > def-file

abc0-file

Create the file:

perl -e 'printf "A\0B\0C\0"' > abc0-file

abc_-file

Create the file:

perl -e 'printf "A_B_C_"' > abc_-file

tsv-file.tsv

Create the file:

perl -e 'printf "f1\tf2\nA\tB\nC\tD\n"' > tsv-file.tsv

num30000

Create the file:

perl -e 'for(1..30000){print "$_\n"}' > num30000

num1000000

Create the file:

perl -e 'for(1..1000000){print "$_\n"}' > num1000000

num_%header

Create the file:

(echo %head1; echo %head2; perl -e 'for(1..10){print "$_\n"}') > num_%header

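Remote execution: passwordless ssh login to $SERVER1 and $SERVER2

Set the variables:

SERVER1=server.example.com
SERVER2=server2.example.net

The following commands should then run successfully:

ssh $SERVER1 echo works
ssh $SERVER2 echo works

To set this up, run ssh-keygen -t dsa; ssh-copy-id $SERVER1 (use an empty passphrase) and repeat ssh-copy-id for $SERVER2. Recent OpenSSH releases no longer accept DSA keys by default; on such systems another key type, for example ssh-keygen -t ed25519, can be used instead.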

Input sources

GNU Parallel reads input from files, from the command line, and from standard input (stdin or a pipe).

Single input source

Read input from the command line:

parallel echo ::: A B C

Output (order may vary since tasks are executed in parallel):

A
B
C

File as input source:

parallel -a abc-file echo

The output is the same as above.

STDIN (standard input) as input source:

cat abc-file | parallel echo

The output is the same as above.

Multiple input sources

GNU Parallel supports specifying multiple input sources on the command line, and it will generate all possible combinations:

parallel echo ::: A B C ::: D E F

Output:

A D
A E
A F
B D
B E
B F
C D
C E
C F
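"All possible combinations" means the Cartesian product of the input sources. As a sketch of that idea in plain shell (sequential, and in a fixed order, unlike GNU Parallel), a nested loop produces the same nine pairs. The variable names first, second and combos are chosen for this illustration only.

```shell
# Every value of the first source is paired with every value of the second.
combos=$(for first in A B C; do
  for second in D E F; do
    echo "$first $second"
  done
done)
echo "$combos"
```

With three input sources the equivalent loop would be nested three levels deep, which is why the ::: notation is more convenient.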

Multiple files as input sources:

parallel -a abc-file -a def-file echo

The output is the same as above.

STDIN (standard input) can be used as one of the input sources, using "-":

cat abc-file | parallel -a - -a def-file echo

The output is the same as above.

You can use "::::" instead of -a:

cat abc-file | parallel echo :::: - def-file

The output is the same as above.

::: and :::: can be mixed:

parallel echo ::: A B C :::: def-file

The output is the same as above.

Linking arguments from input sources

--xapply takes one argument from each input source:

parallel --xapply echo ::: A B C ::: D E F

Output:

A D
B E
C F

If one of the input sources is shorter, its value will be repeated:

parallel --xapply echo ::: A B C D E ::: F G

Output:

A F
B G
C F
D G
E F
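The wrap-around described above can be pictured as indexing both lists with the same counter and starting the shorter list over when it runs out. The following is a minimal sketch in plain shell; nth is a hypothetical helper written for this illustration and is not part of GNU Parallel.

```shell
# nth N ITEM...: print the N-th item (1-based), wrapping around when N
# exceeds the number of items. A hypothetical helper for this sketch.
nth() {
  n=$1
  shift
  shift $(( (n - 1) % $# ))
  printf '%s\n' "$1"
}

# Pair position i of the long list with position i of the short list.
linked=$(for i in 1 2 3 4 5; do
  echo "$(nth "$i" A B C D E) $(nth "$i" F G)"
done)
echo "$linked"
```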

Changing the parameter separator

GNU Parallel lets you choose separators other than ::: and ::::, which is useful when those two strings already mean something to the command being run. To change the argument separator (:::):

parallel --arg-sep ,, echo ,, A B C :::: def-file

Output:

A D
A E
A F
B D
B E
B F
C D
C E
C F

To change the argument file separator (::::):

parallel --arg-file-sep // echo ::: A B C // def-file

The output is the same as above.

Changing the argument delimiter

By default, GNU Parallel treats each line as one argument: it uses \n as the argument delimiter. Use -d to change it:

parallel -d _ echo :::: abc_-file

Output:

A
B
C

\0 represents the NUL character:

parallel -d '\0' echo :::: abc0-file

The output is the same as above.

-0 is short for -d '\0' (commonly used to read input from find ... -print0):

parallel -0 echo :::: abc0-file

The output is the same as above.
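The reason NUL is popular as a delimiter is that it cannot occur in a file name, whereas spaces and newlines can. The sketch below feeds find ... -print0 into GNU Parallel with -0. The temporary directory and file names are made up for the example, and when GNU Parallel is not installed the snippet falls back to xargs -0, which accepts the same flag, so that it can still be run.

```shell
# Create two sample files, one with a space in its name.
tmpdir=$(mktemp -d)
touch "$tmpdir/a b.txt" "$tmpdir/c.txt"

# Use GNU Parallel when available, otherwise xargs with the same -0 flag.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  runner="parallel -0 -k"
else
  runner="xargs -0 -n 1"
fi

# $runner is left unquoted on purpose so that it splits into words.
names=$(find "$tmpdir" -type f -name '*.txt' -print0 | $runner basename | sort)
echo "$names"

rm -rf "$tmpdir"
```

With the default newline delimiter this particular example would also work, but a file name containing a newline would be split into two arguments.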

End value in input source

GNU Parallel can treat a given value as an end-of-input marker; everything after it is ignored:

parallel -E stop echo ::: A B stop C D

Output:

A
B

Skip empty lines

Use --no-run-if-empty to skip empty lines:

(echo 1; echo; echo 2) | parallel --no-run-if-empty echo

Output:

1
2

Building the command line

No command means the arguments are commands

If no command is given after parallel, the arguments themselves are treated as commands:

parallel ::: ls 'echo foo' pwd

Output:

[Current file list]
foo
[path to current working directory]

The command can be a script file, a binary executable file or a bash function (the function must be exported using export -f):

# Only works in Bash and only if $SHELL=.../bash
my_func() {
 echo in my_func $1
}
export -f my_func
parallel my_func ::: 1 2 3

Output:

in my_func 1
in my_func 2
in my_func 3

Replacement strings

The 5 replacement strings

GNU Parallel supports a variety of replacement strings. By default, {} is used:

parallel echo ::: A/B.C

Output:

A/B.C

Specifying {} explicitly:

parallel echo {} ::: A/B.C

The output is the same as above.

Remove the extension {.}:

parallel echo {.} ::: A/B.C

Output:

A/B

Remove the path {/}:

parallel echo {/} ::: A/B.C

Output:

B.C

Keep only the path {//}:

parallel echo {//} ::: A/B.C

Output:

A

Remove the path and extension {/.}:

parallel echo {/.} ::: A/B.C

Output:

B
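For readers who know shell parameter expansion, the replacement strings can be approximated as follows. This is only a sketch for the sample argument A/B.C (directory A, base name B, extension .C); the variable names are made up, and the expansions are not guaranteed to match GNU Parallel for unusual paths, such as a dot in the directory part.

```shell
arg='A/B.C'
no_ext=${arg%.*}            # like {.}  : drop the extension
base=${arg##*/}             # like {/}  : drop the directory part
dir=$(dirname "$arg")       # like {//} : keep only the directory part
base_no_ext=${base%.*}      # like {/.} : drop directory and extension
echo "{}=$arg {.}=$no_ext {/}=$base {//}=$dir {/.}=$base_no_ext"
```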

{#} gives the job number:

parallel echo {#} ::: A B C

Output:

1
2
3

Change the replacement string

Use -I to change the replacement string {} to something else:

parallel -I ,, echo ,, ::: A/B.C

Output:

A/B.C

--extensionreplace replaces {.}:

parallel --extensionreplace ,, echo ,, ::: A/B.C

Output:

A/B

--basenamereplace replaces {/}:

parallel --basenamereplace ,, echo ,, ::: A/B.C

Output:

B.C

--dirnamereplace replaces {//}:

parallel --dirnamereplace ,, echo ,, ::: A/B.C

Output:

A

--basenameextensionreplace replaces {/.}:

parallel --basenameextensionreplace ,, echo ,, ::: A/B.C

Output:

B

--seqreplace replaces {#}:

parallel --seqreplace ,, echo ,, ::: A B C

Output:

1
2
3

Positional replacement strings

With multiple input sources, {number} refers to the argument taken from that input source:

parallel echo {1} and {2} ::: A B ::: C D

Output:

A and C
A and D
B and C
B and D

A positional replacement string can be combined with /, //, /. and . to modify that argument:

parallel echo /={1/} //={1//} /.={1/.} .={1.} ::: A/B.C D/E.F

Output:

/=B.C //=A /.=B .=A/B
/=E.F //=D /.=E .=D/E

Positions can be negative, indicating counting backwards:

parallel echo 1={1} 2={2} 3={3} -1={-1} -2={-2} -3={-3} ::: A B ::: C D ::: E F

Output:

1=A 2=C 3=E -1=E -2=C -3=A
1=A 2=C 3=F -1=F -2=C -3=A
1=A 2=D 3=E -1=E -2=D -3=A
1=A 2=D 3=F -1=F -2=D -3=A
1=B 2=C 3=E -1=E -2=C -3=B
1=B 2=C 3=F -1=F -2=C -3=B
1=B 2=D 3=E -1=E -2=D -3=B
1=B 2=D 3=F -1=F -2=D -3=B

Input by columns

Use --colsep to split each input line into columns, which then become positional arguments. The following splits on TAB (\t):

parallel --colsep '\t' echo 1={1} 2={2} :::: tsv-file.tsv

Output:

1=f1 2=f2
1=A 2=B
1=C 2=D

Specify parameter name

Use --header to treat the first value of each input source as the name of that argument:

parallel --header : echo f1={f1} f2={f2} ::: f1 A B ::: f2 C D

Output:

f1=A f2=C
f1=A f2=D
f1=B f2=C
f1=B f2=D

Combine --header with --colsep to process a file that uses TAB as the delimiter:

parallel --header : --colsep '\t' echo f1={f1} f2={f2} :::: tsv-file.tsv

Output:

f1=A f2=B
f1=C f2=D
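What --header : together with --colsep '\t' does to a TSV file can be sketched in plain shell: the first line names the columns and every later line supplies one value per column. The variable names below are made up, and the loop runs sequentially, so it only illustrates the mapping from columns to named arguments.

```shell
tab=$(printf '\t')
named=$(printf 'f1\tf2\nA\tB\nC\tD\n' | {
  # The first line supplies the column names.
  IFS=$tab read -r name1 name2
  # Each later line supplies one value per column.
  while IFS=$tab read -r col1 col2; do
    echo "$name1=$col1 $name2=$col2"
  done
})
echo "$named"
```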

Multiple arguments

--xargs makes GNU Parallel put as many arguments on each command line as will fit (there is an upper limit on the line length):

cat num30000 | parallel --xargs echo | wc -l

Output:

2

The 30,000 arguments are split across two command lines.

The maximum length of a command line, in characters, is set with -s. With a maximum length of 10000 the arguments are split across 17 command lines:

cat num30000 | parallel --xargs -s 10000 echo | wc -l

For better parallelism, GNU Parallel can distribute the arguments among all parallel jobs once the end of the input has been reached. This is what -m does.

With 4 jobs in parallel, GNU Parallel starts the second job only after the last argument has been read; at that point it distributes the remaining arguments evenly among 4 jobs.

The first job is the same as in the --xargs example above, but the second job is split into 4 evenly sized jobs, for a total of 5 jobs:

cat num30000 | parallel --jobs 4 -m echo | wc -l

Output:

5

This is easier to see when 10 arguments are distributed over 4 jobs:

parallel --jobs 4 -m echo ::: {1..10}

Output:

1 2 3
4 5 6
7 8 9
10

The replacement string can be part of a word. Experience the difference between -m and -X through the following two commands:

parallel --jobs 4 -m echo pre-{}-post ::: A B C D E F G

Output:

pre-A B-post
pre-C D-post
pre-E F-post
pre-G-post

-X also inserts multiple arguments, but it repeats the surrounding context (here pre- and -post) for every argument:

parallel --jobs 4 -X echo pre-{}-post ::: A B C D E F G

Output:

pre-A-post pre-B-post
pre-C-post pre-D-post
pre-E-post pre-F-post
pre-G-post
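The difference between the two options can be reproduced in plain shell for a single command line that received the arguments A and B. This is only a sketch of the resulting strings; the variable names m_line and x_line are made up for the illustration.

```shell
# Two arguments that end up on the same command line.
set -- A B

# -m style: the replacement string becomes the whole argument list.
m_line="pre-$*-post"

# -X style: the surrounding word is repeated once per argument.
x_line=""
for arg in "$@"; do
  x_line="${x_line:+$x_line }pre-$arg-post"
done

echo "$m_line"
echo "$x_line"
```

-X is usually what you want when the replacement string is part of a file name or an option, because every argument then gets its own complete word.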

Use -N to limit the number of arguments per command line:

parallel -N3 echo ::: A B C D E F G H

Output:

A B C
D E F
G H
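Readers coming from xargs may recognize this grouping: xargs -n does the same chunking, only sequentially. The sketch below uses plain xargs so that it runs without GNU Parallel; the variable name chunks is made up for the example.

```shell
# Group eight arguments into command lines of at most three arguments each.
chunks=$(printf '%s\n' A B C D E F G H | xargs -n 3 echo)
echo "$chunks"
```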

-N also sets the number of arguments available to positional replacement strings:

parallel -N3 echo 1={1} 2={2} 3={3} ::: A B C D E F G H

Output:

1=A 2=B 3=C
1=D 2=E 3=F
1=G 2=H 3=

-N0 reads one argument per job but does not insert it into the command line:

parallel -N0 echo foo ::: 1 2 3

Output:

foo
foo
foo

Quoting

If the command line contains special characters, they need to be protected by quoting.

The perl script 'print "@ARGV\n"' does the same thing as the echo command:

perl -e 'print "@ARGV\n"' A

Output:

A

Running the same command through GNU Parallel does not work unless the perl script is protected by an extra level of quoting:

parallel perl -e 'print "@ARGV\n"' ::: This wont work

Output:

[Nothing]

Use -q to protect the perl command:

parallel -q perl -e 'print "@ARGV\n"' ::: This works

Output:

This
works

You can also wrap the script in an extra pair of escaped single quotes (\'):

parallel perl -e \''print "@ARGV\n"'\' ::: This works, too

Output:

This
works,
too

Use --shellquote to let GNU Parallel generate the quoted string for you:

parallel --shellquote
parallel: Warning: Input is read from the terminal. Only experts do this on purpose. Press CTRL-D to exit.
perl -e 'print "@ARGV\n"'
[CTRL-D]

Output:

perl\ -e\ \'print\ \"@ARGV\\n\"\'

The quoted string can then be used as the command:

parallel perl\ -e\ \'print\ \"@ARGV\\n\"\' ::: This also works

Output:

This
also
works
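Why the extra quoting is needed can be seen without GNU Parallel at all. The sketch below is a simplified model, under the assumption that the command words are joined with spaces and handed to a shell, which then parses them a second time. It uses printf instead of perl, and the variable names reparsed and preserved are made up for the illustration.

```shell
# The words as GNU Parallel would receive them after the calling shell
# has removed the single quotes.
set -- printf '%s,' 'x  y'

# Roughly what happens without -q: the words are joined with spaces and
# parsed by a shell a second time, so 'x  y' falls apart into two words.
reparsed=$(sh -c "$*")

# Roughly what -q preserves: every word reaches the command unchanged.
preserved=$("$@")

echo "reparsed:  $reparsed"
echo "preserved: $preserved"
```

The same thing happens to the perl script above: after the second round of parsing, perl receives only print as its script and "@ARGV\n" as an ordinary argument.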

Trimming whitespace

Use --trim to remove whitespace around the arguments. Remove the spaces on the right:

parallel --trim r echo pre-{}-post ::: ' A '

Output:

pre- A-post

Remove the spaces on the left:

parallel --trim l echo pre-{}-post ::: ' A '

Output:

pre-A -post

Remove the spaces on both sides:

parallel --trim lr echo pre-{}-post ::: ' A '

Output:

pre-A-post

Control output

Use --tag to prefix each line of output with its argument:

parallel --tag echo foo-{} ::: A B C

Output:

A foo-A
B foo-B
C foo-C

Use --tagstring to change the output prefix:

parallel --tagstring {}-bar echo foo-{} ::: A B C

Output:

A-bar foo-A
B-bar foo-B
C-bar foo-C
