A detailed introduction to wget command in Linux

Introduction: wget is a tool for downloading files in Linux. It is open-source software originally developed for Linux by Hrvoje Niksic, and it was later ported to various platforms, including Windows.

It is used from the command line and is an indispensable tool for Linux users, especially network administrators, who often need to download software or restore backups from a remote server. On a shared virtual host, the only way to handle such transfers is to download the file from the remote server to your local disk first and then upload it to the server with an FTP tool, which wastes time and effort. On a Linux VPS, wget can download directly to the server, skipping the upload step entirely. The tool is small but full-featured: it supports resuming interrupted downloads, both FTP and HTTP, and proxy servers, and it is easy to set up. Below we explain how to use wget through examples.
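
For example, to use wget through an HTTP proxy you can either set the standard environment variable or pass .wgetrc-style settings with -e. A minimal sketch, assuming a hypothetical proxy at proxy.example.com:8080 and a placeholder URL:

export http_proxy=http://proxy.example.com:8080/
wget http://example.org/file.tar.gz

or, equivalently, on the command line only:

wget -e use_proxy=yes -e http_proxy=proxy.example.com:8080 http://example.org/file.tar.gz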

First install wget

[root@network test]# yum install -y wget

View Help Manual

[root@network test]# wget --help
GNU Wget 1.14, a non-interactive network file download tool.
Usage: wget [options]... [URL]...

Required arguments for long options are also required when using short options.

Startup:
  -V, --version Display Wget version information and exit.
  -h, --help Print this help.
  -b, --background Go to background after startup.
  -e, --execute=COMMAND Run a ".wgetrc" style command.

Log and input files:
  -o, --output-file=FILE Write log information to FILE.
  -a, --append-output=FILE Append information to FILE.
  -d, --debug Print extensive debugging information.
  -q, --quiet Quiet mode (no output).
  -v, --verbose Verbose output (this is the default).
  -nv, --no-verbose Turn off verbose output, but do not enter quiet mode.
       --report-speed=TYPE Output bandwidth as TYPE. TYPE can be bits.
  -i, --input-file=FILE Download URLs from local or external FILE.
  -F, --force-html Treat input files as HTML files.
  -B, --base=URL Parse HTML input files relative to URLs (specified by the -i -F options).
       --config=FILE Specify config file to use.

Download:
  -t, --tries=NUMBER Set the number of retries to NUMBER (0 means unlimited).
       --retry-connrefused Retry even if the connection is refused.
  -O, --output-document=FILE Write document to FILE.
  -nc, --no-clobber skip downloads that would download to
                                 existing files (overwriting them).
  -c, --continue Resume downloading files.
       --progress=TYPE Select the progress bar type.
  -N, --timestamping Only retrieve files that are newer than local files.
  --no-use-server-timestamps Do not use server timestamps to set local files.
  -S, --server-response Print server response.
       --spider Do not download any files.
  -T, --timeout=SECONDS Set all timeouts to SECONDS seconds.
       --dns-timeout=SECS Set DNS lookup timeout to SECS seconds.
       --connect-timeout=SECS Set the connection timeout to SECS seconds.
       --read-timeout=SECS Set read timeout to SECS seconds.
  -w, --wait=SECONDS Wait for SECONDS seconds.
       --waitretry=SECONDS Wait 1..SECONDS seconds between retries to get the file.
       --random-wait When getting multiple files, the random waiting interval is 0.5*WAIT...1.5*WAIT seconds each time.
       --no-proxy Disable the use of a proxy.
  -Q, --quota=NUMBER Set the get quota to NUMBER bytes.
       --bind-address=ADDRESS Bind to ADDRESS (hostname or IP) on the local host.
       --limit-rate=RATE Limit download rate to RATE.
       --no-dns-cache Disable DNS lookup caching.
       --restrict-file-names=OS Limit the characters in file names to those allowed by the OS.
       --ignore-case Ignore case when matching files/directories.
  -4, --inet4-only Connect to IPv4 addresses only.
  -6, --inet6-only Connect to IPv6 addresses only.
       --prefer-family=FAMILY Connect first to addresses of the specified family; FAMILY can be IPv6, IPv4 or none.
       --user=USER Set the username for both ftp and http to USER.
       --password=PASS Set the password for both ftp and http to PASS.
       --ask-password Prompt for a password.
       --no-iri disable IRI support.
       --local-encoding=ENC Use ENC as the local encoding for IRIs (Internationalized Resource Identifiers).
       --remote-encoding=ENC Use ENC as default remote encoding.
       --unlink remove file before clobber.

Directories:
  -nd, --no-directories Do not create directories.
  -x, --force-directories Force creation of directories.
  -nH, --no-host-directories Do not create host directories.
       --protocol-directories Use protocol names in directories.
  -P, --directory-prefix=PREFIX Save files as PREFIX/...
       --cut-dirs=NUMBER Ignore NUMBER directory levels in the remote directory.

HTTP options:
       --http-user=USER Set http user name to USER.
       --http-password=PASS Set http password to PASS.
       --no-cache Do not cache data on the server.
       --default-page=NAME Change the default page (the default page is usually "index.html").
  -E, --adjust-extension Save HTML/CSS documents with appropriate extensions.
       --ignore-length Ignore the 'Content-Length' header field.
       --header=STRING Insert STRING in the header.
       --max-redirect Maximum redirects allowed per page.
       --proxy-user=USER Use USER as the proxy username.
       --proxy-password=PASS Use PASS as proxy password.
       --referer=URL Include 'Referer: URL' in the HTTP request header.
       --save-headers Save HTTP headers to file.
  -U, --user-agent=AGENT Identify as AGENT instead of Wget/VERSION.
       --no-http-keep-alive Disable HTTP keep-alive (persistent connections).
       --no-cookies Do not use cookies.
       --load-cookies=FILE Load cookies from FILE before the session starts.
       --save-cookies=FILE Save cookies to FILE after the session ends.
       --keep-session-cookies Load and save session (non-persistent) cookies.
       --post-data=STRING Use POST method; send STRING as data.
       --post-file=FILE Use POST method; send FILE content.
       --content-disposition Allow Content-Disposition header when local filename is selected (experimental).
       --content-on-error outputs the received content on server errors.
       --auth-no-challenge Send Basic HTTP authentication without first waiting for the server's challenge.

HTTPS (SSL/TLS) options:
       --secure-protocol=PR choose secure protocol, one of auto, SSLv2,
                                SSLv3, TLSv1, TLSv1_1 and TLSv1_2.
       --no-check-certificate Do not verify the server's certificate.
       --certificate=FILE Client certificate file.
       --certificate-type=TYPE Client certificate type, PEM or DER.
       --private-key=FILE Private key file.
       --private-key-type=TYPE Private key file type, PEM or DER.
       --ca-certificate=FILE File with a set of CA certificates.
       --ca-directory=DIR Directory to store the list of hashes of CA certificates.
       --random-file=FILE File with random data for generating the SSL PRNG.
       --egd-file=FILE Name of the file for the EGD socket with random data.

FTP options:
       --ftp-user=USER Set the ftp user name to USER.
       --ftp-password=PASS Set ftp password to PASS.
       --no-remove-listing Do not remove '.listing' files.
       --no-glob Do not use wildcard expansion in FTP file names.
       --no-passive-ftp Disable "passive" transfer mode.
       --preserve-permissions Preserve permissions on remote files.
       --retr-symlinks When recursing directories, get linked files (not directories).

WARC options:
       --warc-file=FILENAME save request/response data to a .warc.gz file.
       --warc-header=STRING insert STRING into the warcinfo record.
       --warc-max-size=NUMBER set maximum size of WARC files to NUMBER.
       --warc-cdx write CDX index files.
       --warc-dedup=FILENAME do not store records listed in this CDX file.
       --no-warc-compression do not compress WARC files with GZIP.
       --no-warc-digests do not calculate SHA1 digests.
       --no-warc-keep-log do not store the log file in a WARC record.
       --warc-tempdir=DIRECTORY location for temporary files created by the
                                 WARC writer.

Recursive download:
  -r, --recursive specifies recursive download.
  -l, --level=NUMBER Maximum recursion depth (inf or 0 means unlimited, i.e. download everything).
       --delete-after Delete local files after downloading is complete.
  -k, --convert-links Make links in downloaded HTML or CSS point to local files.
  --backups=N before writing file X, rotate up to N backup files.
  -K, --backup-converted Back up file X as X.orig before converting it.
  -m, --mirror Short form of -N -r -l inf --no-remove-listing.
  -p, --page-requisites Download all elements such as images used to display HTML pages.
       --strict-comments Process HTML comments in strict mode (SGML).

Recursive accept/reject:
  -A, --accept=LIST Comma separated list of acceptable extensions.
  -R, --reject=LIST Comma-separated list of extensions to reject.
       --accept-regex=REGEX regex matching accepted URLs.
       --reject-regex=REGEX regex matching rejected URLs.
       --regex-type=TYPE regex type (posix|pcre).
  -D, --domains=LIST Comma separated list of acceptable domains.
       --exclude-domains=LIST Comma-separated list of domains to exclude.
       --follow-ftp Follow FTP links in HTML documents.
       --follow-tags=LIST Comma separated list of HTML tags to follow.
       --ignore-tags=LIST Comma separated list of HTML tags to ignore.
  -H, --span-hosts Spread over external hosts when recursing.
  -L, --relative Follow only relative links.
  -I, --include-directories=LIST List of allowed directories.
  --trust-server-names Use the name specified by the last component of the redirection URL.
  -X, --exclude-directories=LIST exclude list of directories.
  -np, --no-parent Do not trace back to parent directories.

1. Download a single file using wget

The following example downloads a file from the network and saves it in the current directory.

During the download, a progress bar is displayed showing the completion percentage, the number of bytes downloaded, the current download speed, and the remaining download time.

wget http://cn.wordpress.org/wordpress-4.9.4-zh_CN.tar.gz

2. Download using wget -O and save with a different file name

[root@network test]# wget https://cn.wordpress.org/wordpress-4.9.4-zh_CN.tar.gz
[root@network test]# ls
wordpress-4.9.4-zh_CN.tar.gz

We can use the -O parameter to specify a file name:

wget -O wordpress.tar.gz http://cn.wordpress.org/wordpress-4.9.4-zh_CN.tar.gz
wordpress.tar.gz

3. Use wget -c to resume downloading

Use wget -c to restart an interrupted download:

This is very helpful when a large download is suddenly interrupted by the network or some other problem: we can resume it instead of downloading the whole file again.

wget -c https://cn.wordpress.org/wordpress-4.9.4-zh_CN.tar.gz

4. Use wget -b to download in the background

When downloading very large files, we can use the -b parameter to download in the background.

[root@network test]# wget -b https://cn.wordpress.org/wordpress-4.9.4-zh_CN.tar.gz
Continuing in background, pid 1463.
Output will be written to 'wget-log'.

You can use the following command to check the download progress:

[root@network test]# tail -f wget-log
  8550K .......... .......... .......... .......... .......... 96% 814K 0s
  8600K .......... .......... .......... .......... .......... 97% 9.53M 0s
  8650K .......... .......... .......... .......... .......... 98% 86.8M 0s
  8700K .......... .......... .......... .......... .......... 98% 145M 0s
  8750K .......... .......... .......... .......... .......... 99% 67.4M 0s
  8800K .......... .......... .......... .......... .......... 99% 107M 0s
  8850K .......... ......... 100% 1.95M=16s

2018-11-10 15:39:07 (564 KB/s) - Saved "wordpress-4.9.4-zh_CN.tar.gz.2" [9082696/9082696]

5. Download with a disguised User-Agent name

Some websites reject download requests after checking the User-Agent and concluding that the client is not a browser. However, you can disguise wget by using the --user-agent parameter.
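
A minimal sketch (the browser string and URL are placeholders; substitute real values):

wget --user-agent="Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0" http://example.org/file.tar.gz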

6. Use wget --spider to test the download link

When you plan to schedule downloads, you should test in advance whether the download link is valid. We can add the --spider parameter to check.

wget --spider URL

If the download link is valid, output like the following will be displayed:

Spider mode enabled. Check if remote file exists.
HTTP request sent, awaiting response… 200 OK
Length: unspecified [text/html]
Remote file exists and could contain further links,
but recursion is disabled — not retrieving.

This confirms that the download can be carried out at the scheduled time. When you give a broken link, the following error is displayed instead:

wget --spider url
Spider mode enabled. Check if remote file exists.
HTTP request sent, awaiting response… 404 Not Found
Remote file does not exist — broken link!!!

You can use the --spider parameter in the following situations (a pre-check script is sketched after this list):

  • Check before scheduled download
  • Interval detection of website availability
  • Check website pages for dead links
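
For the first case, a minimal pre-check sketch is shown below. The URL is a placeholder; wget exits with a non-zero status when the remote file is missing, so the script only downloads when the link checks out:

#!/bin/sh
# Hypothetical URL; replace with the real download link.
URL="http://example.org/file.tar.gz"
if wget --spider -q "$URL"; then
    wget -q "$URL"            # link is valid, perform the real download
else
    echo "Broken link: $URL" >&2
fi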

7. Use wget --tries to increase the number of retries

A download may also fail if there is a problem with the network or if the file is very large. By default, wget retries 20 times to connect and finish the download. If needed, you can increase the number of retries with --tries.

wget --tries=40 URL

8. Download multiple files using wget -i

First, save the download links to a file:

cat > filelist.txt
url1
url2
url3
url4

Then use this file with the -i parameter to download:

wget -i filelist.txt

9. Use wget --mirror to mirror a website

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org

or

wget -mkEpnp http://example.org
  • --mirror – recursively download all resources under a given website
  • --convert-links – Convert absolute links to relative links
  • --adjust-extension – adjust the file name according to the Content-Type and add the appropriate file extension
  • --page-requisites – Download other dependent CSS, Javascript, Image and other resources
  • --no-parent – Do not download resources from parent directories

10. Use wget --reject to filter out a specified format when downloading

If you want to download a website but do not want to download its images, you can use the following command:

wget --reject=gif url
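
Since the accept/reject lists apply during recursive retrieval, --reject is normally combined with -r, and multiple extensions can be given as a comma-separated list. A sketch with a placeholder URL:

wget -r --reject=gif,jpg,png http://example.org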

11. Use wget -o to save download information into a log file

If you don't want the download information to be displayed directly in the terminal but in a log file, you can use the following command:

wget -o download.log URL

Example

Use wget -O to download and save with a different file name (-O downloads the file to the given location and renames it):

wget -O wordpress.zip http://www.minjieren.com/download.aspx?id=1080 

Use wget -b to download in the background

wget -b http://www.minjieren.com/wordpress-3.1-zh_CN.zip

Note: You can use the following command to view the download progress: tail -f wget-log

Use --spider to simulate a download: nothing is actually downloaded; it just checks whether the site is reachable.

[root@localhost ~]# wget --spider www.baidu.com # Do not download any files

Simulate a download and print the server response.

[root@localhost ~]# wget -S www.baidu.com # Print the server response

Set a specific number of retry attempts.

[root@localhost ~]# wget -r --tries=2 www.baidu.com (try at most 2 times, then stop)
[root@localhost ~]# wget -r --tries=2 -q www.baidu.com (limit the attempts and do not print intermediate results)

The above is the full content of this article. I hope it is helpful for your study, and I hope you will support 123WORDPRESS.COM.
