Friday, March 24, 2023

[Research] Installing and Trying the Website Downloaders wget 1.21.2 and wget2 v1.99.1 (Ubuntu 22.04.2 LTS)

2023-03-21

The steps in this post should also apply to CentOS.

GNU Wget2 is the next-generation version of GNU Wget. It is a powerful command-line tool that supports multithreaded downloads, HTTP/2, IPv6, and more.

Wget - Wikipedia, the free encyclopedia (Chinese edition)
https://zh.wikipedia.org/wiki/Wget#Wget2

GNU Wget2 2.0.0 was released on September 26, 2021. Compared with Wget 1.x, it adds support for the following protocols and technologies:

  • HTTP/2
  • HTTP compression
  • Parallel connections
  • Use of the If-Modified-Since HTTP header field
  • TCP Fast Open

Official website (Wget): https://www.gnu.org/software/wget/

Official website (Wget2): https://gitlab.com/gnuwget/wget2

********************************************************************************

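Installing wget2 on Rocky Linux 9.1 was too much trouble, since it has to be compiled by hand from the tar.gz source, so let's see whether Ubuntu makes it any easier.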

apt provides wget (already installed here):

user1@ubuntu22042:~$ sudo apt-cache policy wget
wget:
  Installed: 1.21.2-2ubuntu1
  Candidate: 1.21.2-2ubuntu1
  Version table:
 *** 1.21.2-2ubuntu1 500
        500 http://tw.archive.ubuntu.com/ubuntu jammy/main amd64 Packages
        100 /var/lib/dpkg/status
user1@ubuntu22042:~$ 


Installation

apt also provides wget2:

user1@ubuntu22042:~$ sudo apt-cache policy wget2
wget2:
  Installed: (none)
  Candidate: 1.99.1-2.2
  Version table:
     1.99.1-2.2 500
        500 http://tw.archive.ubuntu.com/ubuntu jammy/universe amd64 Packages
user1@ubuntu22042:~$ 
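The two fields worth reading in this output are Installed and Candidate. As a minimal sketch, the snippet below pulls them out of the sample text shown above; `policy_field` is a hypothetical helper written for this post, not something apt ships.

```shell
#!/bin/sh
# Sample text copied from the apt-cache policy output above.
policy_output='wget2:
  Installed: (none)
  Candidate: 1.99.1-2.2
  Version table:
     1.99.1-2.2 500
        500 http://tw.archive.ubuntu.com/ubuntu jammy/universe amd64 Packages'

# policy_field NAME: read policy output on stdin and print the value of the
# first "NAME:" line (for example Installed or Candidate).
# Hypothetical helper for illustration only.
policy_field() {
    awk -v key="$1:" '$1 == key { $1 = ""; sub(/^ +/, ""); print; exit }'
}

installed=$(printf '%s\n' "$policy_output" | policy_field Installed)
candidate=$(printf '%s\n' "$policy_output" | policy_field Candidate)

if [ "$installed" = "(none)" ]; then
    echo "wget2 is not installed; apt would install $candidate"
else
    echo "wget2 $installed is installed"
fi
```

On a live system the same helper could be fed directly, for example `apt-cache policy wget2 | policy_field Candidate`. Querying the policy only reads the package cache, so sudo should not be needed for it.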

Install it:


user1@ubuntu22042:~$ sudo apt-get -y install wget2
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libwget0
The following NEW packages will be installed:
  libwget0 wget2
0 upgraded, 2 newly installed, 0 to remove and 1 not upgraded.
Need to get 239 kB of archives.
After this operation, 643 kB of additional disk space will be used.
Get:1 http://tw.archive.ubuntu.com/ubuntu jammy/universe amd64 libwget0 amd64 1.99.1-2.2 [143 kB]
Get:2 http://tw.archive.ubuntu.com/ubuntu jammy/universe amd64 wget2 amd64 1.99.1-2.2 [95.8 kB]
Fetched 239 kB in 0s (2739 kB/s) 
Selecting previously unselected package libwget0.
(Reading database ... 176153 files and directories currently installed.)
Preparing to unpack .../libwget0_1.99.1-2.2_amd64.deb ...
Unpacking libwget0 (1.99.1-2.2) ...
Selecting previously unselected package wget2.
Preparing to unpack .../wget2_1.99.1-2.2_amd64.deb ...
Unpacking wget2 (1.99.1-2.2) ...
Setting up libwget0 (1.99.1-2.2) ...
Setting up wget2 (1.99.1-2.2) ...
Processing triggers for man-db (2.10.2-1) ...
Processing triggers for libc-bin (2.35-0ubuntu3.1) ...
user1@ubuntu22042:~$ 

Run it

user1@ubuntu22042:~$ wget2   
Nothing to do - goodbye
user1@ubuntu22042:~$ 

Check the version

user1@ubuntu22042:~$ wget2 --version
GNU Wget2 1.99.1 - multithreaded metalink/file/website downloader

+digest +https +ssl/gnutls +ipv6 +iri +large-file +nls -ntlm -opie +psl +iconv +idn2 +zlib +lzma +brotlidec +bzip2 +http2 +gpgme 
user1@ubuntu22042:~$ 
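In the feature line, a leading + marks a feature that was compiled in and a leading - marks one that was left out. The sketch below checks a few of them against the line copied from the output above; `has_feature` is a hypothetical helper, and the three names tested are just examples.

```shell
#!/bin/sh
# Feature line copied from the `wget2 --version` output above.
features='+digest +https +ssl/gnutls +ipv6 +iri +large-file +nls -ntlm -opie +psl +iconv +idn2 +zlib +lzma +brotlidec +bzip2 +http2 +gpgme'

# has_feature NAME: succeed if "+NAME" appears as a whole word in $features.
# Hypothetical helper for illustration only.
has_feature() {
    case " $features " in
        *" +$1 "*) return 0 ;;
        *)         return 1 ;;
    esac
}

for f in http2 brotlidec ntlm; do
    if has_feature "$f"; then
        echo "$f: enabled"
    else
        echo "$f: disabled"
    fi
done
```

For this build, http2 and brotlidec are reported as enabled and ntlm as disabled, which matches the +http2, +brotlidec, and -ntlm entries in the line.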

View the options

user1@ubuntu22042:~$ wget2 --help
GNU Wget2 V1.99.1 - multithreaded metalink/file/website downloader

Usage: wget [options...] <url>...

Startup:
  -a  --append-output       File where messages are appended to, '-' for STDOUT
  -B  --base                Base for relative URLs read from input-file
                              or from command line
      --config              List of config files. (default: ~/.wget2rc)
  -d  --debug               Print debugging messages.(default: off)
  -e  --execute             Wget compatibility option, not needed for Wget
      --force-atom          Treat input file as Atom Feed. (default: off) (NEW!)
      --force-css           Treat input file as CSS. (default: off) (NEW!)
  -F  --force-html          Treat input file as HTML. (default: off)
      --force-metalink      Treat input file as Metalink. (default: off) (NEW!)
      --force-rss           Treat input file as RSS Feed. (default: off) (NEW!)
      --force-sitemap       Treat input file as Sitemap. (default: off) (NEW!)
      --fsync-policy        Use fsync() to wait for data being written to
                              the pysical layer. (default: off) (NEW!)
  -h  --help                Print this help.
      --input-encoding      Character encoding of the file contents read with
                              --input-file. (default: local encoding)
  -i  --input-file          File where URLs are read from, - for STDIN.
      --local-db            Read or load databases
  -o  --output-file         File where messages are printed to,
                              '-' for STDOUT.
  -q  --quiet               Print no messages except debugging messages.
                              (default: off)
      --stats-all           Print all stats (default: off)
                              Additional format supported:
                              --stats-all[=[FORMAT:]FILE]
      --stats-dns           Print DNS stats. (default: off)
                              Additional format supported:
                              --stats-dns[=[FORMAT:]FILE]
      --stats-ocsp          Print OCSP stats. (default: off)
                              Additional format supported:
                              --stats-ocsp[=[FORMAT:]FILE]
      --stats-server        Print server stats. (default: off)
                              Additional format supported:
                              --stats-server[=[FORMAT:]FILE]
      --stats-site          Print site stats. (default: off)
                              Additional format supported:
                              --stats-site[=[FORMAT:]FILE]
      --stats-tls           Print TLS stats. (default: off)
                              Additional format supported:
                              --stats-tls[=[FORMAT:]FILE]
  -v  --verbose             Print more messages. (default: on)
  -V  --version             Display the version of Wget and exit.

Download:
  -A  --accept              Comma-separated list of file name suffixes or
                              patterns.
      --accept-regex        Regex matching accepted URLs.
      --ask-password        Print prompt for password
      --backups             Make backups instead of overwriting/increasing
                              number. (default: 0)
      --bind-address        Bind to sockets to local address.
                              (default: automatic)
      --cache               Enabled using of server cache. (default: on)
      --chunk-size          Download large files in multithreaded chunks.
                              (default: 0 (=off)) Example:
                              wget --chunk-size=1M
      --clobber             Enable file clobbering. (default: on)
      --connect-timeout     Connect timeout in seconds.
      --content-on-error    Save response body even on error status.
                              (default: off)
  -c  --continue            Continue download for given files. (default: off)
  -k  --convert-links       Convert embedded URLs to local URLs.
                              (default: off)
      --cut-file-get-vars   Cut HTTP GET vars from file names. (default: off)
      --cut-url-get-vars    Cut HTTP GET vars from URLs. (default: off)
      --delete-after        Don't save downloaded files. (default: off)
      --dns-caching         Caching of domain name lookups. (default: on)
      --dns-timeout         DNS lookup timeout in seconds.
  -D  --domains             Comma-separated list of domains to follow.
      --exclude-domains     Comma-separated list of domains NOT to follow.
      --filter-mime-type    Specify a list of mime types to be saved or ignored
      --filter-urls         Apply the accept and reject filters on the URL
                              before starting a download. (default: off)
      --follow-tags         Scan additional tag/attributes for URLs,
                              e.g. --follow-tags="img/data-500px,img/data-hires
      --force-progress      Force progress bar.
                              (default: off)
      --http2-request-window
                            Max. number of parallel streams per HTTP/2
                              connection. (default: 30)
      --ignore-case         Ignore case when matching files. (default: off)
      --ignore-tags         Ignore tag/attributes for URL scanning,
                              e.g. --ignore-tags="img,a/href
  -4  --inet4-only          Use IPv4 connections only. (default: off)
  -6  --inet6-only          Use IPv6 connections only. (default: off)
      --iri                 Wget dummy option, you can't switch off
                              international support
  -l  --level               Maximum recursion depth. (default: 5)
      --local-encoding      Character encoding of environment and filenames.
      --max-redirect        Max. number of redirections to follow.
                              (default: 20)
      --max-threads         Max. concurrent download threads.
                              (default: 5) (NEW!)
  -m  --mirror              Turn on mirroring options -r -N -l inf
      --netrc               Load credentials from ~/.netrc if not given.
                              (default: on)
  -O  --output-document     File where downloaded content is written to,
                              '-O'  for STDOUT.
  -p  --page-requisites     Download all necessary files to display a
                              HTML page
      --parent              Ascend above parent directory. (default: on)
      --password            Password for Authentication.
                              (default: empty password)
      --post-data           Data to be sent in a POST request.
      --post-file           File with data to be sent in a POST request.
      --prefer-family       Prefer IPv4 or IPv6. (default: none)
      --progress            Type of progress bar (bar, dot, none).
                              (default: none)
      --proxy               Enable support for *_proxy environment variables.
                              (default: on)
      --random-wait         Wait 0.5 up to 1.5*<--wait> seconds between
                              downloads (per thread). (default: off)
      --read-timeout        Read and write timeout in seconds.
  -r  --recursive           Recursive download. (default: off)
      --regex-type          Regular expression type. Possible types are
                              posix or pcre. (default: posix)
  -R  --reject              Comma-separated list of file name suffixes or
                              patterns.
      --reject-regex        Regex matching rejected URLs.
      --remote-encoding     Character encoding of remote files
                              (if not specified in Content-Type HTTP header
                              or in document itself)
      --report-speed        Output bandwidth as TYPE. TYPE can be bytes
                              or bits. --progress MUST be used.
      --restrict-file-names
                            unix, windows, nocontrol, ascii, lowercase,
                              uppercase, none
      --robots              Respect robots.txt standard for recursive
                              downloads. (default: on)
  -S  --server-response     Print the server response headers. (default: off)
  -H  --span-hosts          Span hosts that were not given on the
                              command line. (default: off)
      --spider              Enable web spider mode. (default: off)
      --strict-comments     A dummy option. Parsing always works non-strict.
      --tcp-fastopen        Enable TCP Fast Open (TFO). (default: on)
  -T  --timeout             General network timeout in seconds.
  -N  --timestamping        Just retrieve younger files than the local ones.
                              (default: off)
  -t  --tries               Number of tries for each download. (default 20)
      --trust-server-names  On redirection use the server's filename.
                              (default: off)
      --use-askpass         Prompt for a user and password using
                              the specified command.
      --use-server-timestamps
                            Set local file's timestamp to server's timestamp.
                              (default: on)
      --user                Username for Authentication.
                              (default: empty username)
  -w  --wait                Wait number of seconds between downloads
                              (per thread). (default: 0)
      --waitretry           Wait up to number of seconds after error
                              (per thread). (default: 10)
      --xattr               Save extended file attributes. (default: on)

HTTP related options:
  -E  --adjust-extension    Append extension to saved file (.html or .css).
                              (default: off)
      --auth-no-challenge   send Basic HTTP Authentication before challenge
  -K  --backup-converted    When converting, keep the original file with
                              a .orig suffix. (default: off)
      --compression         Customize Accept-Encoding with
                              identity, gzip, deflate, xz, lzma, br, bzip2
                              and any combination of it
                              no-compression means no Accept-Encoding
      --content-disposition
                            Take filename from Content-Disposition.
                              (default: off)
      --cookie-suffixes     Load public suffixes from file. 
                              They prevent 'supercookie' vulnerabilities.
                              See https://publicsuffix.org/ for details.
      --cookies             Enable use of cookies. (default: on)
      --default-http-port   Set default port for HTTP. (default: 80)
      --default-https-port  Set default port for HTTPS. (default: 443)
      --default-page        Default file name. (default: index.html)
      --header              Insert input string as a HTTP header in
                              all requests
      --html-extension      Obsoleted by --adjust-extension
      --http-keep-alive     Keep connection open for further requests.
                              (default: on)
      --http-password       Password for HTTP Authentication.
                              (default: empty password)
      --http-proxy          Set HTTP proxy/proxies, overriding environment
                              variables. Use comma to separate proxies.
      --http-proxy-password
                            Password for HTTP Proxy Authentication.
                              (default: empty password)
      --http-proxy-user     Username for HTTP Proxy Authentication.
                              (default: empty username)
      --http-user           Username for HTTP Authentication.
                              (default: empty username)
      --keep-session-cookies
                            Also save session cookies. (default: off)
      --load-cookies        Load cookies from file.
      --metalink            Follow a metalink file instead of storing it
                              (default: on)
      --netrc-file          Set file for login/password to use instead of
                              ~/.netrc. (default: ~/.netrc)
  -Q  --quota               Download quota, 0 = no quota. (default: 0)
      --referer             Include Referer: url in HTTP request.
                              (default: off)
      --retry-connrefused   Consider "connection refused" a transient error.
                               (default: off)
      --save-cookies        Save cookies to file.
      --save-headers        Save the response headers in front of the response
                              data. (default: off)
  -U  --user-agent          HTTP User Agent string.
                              (default: wget)

HTTPS (SSL/TLS) related options:
      --ca-certificate      File with bundle of PEM CA certificates.
      --ca-directory        Directory with PEM CA certificates.
      --certificate         File with client certificate.
      --certificate-type    Certificate type: PEM or DER (known as ASN1).
                              (default: PEM)
      --check-certificate   Check the server's certificate. (default: on)
      --check-hostname      Check the server's certificate's hostname.
                              (default: on)
      --crl-file            File with PEM CRL certificates.
      --egd-file            File to be used as socket for random data from
                              Entropy Gathering Daemon.
      --gnutls-options      Custom GnuTLS priority string.
                              Interferes with --secure-protocol.
                              (default: none)
      --hpkp                Use HTTP Public Key Pinning (HPKP). (default: on)
      --hpkp-file           Set file for storing HPKP data
                              (default: ~/.wget-hpkp)
      --hsts                Use HTTP Strict Transport Security (HSTS).
                              (default: on)
      --hsts-file           Set file for HSTS caching. (default: ~/.wget-hsts)
      --http2               Use HTTP/2 protocol if possible. (default: on)
      --https-enforce       Use secure HTTPS instead of HTTP. Legal types are
                              'hard', 'soft' and 'none'.
                              If --https-only is enabled,
                              this option has no effect. (default: none)
      --https-only          Do not follow non-secure URLs. (default: off).
      --https-proxy         Set HTTPS proxy/proxies, overriding environment
                              variables. Use comma to separate proxies.
      --ocsp                Use OCSP server access to verify server's
                              certificate. (default: on)
      --ocsp-file           Set file for OCSP chaching.
                              (default: ~/.wget-ocsp)
      --ocsp-stapling       Use OCSP stapling to verify the server's
                              certificate. (default: on)
      --private-key         File with private key.
      --private-key-type    Type of the private key (PEM or DER).
                              (default: PEM)
      --random-file         File to be used as source of random data.
      --secure-protocol     Set protocol to be used (auto, SSLv3, TLSv1, PFS).
                              (default: auto). Or use GnuTLS priority
                              strings, e.g. NORMAL:-VERS-SSL3.0:-RSA
      --tls-false-start     Enable TLS False Start (needs GnuTLS 3.5+).
                              (default: on)
      --tls-resume          Enable TLS Session Resumption. (default: off)
      --tls-session-file    Set file for TLS Session caching.
                              (default: ~/.wget-session)

Directory options:
      --cut-dirs            Skip creating given number of directory
                              components. (default: 0)
      --directories         Create hierarchy of directories when retrieving
                              recursively. (default: on)
  -P  --directory-prefix    Set directory prefix.
  -x  --force-directories   Create hierarchy of directories when not
                              retrieving recursively. (default: off)
      --host-directories    Create host directories when retrieving
                              recursively. (default: on)
      --protocol-directories
                            Force creating protocol directories.
                              (default: off)

GPG related options:
      --gnupg-homedir       Specify a directory to use as the GnuPG home
                               directory. (default: gnupg default homedir)
      --signature-extensions
                            The extension of the signature file which should be
                              downloaded. (default: sig)
      --verify-save-failed  Save target files even when their signatures fail
                              GPG validation. (default: off)
  -s  --verify-sig          Download .sig file and verify. (default: off)

Plugin options:
      --list-plugins        Lists all the plugins in the plugin search paths.
      --local-plugin        Loads a plugin with a given path.
      --plugin              Load a plugin with a given name.
      --plugin-dirs         Specify alternative directories to look
                              for plugins, separated by ','
      --plugin-help         Print help message for all loaded plugins
      --plugin-opt          Forward an option to a loaded plugin.
                              The option should be in format:
                              <plugin_name>.<option>[=value]


Example boolean option:
 --quiet=no is the same as --no-quiet or --quiet=off or --quiet off
Example string option:
 --user-agent=SpecialAgent/1.3.5 or --user-agent "SpecialAgent/1.3.5"

To reset string options use --[no-]option

user1@ubuntu22042:~$ 

Test


wget2 --no-check-certificate --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains www.xxx.idv.tw --no-parent https://www.xxx.idv.tw/

A warning is printed in red during the run:

WARNING: The certificate is NOT trusted. The certificate issuer is unknown. 

The download still works normally, though, and files with Chinese names were downloaded successfully.
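The long command above is easier to read with one option per line. The sketch below only prints the command instead of running it; `mirror_cmd` is a hypothetical helper, and it swaps `--html-extension` for `--adjust-extension` because the help text above lists the former as obsoleted by the latter. The other options are the ones used in the test: recurse through the site, do not overwrite existing files, fetch the files each page needs, rewrite links for local viewing, use Windows-safe file names, stay on the given domain, and never climb above the starting directory.

```shell
#!/bin/sh
# mirror_cmd HOST: print the wget2 command line for mirroring HOST,
# one argument per line, without executing it.
# Hypothetical helper for illustration only.
mirror_cmd() {
    host=$1
    set -- wget2 \
        --no-check-certificate \
        --recursive \
        --no-clobber \
        --page-requisites \
        --adjust-extension \
        --convert-links \
        --restrict-file-names=windows \
        --domains "$host" \
        --no-parent \
        "https://$host/"
    printf '%s\n' "$@"
}

# Same placeholder host name as in the test above.
mirror_cmd www.xxx.idv.tw
```

Since `--no-check-certificate` turns off certificate checking, it is probably better to drop it whenever the site presents a certificate the system already trusts.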


(End)

Related

[Research] Installing and Trying the Website Downloaders wget 1.21.2 and wget2 v1.99.1 (Ubuntu 22.04.2 LTS)
https://shaurong.blogspot.com/2023/03/wget2-v1991-ubuntu-22042-lts.html

[Research] Installing and Trying the Website Downloader wget2 v2.0.1 (Rocky Linux 9.1)
https://shaurong.blogspot.com/2023/03/wget2-v201-rocky-linux-91.html

[Research] Installing and Testing the Website Downloader wget 1.21.1 (Rocky Linux 9.1)
https://shaurong.blogspot.com/2023/03/wget-1211-rocky-linux-91.html

[Research] Trying Wget for Windows 1.21.3
https://shaurong.blogspot.com/2023/03/wget-for-windows-1213.html

