Active 1 month ago
There is an online HTTP directory that I have access to. I have tried to download all sub-directories and files via wget. But the problem is that when wget downloads sub-directories, it downloads the index.html file, which contains the list of files in that directory, without downloading the files themselves.
Is there a way to download the sub-directories and files without depth limit (as if the directory I want to download is just a folder which I want to copy to my computer).
Omar
7 Answers
Solution:
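The command itself did not survive in this copy; reconstructed from the flags and folder names listed in the explanation below (http://hostname/ and the aaa/bbb/ccc/ddd path are placeholders), it would look like:

```shell
# Placeholder URL: substitute your own host and directory path.
# -r            recurse into sub-directories
# -np           never ascend to the parent directory
# -nH           don't create a hostname/ directory locally
# --cut-dirs=3  drop the leading aaa/bbb/ccc path components
# -R index.html reject the generated index.html listings
wget -r -np -nH --cut-dirs=3 -R index.html http://hostname/aaa/bbb/ccc/ddd/
```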
Explanation:
- It will download all files and subfolders in the ddd directory
- -r : recursively
- -np : not going to upper directories, like ccc/…
- -nH : not saving files to hostname folder
- --cut-dirs=3 : but saving it to ddd by omitting the first 3 folders aaa, bbb, ccc
- -R index.html : excluding index.html files
Reference: http://bmwieczorek.wordpress.com/2008/10/01/wget-recursively-download-all-files-from-certain-directory-listed-by-apache/
Mingjiang Shi
I was able to get this to work thanks to this post utilizing VisualWGet. It worked great for me. The important part seems to be to check the -recursive flag (see image). Also found that the -no-parent flag is important, otherwise it will try to download everything.

mateuscb
From
man wget
‘-r’ / ‘--recursive’
Turn on recursive retrieving. See Recursive Download, for more details. The default maximum depth is 5.

‘-np’ / ‘--no-parent’
Do not ever ascend to the parent directory when retrieving recursively. This is a useful option, since it guarantees that only the files below a certain hierarchy will be downloaded. See Directory-Based Limits, for more details.

‘-nH’ / ‘--no-host-directories’
Disable generation of host-prefixed directories. By default, invoking Wget with ‘-r http://fly.srk.fer.hr/’ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.

‘--cut-dirs=number’
Ignore number directory components. This is useful for getting a fine-grained control over the directory where recursive retrieval will be saved.
Take, for example, the directory at ‘ftp://ftp.xemacs.org/pub/xemacs/’. If you retrieve it with ‘-r’, it will be saved locally under ftp.xemacs.org/pub/xemacs/. While the ‘-nH’ option can remove the ftp.xemacs.org/ part, you are still stuck with pub/xemacs. This is where ‘--cut-dirs’ comes in handy; it makes Wget not “see” number remote directory components. Here are several examples of how ‘--cut-dirs’ option works.
No options       -> ftp.xemacs.org/pub/xemacs/
-nH              -> pub/xemacs/
-nH --cut-dirs=1 -> xemacs/
-nH --cut-dirs=2 -> .
--cut-dirs=1     -> ftp.xemacs.org/xemacs/
...

If you just want to get rid of the directory structure, this option is similar to a combination of ‘-nd’ and ‘-P’. However, unlike ‘-nd’, ‘--cut-dirs’ does not lose with subdirectories—for instance, with ‘-nH --cut-dirs=1’, a beta/ subdirectory will be placed to xemacs/beta, as one would expect.
Ryan R
Natalie Ng
wget is an invaluable resource and something I use myself. However, sometimes there are characters in the address that wget identifies as syntax errors. I'm sure there is a fix for that, but as this question did not ask specifically about wget, I thought I would offer an alternative for those people who will undoubtedly stumble upon this page looking for a quick fix with no learning curve required.

There are a few browser extensions that can do this, but most require installing download managers, which aren't always free, tend to be an eyesore, and use a lot of resources. Here's one that has none of these drawbacks:
'Download Master' is an extension for Google Chrome that works great for downloading from directories. You can choose to filter which file-types to download, or download the entire directory.
For an up-to-date feature list and other information, visit the project page on the developer's blog:
Peter
Moscarda
(only usable if you don't need recursive depth)

Use a bookmarklet. Drag this link into your bookmarks, then edit it and paste this code:

then go to the page (from which you want to download files) and click that bookmarklet.
T.Todua
You can use this Firefox addon to download all files in HTTP Directory.
Rushikesh Tade
wget generally works in this way, but some sites may have problems and it may create too many unnecessary html files. In order to make this work easier and to prevent unnecessary file creation, I am sharing my getwebfolder script, which is the first Linux script I wrote for myself. This script downloads all content of a web folder entered as a parameter.

When you try to download an open web folder with wget that contains more than one file, wget downloads a file named index.html. This file contains a file list of the web folder. My script converts the file names written in the index.html file to web addresses and downloads them with wget.

Tested on Ubuntu 18.04 and Kali Linux; it may work on other distros as well.
Usage:
- extract the getwebfolder file from the zip file provided below
- chmod +x getwebfolder (only for the first time)
- ./getwebfolder webfolder_URL

such as ./getwebfolder http://example.com/example_folder/
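The script itself is distributed as a zip and not reproduced here. A minimal sketch of the same idea (hypothetical, not the author's actual code: fetch the directory's index.html, extract each href target, and download it with wget, assuming an Apache-style listing) might look like:

```shell
#!/bin/sh
# Hypothetical sketch of the approach described above; assumes an
# Apache-style index.html listing. Not the author's actual script.
url="${1%/}/"    # directory URL, normalized to end with a slash

wget -q -O - "$url" \
  | grep -o 'href="[^"]*"' \
  | sed 's/^href="//; s/"$//' \
  | grep -v '^\.\./\?$' \
  | grep -v '^?' \
  | while read -r name; do
      # resolve each listed name against the directory URL and fetch it
      wget "${url}${name}"
    done
```

The two trailing grep -v filters skip the "Parent Directory" link and Apache's column-sorting links (?C=N;O=D), which would otherwise be fetched as files.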
Byte Bitter
protected by Jack Bashford Jun 6 at 22:31
Thank you for your interest in this question. Because it has attracted low-quality or spam answers that had to be removed, posting an answer now requires 10 reputation on this site (the association bonus does not count).
Would you like to answer one of these unanswered questions instead?