[lug] wget page-requisites
csmcdermott at gmail.com
Wed Jan 12 13:57:09 MST 2011
On Wed, Jan 12, 2011 at 1:05 PM, Davide Del Vento
<davide.del.vento at gmail.com> wrote:
> Thanks. This solves the simple single-page example, but of course life
> is always harder than simple examples. My actual wget is doing
> --mirror of the whole domain, and adding --span-hosts makes a mess.
> What I want is a --span-hosts that applies only to the
> --page-requisites and not to the recursion. It doesn't seem like a
> weird request at
> all, I want the pages that I am downloading to be complete with their
> requisites (images) even if they are hosted somewhere else, but I
> don't want to recurse the whole web (which is what happens with
> --span-hosts). Any ideas?
> I guess I could count the deepest level of the domain I am mirroring,
> and use that as recursion level instead of the infinite that mirror
> uses. But if I get that wrong, I don't mirror the whole site. And then
> I have to continuously maintain that number, which is a pain. And
> then, even if it's not the whole internet for sure, I am still
> downloading the world and his dog. This must be possible, mustn't it?
> Using curl or anything else instead of wget is an option, if they are
> more flexible than wget.
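One workaround worth sketching: do it in two passes, so recursion never
spans hosts but each page's requisites still can. This is a minimal,
untested sketch; example.com and the assumption that the mirrored file
path maps directly back onto the URL are placeholders for your layout.

```shell
# Pass 1: mirror the site itself, staying on one host (no --span-hosts):
#   wget --mirror --no-parent http://example.com/
#
# Pass 2: re-fetch each saved page with -p -H, which pulls that page's
# cross-host requisites (images, CSS) without any further recursion.
# Build the per-page command from the mirrored file's local path:
page_cmd() {
    # $1 is a local path like example.com/dir/page.html; wget's default
    # mirror layout names the top directory after the host, so the path
    # doubles as the URL (an assumption about your setup).
    printf 'wget -p -H -k http://%s' "$1"
}

# e.g. feed every mirrored HTML file through it:
#   find example.com -name '*.html' | while read -r f; do
#       sh -c "$(page_cmd "$f")"
#   done
page_cmd example.com/index.html
```

The second pass re-requests pages you already have, but with --mirror's
timestamping semantics in pass 1 the cost is mostly the requisites.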
Well, your other option is to use "--domains=" to restrict recursion to just
the comma-separated list of domains specified. Or "--exclude-domains=" if
that's easier. But that's not a huge improvement either. I agree it's
annoying. For what it's worth, this is from the man page:
> Actually, to download a single page and all its requisites (even if they
> exist on separate websites), and make sure the lot displays properly
> locally, this author likes to use a few options in addition to *-p*:
> wget -E -H -k -K -p http://<site>/<document>
Not sure if that gets you closer to where you want to be...
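To make the --domains suggestion above concrete, here is a hedged
example; all hostnames are placeholders for your real site and whatever
asset hosts it actually uses. Recursion may cross hosts (--span-hosts),
but only into the allow-list.

```shell
# Mirror example.com, letting recursion reach only the two listed
# domains (so image hosts are covered, but not the whole web):
mirror_cmd='wget --mirror --page-requisites --convert-links --span-hosts --domains=example.com,static.example.net http://example.com/'
echo "$mirror_cmd"   # run it with: sh -c "$mirror_cmd"
```

The catch, as noted, is that you must know and maintain the list of
asset hosts by hand; anything off-list is silently skipped.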