How would you do it?
With wget, the only way of having it crawl through websites is to
recurse... isn't it?
I tried screwing around, and the best I came up with was this:
> #!/bin/bash
> # Tail the Squid access log and re-fetch the pages it lists through wget.
> log="/var/log/squid3/access.log"
>
> while true; do
>     echo "reading started: `date`, log file: $log"
>     # Keep successful (200) GETs for HTML, take the URL (field 7 of the
>     # native log format), and feed the list to wget on stdin.
>     sudo tail -n 80 "$log" | grep -P "/200 [0-9]+ GET" | grep "text/html" | awk '{print $7}' | wget -q -rp -nd -l 1 --delete-after -i -
>     sleep 5
>     echo
> done
It's not so clean...
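For what JD suggests below (no recursion, just re-fetch what already shows up
in the log), something like this might work. It's only a rough sketch: it
assumes the default native log format (client address in field 3, URL in
field 7) and that the box doing the prefetching appears in the log under a
single address ($self below is just a placeholder), so its own requests can
be filtered out instead of feeding the loop:

> #!/bin/bash
> # Sketch: prefetch recently requested HTML pages without recursion and
> # without reacting to our own traffic in the log.
> log="/var/log/squid3/access.log"
> self="127.0.0.1"   # placeholder: the address our requests show up under
>
> while true; do
>     sudo tail -n 80 "$log" \
>         | grep -P "/200 [0-9]+ GET" \
>         | grep "text/html" \
>         | awk -v self="$self" '$3 != self {print $7}' \
>         | sort -u \
>         | wget -q -p -nd --delete-after -i -
>     sleep 5
> done

With -r gone and our own address filtered out, wget can only fetch URLs that
real clients actually asked for, so the worst case is re-fetching the same
bounded set every few seconds rather than spiralling outward.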
On Tue, Oct 5, 2010 at 11:51 AM, John Doe <jdmls_at_yahoo.com> wrote:
>
> From: flaviane athayde <flavianeathayde_at_gmail.com>
>
> > I tried to put together a shell script that reads the Squid log and uses
> > it to run wget with the "-r -l1 -p" flags, but it also gets its own pages,
> > creating an infinite loop that I can't resolve.
>
> Why recurse?
> If you take your list from the log files, you will get all accessed files
> already... no?
>
> JD