> From: Benarson Behajaina [SMTP:Benarson.Behajaina@swh.sk]
>
> I reconfigured my Squid to deny access for non standard
> browers (GetRight, Wget, Lynx, GetSmart etc ...)
>
Why? Is your HTML that badly broken? Lynx is probably
more standard (in the sense of HTML/HTTP, etc.,
compliance) than Netscape 4.x, and is actually
recommended on the Squid home page!
The only real effect of discriminating against Lynx is
a lot of mail hostile to you on the lynx-dev mailing
list and an increase in the number of people who override
the User Agent. wget users have the same option.
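
To illustrate how little effort overriding the User Agent takes, here is a
minimal Python sketch. The URL and the agent string are made-up placeholders,
and nothing is actually sent over the network; it only builds the request.

```python
import urllib.request


def build_request(url, user_agent):
    # Any client can claim any identity: the header is just a string
    # chosen by whoever makes the request. Nothing is sent here.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Hypothetical URL and agent string, for illustration only.
req = build_request("http://www.example.com/", "Mozilla/4.0 (compatible)")
print(req.get_header("User-agent"))
```

A filter keyed on the agent string sees only what the client chooses to tell it.
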
If this is really an attempt to exclude crawlers (and
IE 4 is a crawler, although I think it changes its user
agent string when crawling), then I admit Lynx is weak in
not supporting robots.txt, but wget, when actually crawling,
is certainly compliant (it is also used as a front end for
other tools). You can force wget to be badly behaved, but
then you can just as easily write your own crawler or modify
the source code.
From what I've heard of IMDB's attempts to control crawlers
and pre-fetchers, the main problem is from ones that do
not identify themselves in the user agent. IMDB analyse the
log files, presumably to look for the typical access patterns.
Most of these will not be configurable like the power users' tools,
Lynx and wget, but will be typical Windows plug and play
shareware.
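
I don't know how IMDB actually does its log analysis, so the following is
only a guess at the general idea: flag clients by request rate rather than
by what they call themselves. The function name, the (client, timestamp)
input format and the thresholds are all my own assumptions.

```python
from collections import defaultdict


def flag_heavy_clients(entries, window=60, max_hits=30):
    """Return the clients that made more than max_hits requests
    within any span of `window` seconds.

    entries: iterable of (client, unix_time) pairs, e.g. parsed
    from an access log. The thresholds are arbitrary examples.
    """
    times = defaultdict(list)
    for client, ts in entries:
        times[client].append(ts)

    flagged = set()
    for client, stamps in times.items():
        stamps.sort()
        start = 0
        for end, ts in enumerate(stamps):
            # Slide the window start forward until it spans < window seconds.
            while ts - stamps[start] >= window:
                start += 1
            if end - start + 1 > max_hits:
                flagged.add(client)
                break
    return flagged
```

This ignores the user agent entirely, so it catches tools that lie about or
suppress their identity, at the cost of also flagging busy proxies.
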
Also, some people suppress user agent in their proxies for
privacy reasons.
The first thing to do if you don't want to be crawled is to
make sure that you:
- have a policy that makes sense to the users;
- have a robots.txt file that accurately implements that policy
and is commented to explain the policy;
- explain the policy clearly in a way accessible to interactive
browsers.
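
As a sketch of the second point, here is a small commented robots.txt,
checked with Python's standard urllib.robotparser. The policy, the paths
and the crawler name are invented for illustration; substitute your own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy file; the comments explain the policy to humans.
ROBOTS_TXT = """\
# Policy: interactive browsing is welcome everywhere.
# Automated bulk retrieval of /search/ is not, because each
# request there is expensive to generate.
User-agent: *
Disallow: /search/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks before fetching each URL.
search_ok = parser.can_fetch("ExampleCrawler", "http://www.example.com/search/x")
index_ok = parser.can_fetch("ExampleCrawler", "http://www.example.com/index.html")
print(search_ok, index_ok)
```

Checking the file this way before publishing it helps confirm that it
implements the policy you think it does.
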
Specifically with Lynx, you should donate code to support
robots.txt when operating in crawling mode. People may still
disable it, but those people will try to frustrate any attempt
you make to shift the balance in favour of your advertisers,
etc.
(Incidentally, someone is getting quite heavily flamed by
most of the Lynx developers at the moment for trying to
defeat IMDB's measures - most of the developers are sympathetic
to the wishes of content providers.)
Hope I've read correctly between the lines here.
Received on Fri Jul 23 1999 - 09:14:53 MDT