On Tue, Oct 26, 2010 at 10:15 PM, Amos Jeffries <squid3_at_treenet.co.nz> wrote:
> On Tue, 26 Oct 2010 16:34:52 -0400, alexus <alexus_at_gmail.com> wrote:
>> On Mon, Oct 25, 2010 at 6:38 PM, Amos Jeffries <squid3_at_treenet.co.nz>
>> wrote:
>>> On Mon, 25 Oct 2010 12:38:49 -0400, alexus <alexus_at_gmail.com> wrote:
>>>> Is there a way to disallow serving pages based on the browser (user
>>>> agent)? I'm getting a lot of these:
>>>>
>>>> XX.XX.XX.XX - - [25/Oct/2010:16:37:44 +0000] "GET
>>>> http://www.google.com/gwt/x? HTTP/1.1" 200 2232 "-"
>>>> "SAMSUNG-SGH-E250/1.0 Profile/MIDP-2.0 Configuration/CLDC-1.1
>>>> UP.Browser/6.2.3.3.c.1.101 (GUI) MMP/2.0 (compatible;
>>>> Googlebot-Mobile/2.1; +http://www.google.com/bot.html)"
>>>> TCP_MISS:DIRECT
>>>
>>> Of course. You may use:
>>>
>>> http://www.squid-cache.org/Doc/config/cache
>>>
>>> Although you need to be aware that preventing an object from being
>>> cached works by removing the record of it after the transaction has
>>> finished. The effect you can expect from doing this is that a visit
>>> by GoogleBot will empty your cache of most content.
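>>>
>>> For example (a sketch only; the ACL name and pattern are
>>> illustrations, not taken from your config), to stop caching the
>>> responses sent to that bot:
>>>
>>>   # match the request's User-Agent header with a regex
>>>   acl googlebot_mobile browser -i Googlebot-Mobile
>>>   # do not cache the responses to matching requests
>>>   cache deny googlebot_mobile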
>>>
>>> Amos
>>>
>>
>> I'm not sure what you mean by that. Somehow my Squid gets hit by
>> different bots, and I was thinking of disallowing access to them so
>> they don't hit me as hard... maybe it's a stupid way of dealing with
>> things...
>
> Ah. Not caching will make the impact worse. One of the things Squid offers
> is reduced web server impact from visitors. Squid is front-line software.
>
> * Start by creating a robots.txt. The major bots will obey it, and you
> can restrict where they go and sometimes how often.
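>
> For example, a minimal robots.txt (served from the root of the origin
> site; the paths and delay below are only placeholders) might look like:
>
>   User-agent: Googlebot-Mobile
>   Disallow: /search
>
>   User-agent: *
>   Crawl-delay: 30
>   Disallow: /cgi-bin/
>
> Not every bot honors Crawl-delay, but the major ones do respect
> Disallow.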
>
> * Allow caching of dynamic pages where possible with squid-2.6 and
> later (http://wiki.squid-cache.org/ConfigExamples/DynamicContent). Squid
> will handle the bots and normal visitors faster if it has cached content
> to serve out immediately instead of waiting on the origin server.
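>
> A sketch of the change that wiki page describes (verify it against your
> own squid.conf defaults before applying):
>
>   # remove the old default lines, if present:
>   #   acl QUERY urlpath_regex cgi-bin \?
>   #   cache deny QUERY
>   # then treat only cgi-bin and query URLs as immediately stale:
>   refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
>   refresh_pattern . 0 20% 4320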
>
> * Check your squid.conf for performance killers (regex ACLs, external
> helpers), and reduce the number of requests reaching those ACL tests as
> much as possible. Squid routinely handles thousands of concurrent
> connections for ISPs, so a visit by several bots at once should not
> really cause any visible load.
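>
> For example (the addresses and patterns here are placeholders only),
> order http_access so that cheap ACL types decide most requests before
> any regex or external-helper ACLs run:
>
>   acl localnet src 192.168.0.0/16
>   acl bad_agents browser -i crawler spider
>   # cheap source-address test first; most requests are decided here
>   http_access deny !localnet
>   # the slower regex test only sees requests that passed the cheap one
>   http_access deny bad_agents
>   http_access allow localnet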
>
>
> Amos
>
I'm a little confused... what does robots.txt have to do with squid?
Where exactly should I place this robots.txt?
--
http://alexus.org/