Re: [squid-users] Occasional slow connections/timeouts from Amos Jeffries on 2014-02-20 (squid-users)

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Fri, 21 Feb 2014 10:51:05 +1300

On 2014-02-21 06:10, Simon Beale wrote:
> I've got a problem at the moment with our general squid proxies where
> occasionally requests take a long time that shouldn't do. (i.e. 5+ seconds
> or timeout, instead of milliseconds).
>
> This is most common on our proxies doing 100 reqs/sec, but happens
> overnight too when they're running at 10 reqs/sec. I've got this happening
> with both v3.4.2 and also with a box I've downgraded back to v3.1.10. For
> v3.4.2, it's happening in both multiple worker and single worker modes.
>

What sort of CPU loading do you have at ~100 req/sec?
Is that at or near your local installation's req/sec capacity?

NP:
* slow-down at peak capacity is normal as the proxy is busy servicing
other traffic.

* slow-down at only a few req/sec is normal as Squid spends a lot of
its time in artificial I/O wait delays to prevent reading/writing
individual bytes off the network. Nothing worse for the network than to
have ~71 bytes of packet overhead for every 2 bytes of data transferred.

* slow-down randomly all the time could be network congestion, Window
scaling, ECN or MTU related. Even ICMP related (ICMP is *not* optional,
though many admins block it).

* then there are bugs.
- 3.1 had a few IPv6 bugs (some major) which caused TCP retry delays in
certain circumstances. Since you are seeing it only randomly I would
suspect remote network(s) somewhere with those issues being a transit
hop occasionally. Though this is unlikely given 3.4 still shows it.

- There is a fix in the 3.4.3 release regarding connection IP failover
that may help if that is part of the issue (or it may not).
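
To put a rough number on the small-read overhead mentioned in the second
point: a minimal sketch that takes the ~71 bytes of per-packet overhead
as given and computes what fraction of the bytes on the wire is actual
data. The function name and the 64 and 1460 byte payload sizes are
illustrative assumptions, not anything Squid exposes.

```python
def payload_efficiency(payload_bytes, overhead_bytes=71):
    """Return the fraction of bytes on the wire that are actual data."""
    return payload_bytes / (payload_bytes + overhead_bytes)


# Compare a 2-byte read with two larger, arbitrarily chosen payload sizes.
for size in (2, 64, 1460):
    share = payload_efficiency(size)
    print(f"{size:5d} byte payload: {share:6.1%} of wire bytes are data")
```

Batching reads so each packet carries more payload is what the
artificial I/O wait buys, at the cost of some latency on quiet proxies.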

> The test is not reproducible, sadly, but I've got a cronjob running on
> localhost on these boxes testing access times to various URLs covering:
> HTTPS, non-HTTPS static content, using IP not hostname over both HTTP and
> HTTPS, and a URL on the same vlan as the proxies. All of these test cases
> have it happen occasionally, but not repeatedly/reliably.

Some ideas:
* DNS lookup delays ?
* Random TCP connection setup delays?
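
One way to tell these two apart is to time them as separate steps
outside Squid. The following is only a sketch using the Python standard
library; the helper names `timed` and `probe` and the 10 second timeout
are assumptions made for illustration. It measures the system resolver
and a direct connection from the box, not Squid's internal DNS client.

```python
import socket
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (elapsed_seconds, result)."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return time.monotonic() - start, result


def probe(host, port, timeout=10.0):
    """Time DNS resolution and TCP connection setup as separate steps."""
    # Step 1: name resolution only.
    dns_s, infos = timed(socket.getaddrinfo, host, port,
                         type=socket.SOCK_STREAM)
    family, socktype, proto, _canonname, sockaddr = infos[0]
    # Step 2: TCP handshake to the first resolved address only.
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        tcp_s, _ = timed(sock.connect, sockaddr)
    return dns_s, tcp_s
```

Logging the two values from `probe(host, 80)` for each test host in the
same cron job, next to the total fetch time, should show whether the
slow samples line up with resolution, with connection setup, or with
neither.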

>
> Different boxes are either running Trend's IWSVA for its antivirus as a
> cache_peer, or C-ICAP/clamd as an ICAP service. These both have it happen
> (as does the case where I disabled the antivirus).

* object size related? i.e. scanning time in the AV.
>
> The servers are all running CentOS6.4 on HP Gen8 blades with 48G RAM.
>
> Has anyone seen anything like this, or got any suggestions as to what
> might be causing this that I can investigate further?
>
> Simon

Lots of people see it for all sorts of reasons.

Amos
Received on Thu Feb 20 2014 - 21:51:14 MST
