Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues

From: <david_at_lang.hm>
Date: Fri, 22 Apr 2011 11:10:44 -0700 (PDT)

ping, I haven't seen a response to this additional information that I sent
out last week.

squid 3.1 and 3.2 are a significant regression in performance from squid
2.7 or 3.0

David Lang

On Thu, 14 Apr 2011, david_at_lang.hm wrote:

> Subject: Re: Res: [squid-users] squid 3.2.0.5 smp scaling issues
>
> Ok, I finally got a chance to test 2.7STABLE9
>
> it performs about the same as squid 3.0, possibly a little better.
>
> with my somewhat stripped down config (smaller regex patterns, replacing CIDR
> blocks and names that would need to be looked up in /etc/hosts with
> individual IP addresses)
>
> 2.7 gives ~4800 requests/sec
> 3.0 gives ~4600 requests/sec
> 3.2.0.6 with 1 worker gives ~1300 requests/sec
> 3.2.0.6 with 5 workers gives ~2800 requests/sec
>
> the numbers for 3.0 are slightly better than what I was getting with the full
> ruleset, but the numbers for 3.2.0.6 are pretty much exactly what I got from
> the last round of tests (with either the full or simplified ruleset)
>
> so 3.1 and 3.2 are a very significant regression from 2.7 or 3.0, and the
> ability to use multiple worker processes in 3.2 doesn't make up for this.
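
For reference, the size of the regression and the worker scaling can be worked out from the figures quoted above. This is only a sketch of the arithmetic; the function names are made up for illustration and are not part of Squid.

```c
#include <assert.h>

/* Fraction of throughput lost going from a baseline rate to a measured
 * rate, e.g. 0.25 means the measured rate is 25% lower. */
static double fractional_drop(double baseline, double measured)
{
    assert(baseline > 0.0);
    return 1.0 - measured / baseline;
}

/* Speedup of n workers over a single worker, divided by n.
 * 1.0 would be perfect linear scaling. */
static double scaling_efficiency(double rate_one, double rate_n, int n)
{
    assert(rate_one > 0.0 && n > 0);
    return (rate_n / rate_one) / n;
}
```

With the numbers above, fractional_drop(4600, 1300) comes out at roughly 0.72 (3.0 versus one 3.2.0.6 worker), and scaling_efficiency(1300, 2800, 5) at roughly 0.43, i.e. five workers deliver a bit more than twice the throughput of one.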
>
> the time taken seems to almost all be in the ACL evaluation as eliminating
> all the ACLs takes 1 worker with 3.2 up to 4200 requests/sec.
>
> one theory is that even though I have IPv6 disabled on this build, the added
> space and more expensive checks needed to compare IPv6 addresses instead of
> IPv4 addresses account for the single worker drop of ~66%. that seems rather
> expensive, even though there are 293 http_access lines (and one of them uses
> external file contents in its acls, so it's a total of ~2400
> source/destination pairs, however due to the ability to shortcut the
> comparison the number of tests that need to be done should be <400)
>
>
>
> In addition, there seems to be some sort of locking between the multiple
> worker processes in 3.2 when checking the ACLs as the test with almost no
> ACLs scales close to 100% per worker while with the ACLs it scales much more
> slowly, and above 4-5 workers actually drops off dramatically (to the point
> where with 8 workers the throughput is down to about what you get with 1-2
> workers). I don't see any conceptual reason why the ACL checks of the
> different worker threads should impact each other in any way, let alone in a
> way that limits scalability to ~4 workers before adding more workers is a net
> loss.
>
> David Lang
>
>
>> On Wed, 13 Apr 2011, Marcos wrote:
>>
>>> Hi David,
>>>
>>> could you run and publish your benchmark with squid 2.7?
>>> I'd like to know if there is any regression between 2.7 and 3.x series.
>>>
>>> thanks.
>>>
>>> Marcos
>>>
>>>
>>> ----- Original Message ----
>>> From: "david_at_lang.hm" <david_at_lang.hm>
>>> To: Amos Jeffries <squid3_at_treenet.co.nz>
>>> Cc: squid-users_at_squid-cache.org; squid-dev_at_squid-cache.org
>>> Sent: Saturday, April 9, 2011 12:56:12
>>> Subject: Re: [squid-users] squid 3.2.0.5 smp scaling issues
>>>
>>> On Sat, 9 Apr 2011, Amos Jeffries wrote:
>>>
>>>> On 09/04/11 14:27, david_at_lang.hm wrote:
>>>>> A couple more things about the ACLs used in my test
>>>>>
>>>>> all of them are allow ACLs (no deny rules to worry about precedence of)
>>>>> except for a deny-all at the bottom
>>>>>
>>>>> the ACL line that permits the test source to the test destination has
>>>>> zero overlap with the rest of the rules
>>>>>
>>>>> every rule has an IP based restriction (even the ones with url_regex are
>>>>> source -> URL regex)
>>>>>
>>>>> I moved the ACL that allows my test from the bottom of the ruleset to
>>>>> the top and the resulting performance numbers were up as if the other
>>>>> ACLs didn't exist. As such it is very clear that 3.2 is evaluating every
>>>>> rule.
>>>>>
>>>>> I changed one of the url_regex rules to just match one line rather than
>>>>> a file containing 307 lines to see if that made a difference, and it
>>>>> made no significant difference. So this indicates to me that it's not
>>>>> having to fully evaluate every rule (it's able to skip doing the regex
>>>>> if the IP match doesn't work)
>>>>>
>>>>> I then changed all the acl lines that used hostnames to have IP
>>>>> addresses in them, and this also made no significant difference
>>>>>
>>>>> I then changed all subnet matches to single IP address (just nuked /##
>>>>> throughout the config file) and this also made no significant
>>>>> difference.
>>>>>
>>>>
>>>> Squid has always worked this way. It will *test* every rule from the top
>>>> down to the one that matches. Also testing each line left-to-right until
>>>> one fails or the whole line matches.
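
The evaluation order described here (rules top-down, tests within a line left-to-right, stopping a line at the first failed test) can be sketched roughly as below. This is a simplified illustration, not Squid's code: the rule struct, first_match() and the test counter are hypothetical, and each rule is reduced to a single source/destination pair of 32-bit IPv4 addresses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified allow rule: matches when both the source
 * and the destination address match. */
typedef struct {
    uint32_t src;   /* source IPv4 address */
    uint32_t dst;   /* destination IPv4 address */
} rule;

/* Walk the rules top-down and return the index of the first rule that
 * matches, or -1 if none does (the deny-all at the bottom).
 * *tests is set to the number of individual comparisons performed. */
static int first_match(const rule *rules, size_t n,
                       uint32_t src, uint32_t dst, size_t *tests)
{
    assert(tests != NULL);
    assert(rules != NULL || n == 0);

    *tests = 0;
    for (size_t i = 0; i < n; i++) {
        ++*tests;                 /* leftmost test: source address */
        if (rules[i].src != src)
            continue;             /* line failed, skip its remaining tests */
        ++*tests;                 /* next test: destination address */
        if (rules[i].dst == dst)
            return (int)i;        /* whole line matched, stop here */
    }
    return -1;
}
```

In this model a request that matches only the last of N rules costs at least N comparisons, while the same rule placed first costs two, which is consistent with the speedup seen when the test rule was moved to the top of the ruleset.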
>>>>
>>>>>
>>>>> so why are the address matches so expensive
>>>>>
>>>>
>>>> 3.0 and older IP address is a 32-bit comparison.
>>>> 3.1 and newer IP address is a 128-bit comparison with memcmp().
>>>>
>>>> If something like a word-wise comparison can be implemented faster than
>>>> memcmp() we would welcome it.
>>>
>>> I wonder if there should be a different version that's used when IPv6 is
>>> disabled. this is a pretty large hit.
>>>
>>> if the data is aligned properly, on a 64 bit system this should still only
>>> be 2 compares. do you do any alignment on the data now?
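
A minimal sketch of the "2 compares" idea next to the memcmp() baseline, for equality tests only. The addr128 type and both function names are hypothetical and do not reflect how Squid actually stores addresses. Whether the word-wise version is faster in practice is not shown here and would need measuring, since compilers often expand a fixed-size memcmp() into similar code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for a 128-bit address in network byte order. */
typedef struct {
    unsigned char bytes[16];
} addr128;

static_assert(sizeof(addr128) == 16, "addr128 must be exactly 16 bytes");

/* Baseline: one memcmp() call over all 16 bytes. */
static bool addr128_equal_memcmp(const addr128 *a, const addr128 *b)
{
    return memcmp(a->bytes, b->bytes, sizeof a->bytes) == 0;
}

/* Word-wise: load each half into a uint64_t and do two integer compares.
 * memcpy() is used for the loads so the code stays valid regardless of
 * alignment and strict-aliasing rules. */
static bool addr128_equal_words(const addr128 *a, const addr128 *b)
{
    uint64_t a_hi, a_lo, b_hi, b_lo;

    memcpy(&a_hi, a->bytes, 8);
    memcpy(&a_lo, a->bytes + 8, 8);
    memcpy(&b_hi, b->bytes, 8);
    memcpy(&b_lo, b->bytes + 8, 8);

    return a_hi == b_hi && a_lo == b_lo;
}
```

This only covers equality. Ordered or range comparisons on the 64-bit halves would additionally need a byte-order conversion on little-endian machines.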
>>>
>>>>> and as noted in the e-mail below, why do these checks not scale nicely
>>>>> with the number of worker processes? If they did, the fact that one 3.2
>>>>> process is about 1/3 the speed of a 3.0 process in checking the acls
>>>>> wouldn't matter nearly as much when it's so easy to get an 8+ core
>>>>> system.
>>>>>
>>>>
>>>> There you have the unknown.
>>>
>>> I think this is a fairly critical thing to figure out.
>
Received on Fri Apr 22 2011 - 18:10:48 MDT
