On 26/04/11 12:14, david_at_lang.hm wrote:
> On Mon, 25 Apr 2011, Alex Rousskov wrote:
>
>> On 04/25/2011 05:31 PM, david_at_lang.hm wrote:
>>> On Mon, 25 Apr 2011, david_at_lang.hm wrote:
>>>> On Mon, 25 Apr 2011, Alex Rousskov wrote:
>>>>> On 04/14/2011 09:06 PM, david_at_lang.hm wrote:
>>>>>
>>>>>> In addition, there seems to be some sort of locking between the
>>>>>> multiple
>>>>>> worker processes in 3.2 when checking the ACLs
>>>>>
>>>>> There are pretty much no locks in the current official SMP code. This
>>>>> will change as we start adding shared caches in a week or so, but even
>>>>> then the ACLs will remain lock-free. There could be some internal
>>>>> locking in the 3rd-party libraries used by ACLs (regex and such),
>>>>> but I
>>>>> do not know much about them.
>>>>
>>>> what are the 3rd party libraries that I would be using?
>>
>> See "ldd squid". Here is a sample based on a randomly picked Squid:
>>
>> libnsl, libresolv, libstdc++, libgcc_s, libm, libc, libz, libepol
>>
>> Please note that I am not saying that any of these have problems in SMP
>> environment. I am only saying that Squid itself does not lock anything
>> runtime so if our suspect is SMP-related locks, they would have to
>> reside elsewhere. The other possibility is that we should suspect
>> something else, of course. IMHO, it is more likely to be something else:
>> after all, Squid does not use threads, where such problems are expected.
>
>
>> BTW, do you see more-or-less even load across CPU cores? If not, you may
>> need a patch that we find useful on older Linux kernels. It is discussed
>> in the "Will similar workers receive similar amount of work?" section of
>> http://wiki.squid-cache.org/Features/SmpScale
>
> the load is pretty even across all workers.
>
> with the problems described on that page, I would expect uneven
> utilization at low loads, but at high loads (with the workers busy
> servicing requests rather than waiting for new connections), I would
> expect the work to even out (and the types of hacks described in that
> section to end up costing performance, but not in a way that would scale
> with the ACL processing load)
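[Editorial aside: a quick way to check whether the load really is even
across cores. This is my own sketch, Linux-specific, reading /proc/stat
directly so it needs no extra tools:]

```shell
# Print one line per core: cumulative user/nice/system/idle/... jiffies.
# Take two snapshots a few seconds apart and compare the deltas per core;
# roughly equal deltas in the non-idle columns mean even worker load.
grep '^cpu[0-9]' /proc/stat
```

If the sysstat package is installed, "mpstat -P ALL 5" gives the same
view pre-digested.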
>
>>> one thought I had is that this could be locking on name lookups. how
>>> hard would it be to create a quick patch that would bypass the name
>>> lookups entirely and only do the lookups by IP.
>>
>> I did not realize your ACLs use DNS lookups. Squid internal DNS code
>> does not have any runtime SMP locks. However, the presence of DNS
>> lookups increases the number of suspects.
>
> they don't, everything in my test environment is by IP. But I've seen
> other software that still runs everything through name lookups, even if
> what's presented to the software (both in what's requested and in the
> ACLs) is all done by IPs. It's an easy way to bullet-proof the input (if
Yes, a very easy way to bullet-proof the input.
getnameinfo(..., NUMERIC_IP) is a blocking call into the OS resolver
library. It is done while parsing the config file, to determine whether
a name lookup is needed to convert to IPs. At ACL testing time the "src"
and "dst" tests are guaranteed to be IP vs IP, and are optimized to
work with that guarantee.
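[Editorial aside: a minimal sketch of that kind of parse-time check (not
Squid's actual code) using the standard resolver API. AI_NUMERICHOST
makes getaddrinfo() fail instead of querying DNS, so the check never
blocks on the network:]

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Return 1 if the token is already a numeric IPv4/IPv6 address,
 * 0 if it would need a real DNS lookup.  AI_NUMERICHOST tells
 * getaddrinfo() to fail rather than query the resolver. */
static int is_numeric_ip(const char *token)
{
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;        /* accept both v4 and v6 */
    hints.ai_flags  = AI_NUMERICHOST;   /* never hit the network */
    if (getaddrinfo(token, NULL, &hints, &res) != 0)
        return 0;
    freeaddrinfo(res);
    return 1;
}

int main(void)
{
    printf("%d %d %d\n",
           is_numeric_ip("127.0.0.1"),   /* 1: numeric IPv4 */
           is_numeric_ip("::1"),         /* 1: numeric IPv6 */
           is_numeric_ip("example.com")); /* 0: needs a lookup */
    return 0;
}
```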
If you are thinking of the domain in the URL passed into Squid: that is
passed through getnameinfo() when parsing the request. "dst" will
short-circuit with a fail result on "FAST" group access tests if the
input domain has not yet been resolved.
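[Editorial aside: for reference, IP-only ACLs of the kind being
discussed look like this in squid.conf; the names and addresses here
are illustrative:]

```
acl localnet  src 192.168.0.0/16   # src: IP-vs-IP test, no DNS needed
acl to_server dst 10.0.0.0/8       # dst: needs the URL host resolved first
http_access allow localnet to_server
http_access deny all
```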
> it's a name it gets resolved, if it's an IP, the IP comes back as-is,
> and it works for IPv4 and IPv6, no need to have logic that looks at the
> value and tries to figure out if the user intended to type a name or an
> IP). I don't know how squid is working internally (it's a pretty large
> codebase, and I haven't tried to really dive into it) so I don't know if
> squid does this or not.
>
>> A patch you propose does not sound difficult to me, but since I cannot
>> contribute such a patch soon, it is probably better to test with ACLs
>> that do not require any DNS lookups instead.
>>
>>
>>> if that regains the speed and/or scalability it would point fingers
>>> fairly conclusively at the DNS components.
>>>
>>> this is the only thing that I can think of that should be shared
>>> between multiple workers processing ACLs
>>
>> but it is _not_ currently shared from Squid point of view.
>
> Ok, I was assuming from the description of things that there would be
> one DNS process that all the workers would be accessing. from the way
There is an internal DNS component inside each worker, with separate
caches etc.
They *do* share the network resolver process though, so traffic load
in the pipes and resolver capacity come into effect the same as they do
for any other software doing lookups via the network.
If you are using the older, obsolete "dnsserver" process, there is one
of those run dedicated to each worker. These share the OS resolver
library though, which has some severe speed limits and *is* likely to be
blocking and locking.
NP: getnameinfo() and getaddrinfo() use the OS resolver library for
their numeric/domain tests and will probably hit these library limits
*if* you are passing raw-IP URLs into your test Squid (i.e.
"http://127.0.0.1/").
> it's described in the documentation it sounds as if it's already a
> separate process, so I was thinking that it was possible that if each
> ACL IP address is being put through a single DNS process, I could be
> running into contention on that process (and having to do name lookups
> for both IPv6 and then falling back to IPv4 would explain the severe
> performance hit far more than the difference between IPs being 128 bit
> values instead of 32 bit values)
The fallback behaviour is one of the known performance brakes. It should
still be linear with ACL count unless you have configured in a way that
requires frequent or many DNS lookups during ACL testing. That is kind
of hard to achieve with the ACL optimizations in place, but still
possible.
Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.12
  Beta testers wanted for 3.2.0.7 and 3.1.12.1

Received on Tue Apr 26 2011 - 01:28:38 MDT

This archive was generated by hypermail 2.2.0 : Tue Apr 26 2011 - 12:00:05 MDT