Re: [squid-users] Complicate ACL affect performance?

From: Henrik K <hege_at_hege.li>
Date: Sat, 18 Oct 2008 12:58:12 +0300

On Sat, Oct 18, 2008 at 12:44:46PM +0300, Henrik K wrote:
> On Fri, Oct 17, 2008 at 10:24:21PM +0200, Henrik Nordstrom wrote:
> > On tor, 2008-10-16 at 12:02 +0300, Henrik K wrote:
> >
> > > Optimizing 1000 x "www.foo.bar/<randomstuff>" into a _single_
> > > "www.foobar.com/(r(egex|and(om)?)|fuba[rz])" regex is nowhere near linear.
> > > Even if it's all random servers, there are only ~30 characters from which
> > > branches can be created.
> >
> > Right.
> >
> > Would be interesting to see how 50K dstdomain compares to 50k host
> > patterns merged into a single dstdomain_regex pattern in terms of CPU
> > usage. Probably a little tweaking of Squid is needed to support such
> > large patterns, but that's trivial. (squid.conf parser is limited to
> > 4096 characters per line, including folding)
>
> Not sure what the splay code does in Squid, didn't have time to grab it.
> But a simple test with Perl:
>
> - Grepped some hostnames from wwwlogs etc
> - Regexp::Assemble'd 50000 unique hostnames (= 560kB regex, took 22 sec)
> - Ran 100000 hostnames against it in 4 seconds (25000 hosts/sec on a 2.8 GHz CPU)
>
> It's pretty powerful stuff.

Oops, I did that slightly wrong.

Doing it correctly, with each hostname anchored as ^hostname$ instead of a
plain hostname in the regex, the same 100000-hostname run takes 1.2 seconds.
That's 80000+ hosts/sec.
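
For anyone who wants to try the idea without Perl, here is a rough sketch in
Python. It is not Regexp::Assemble and not the script used for the numbers
above; build_trie and trie_to_regex are hypothetical helper names, and the
three hostnames are made up. It only illustrates merging hostnames into one
prefix-sharing pattern and the difference the ^...$ anchors make. The likely
reason anchoring helps is that an unanchored search has to retry from every
position in the string before it can report a miss.

```python
import re

def build_trie(words):
    """Build a character trie; the empty-string key marks end of a word."""
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}
    return trie

def trie_to_regex(node):
    """Turn a trie node into regex source that shares common prefixes."""
    is_end = "" in node
    alts = [re.escape(ch) + trie_to_regex(node[ch])
            for ch in sorted(k for k in node if k != "")]
    if not alts:
        return ""                      # leaf: nothing more to match
    if len(alts) == 1 and not is_end:
        return alts[0]                 # single path: no group needed
    group = "(?:" + "|".join(alts) + ")"
    return group + "?" if is_end else group

# Made-up sample data, standing in for hostnames grepped from logs.
hosts = ["www.example.com", "www.example.org", "mail.example.com"]
pattern = trie_to_regex(build_trie(hosts))

unanchored = re.compile(pattern)                   # plain hostname
anchored = re.compile("^(?:" + pattern + ")$")     # ^hostname$

probe = "notwww.example.com.evil.test"
print(pattern)
print(bool(unanchored.search(probe)))  # substring hit anywhere in the string
print(bool(anchored.search(probe)))    # whole string must be a listed host
```

Besides the speed difference, the unanchored form also matches any string
that merely contains one of the hostnames, which is probably not what a
host ACL should do.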
Received on Sat Oct 18 2008 - 09:58:15 MDT