----- Original Message -----
From: "Daniel Barron" <squidcache@jadeb.com>
>
> >
> > Seriously though, Robert has done some cool stuff with inline content
> > modification in his filters branches, and Moez made a patch to that to
> > modify URLs that worked quite well. So the framework for doing it in
> > Squid is almost ready for primetime, as far as I know.
>
> Sounds a bit patchy to me. 'take this then patch that then add this then
> patch in DG'. I'm not trying to be negative, but my aim currently with DG
> is ease of installation. squid is available as an rpm. DG2 will be
> available as an rpm. (DG1 currently requires the nb++ library, which would
> make it hard to rpm.)
Certainly. In fact I would not recommend using the filtering code in
production. There are _serious_ problems within squid due to legacy code
that are slowly being worked out. Once those issues go away, content
processing will become a lot more reliable. (The issues tie into content
length changes and/or range requests.) That's not to say that it doesn't
work, just that it won't cleanly handle all valid HTTP requests :|.
> The other problem with patching is that major changes to squid source
> /could/ cause incompatibilities. The current design is far neater in this
> respect. Another thing to remember is that currently DG works with any
> proxy, and some people prefer using oops. I would have to provide a patch
> for oops as well if I was to keep those people happy.
>
> However I have no real problem with doing so. Time and the necessary hooks
> are the only things holding me back (or slowing me down).
Hooks I can do... but there is a deep rewrite in progress, as mentioned
above, to resolve some critical issues.
> >
> > I guess it's up to someone to go to the trouble to write the code...
>
> I'll go to the trouble, however, probably the best way would be if squid
> provided an interface like a redirector, but have it maintain a pool of
> processes that grow and shrink on demand, as the blocking time for a filter
> would be hugely greater than, say, squidGuard.
The filter hooks allow content processing code to be inserted in-process
with a state structure to handle all the code's persistent variables.
Out-of-process processing of content is going to be much slower than
in-process processing, but it is trivial to write a filter that utilises
the squid helper "library" to manage a pool of external processes.
Blocking I/O is not allowed in-process, but squid has an async I/O
framework, and filters are allowed to perform external I/O using that
async framework (e.g. MySQL lookups etc).
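
To make the "state structure" idea a bit more concrete, here is a minimal
sketch in Python. It is not squid's actual filter API (that is C, and the
interface is still changing); `FilterState` and `filter_chunk` are
hypothetical names made up for illustration. The point is only that the hook
is called once per body chunk, never blocks, and keeps everything it needs
between calls in a per-response state object, so a phrase split across two
chunks is still found.

```python
from dataclasses import dataclass


@dataclass
class FilterState:
    """Hypothetical per-response state, standing in for the state structure."""
    phrase: bytes          # non-empty phrase to look for
    tail: bytes = b""      # last len(phrase) - 1 bytes seen so far
    hits: int = 0          # matches counted so far for this response


def filter_chunk(state: FilterState, chunk: bytes) -> bytes:
    """Called once per body chunk; does no I/O and returns the chunk as-is."""
    # Prepend the saved tail so a match straddling two chunks is seen.
    window = state.tail + chunk
    state.hits += window.count(state.phrase)
    # Keep fewer bytes than a full phrase, so no match is counted twice.
    keep = len(state.phrase) - 1
    state.tail = window[-keep:] if keep else b""
    return chunk


state = FilterState(phrase=b"badword")
for piece in (b"xx bad", b"word yy badword"):
    filter_chunk(state, piece)
print(state.hits)
```

A real filter would act on the count (block, rewrite, log) rather than just
print it; the sketch only shows where the persistent variables live.
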
> This I would be /VERY/ interested in and IMHO is a feature that is missing
> from squid that would help people write nice filters very easily.
There are example filters that can be used as templates in the CVS
branch that has the filtering code in it. That should help.. :]
> This interface would make squid send the client request and web server
> response to the 'filter redirector' along with other information. The
> filter then would respond to squid with what to do or similar. This would
> need to be a provision in the main tree for ease of installation by users.
I used the draft iCAP model: there are four hook points for each
request/response pair. Look in the squid-users/squid-dev archives for
the details, or let me know and I'll pull some doco together and put it
on squid.sourceforge.net. My goal is to get this production-ready (or
near to :) and then ask for HEAD inclusion.
> So, squid authors, whaddya think? Would you like to discuss an interface
> design?
Sure. I'm not authoritative for the core authors though :]. However I
_think_ I've done the most code in this area, so I'll ask that you have
a _brief_ look at the existing hook style and pick up from there? Don't
look too deep because the interface has changed slightly as I prepare
the code for mainstream readiness.
> >
> > Ronald wrote:
> >
> > > Hi there,
> > >
> > >
> > >
> > > I am looking for content based filtering in Squid. Of course I can do
> > > this using dansguardian. But my feeling is that I have to pay a
> > > performance price for that, because dansguardian parses http and then
> > > Squid does the same. Why not have dansguardian as a patch to Squid, so
> > > that performance can be slightly improved by parsing http once? Any
> > > views on this?
> > >
>
> The performance drop due to parsing the http headers (remember squid only
> looks at the headers) is minimal. Headers are not long. I would not expect
> a noticeable improvement in speed. Most of the time is spent either
> filtering the content (which a plugin/patch would also have to do) or
> actually shuffling the data around.
>
> As a 'filter redirector' plugin the speed of the non-filtered content (gifs,
> jpegs etc) would be doubled as it would not need to go from squid to dg then
> to browser and would go direct. This is to say it would in theory take
> half as much cpu usage which on my 2Mb link and squid on a P166 never gets
> over 20% so the limiting factor is the link in this case. So again, a
> speed increase may not be noticed.
A filter-redirector would be a nasty way to do this :]. (See my
in-process vs out-of-process comments above.) However there is a draft for
exactly this sort of process, iCAP (or an alternative, CVP), and the same
squid-side framework that allows in-process content processing also allows
easy implementation of iCAP/CVP (that was one of the drivers :]). The nice
thing about iCAP/CVP is that by integrating in a standards-based fashion,
you will still be able to let users use other proxies. iCAP also has
explicit optimisations to allow content processing to be aborted for a
given response, removing the squid-dansguardian-squid loop after the
first few k.
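
As a rough illustration of that early-abort idea, here is a small Python
sketch. It is not the iCAP wire protocol and not squid code: `PreviewFilter`,
`wants_content`, `inspect` and the 4096-byte default are all assumptions made
up for the example. The filter buffers a short preview, decides once whether
the response is worth scanning, and after a "no" every later chunk passes
straight through without being looked at.

```python
class PreviewFilter:
    """Decide from a short preview whether to keep scanning a response."""

    def __init__(self, wants_content, inspect, preview_bytes=4096):
        self.wants_content = wants_content  # callable(bytes) -> bool
        self.inspect = inspect              # callable(bytes) -> None
        self.preview_bytes = preview_bytes  # assumed "first few k"
        self.preview = b""
        self.bypass = None                  # None means "not decided yet"

    def _decide(self):
        # One-off decision based on the buffered preview.
        self.bypass = not self.wants_content(self.preview)
        if not self.bypass:
            self.inspect(self.preview)

    def feed(self, chunk: bytes) -> bytes:
        """Handle one body chunk; the chunk is always passed on unchanged."""
        if self.bypass is None:
            self.preview += chunk
            if len(self.preview) >= self.preview_bytes:
                self._decide()
        elif not self.bypass:
            self.inspect(chunk)
        return chunk

    def end(self):
        """End of body: decide on a short response that never filled the preview."""
        if self.bypass is None:
            self._decide()


scanned = []
f = PreviewFilter(lambda p: not p.startswith(b"GIF8"), scanned.append,
                  preview_bytes=8)
for piece in (b"GIF89a..", b"rest of the image data"):
    f.feed(piece)
f.end()
print(f.bypass, scanned)
```

In this run the preview looks like a GIF, so the filter opts out and the
remaining data is never handed to `inspect`.
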
So the options for a squid-integrated dansguardian are:
 * in-process patch (too time consuming, will need maintenance, harder to
   install)
 * in-process "module" (should be fastest). (modules will have dlopen
   functionality at some point - no squid recompile needed).
 * out-of-process, custom interface. (squid patch or module needed).
 * out-of-process, iCAP/CVP. (squid that does iCAP/CVP needed)
Rob
Received on Sat Jun 23 2001 - 05:33:49 MDT