> Technically it should be possible, but you need to write another
> retriever spider for the engine that knows how to read the Squid cache
> files instead of fetching from the web or indexing local files.
>
> The format of the cache files is described in the Programmers Guide and
> IIRC there is even a Perl module on CPAN for reading these files.
That was my next question, i.e., how do I read the cache?
Do you by any chance know the name of the CPAN module?
I looked on CPAN and found the Cache-2.01 module. Is this the one?
> The developer list for the preferred search engine is a better place to
> ask, I think. There are no modifications required to Squid, but the
> search engine needs to be slightly modified to know how to read the
> Squid cache data.
>
> Each file in the cache contains
>
> a) Metadata such as the URL of the file, size, time cached, etc. Of
> this, the search engine needs to use the URL as the "name" of the
> indexed object.
>
> b) The object HTTP headers.
>
> c) The object contents. This is what needs to be indexed.
>
> b+c is the HTTP reply as received by Squid.
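To illustrate the b+c part, here is a minimal Python sketch that splits a raw HTTP reply into its status line, headers, and body. It assumes the reply bytes have already been separated from the leading metadata, and that the headers end at the first blank line (CRLF CRLF). The function name split_http_reply is made up for this example and is not part of Squid.

```python
def split_http_reply(reply: bytes):
    """Split a raw HTTP reply into (status_line, headers, body).

    Assumes `reply` starts at the HTTP status line, i.e. any cache
    metadata in front of it has already been stripped off.
    """
    # Headers end at the first empty line; everything after is the body.
    head, sep, body = reply.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line terminating the headers")

    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]

    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise to lower case.
        headers[name.strip().lower()] = value.strip()

    return status_line, headers, body
```

A search engine would then index only the returned body, using the Content-Type header to decide whether the object is worth indexing at all (HTML versus an image, for instance).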
When I run 'file' on a particular cache file, it reports DBase 3 format.
Is this correct, or is this just the closest that Linux can get in
determining the type of file? The question really is, how do I put the
cached file back into its original format, with its original title, for
presentation to the server?
I looked at the 'purge' utility written by Jens-S. Vöckler since it can
decipher the squid cache, but I don't understand how it is working.
For example, I have a cache file:
/usr/local/squid/var/cache/00/09/0000092D
with header information:
^Co
Content-Length: 2173
Content-Type: image/gif
Last-Modified: Sun, 11 Jan 2004 05:20:46 GMT
Accept-Ranges: bytes
ETag: "5db8d2aa2d8c31:627d33"
Server: Microsoft-IIS/6.0
Date: Thu, 22 Jan 2004 03:02:01 GMT
Connection: close
<snip>
and from that, the 'purge' utility returns the URL of:
http://www.whitehouse.org/kids/images/tn-palm.gif
How is the URL deciphered? For the life of me, I can't figure it out.
I read in the Programming Guide that "A cache swap file consists of two
parts: the cache metadata, and the object data."
Could you please point me to the code in squid that will show me how to
get at and decipher the metadata?
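
For reference, a minimal Python sketch of one way the metadata could be walked to recover the URL. It rests on assumptions about the Squid 2.x swap metadata layout that should be checked against the Programmers Guide and the Squid source: a marker byte 0x03, then a native int giving the total metadata length, then a series of type/length/value entries (one type byte, one native int length, then the value), with type 4 holding the NUL-terminated URL. The ^C at the start of the dump above is byte 0x03, which would be consistent with such a marker. The constant values, the function names, and the little-endian 4-byte int default are all assumptions of this sketch.

```python
import struct

# Assumed values; verify against the Squid source for your version.
STORE_META_OK = 0x03   # marker byte at offset 0
STORE_META_URL = 4     # TLV type carrying the object's URL


def read_swap_meta(data: bytes, byteorder: str = "<"):
    """Return (entries, object_data) for one cache file's contents.

    entries is a list of (type, value) pairs. byteorder is a struct
    prefix: "<" for little-endian hosts, ">" for big-endian ones.
    """
    if len(data) < 5 or data[0] != STORE_META_OK:
        raise ValueError("not a swap metadata header")
    # Total metadata size, assumed to include this 5-byte prefix.
    (total_len,) = struct.unpack_from(byteorder + "i", data, 1)
    if total_len < 5 or total_len > len(data):
        raise ValueError("implausible metadata length")

    entries = []
    pos = 5
    while pos + 5 <= total_len:
        tlv_type = data[pos]
        (length,) = struct.unpack_from(byteorder + "i", data, pos + 1)
        pos += 5
        if length < 0 or pos + length > total_len:
            raise ValueError("truncated metadata entry")
        entries.append((tlv_type, data[pos:pos + length]))
        pos += length

    # Whatever follows the metadata is the stored HTTP reply.
    return entries, data[total_len:]


def cached_url(data: bytes, byteorder: str = "<"):
    """Return the URL stored in the metadata, or None if absent."""
    entries, _ = read_swap_meta(data, byteorder)
    for tlv_type, value in entries:
        if tlv_type == STORE_META_URL:
            return value.rstrip(b"\0").decode("iso-8859-1")
    return None
```

Under those assumptions, reading a file such as /usr/local/squid/var/cache/00/09/0000092D in binary mode and passing its bytes to cached_url() would give back the URL, and the second value returned by read_swap_meta() would be the HTTP reply (headers plus contents).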
I am sorry to be such a bother, but I get totally lost in the Squid
code, so pointers to the correct modules to look in will be very much
appreciated.
Thanks,
Murrah Boswell
Received on Sun Feb 08 2004 - 11:36:39 MST