[squid-users] access.log redundancies and page cost from harrylucs@dont-contact.us on 2003-09-02 (squid-users)

From: <harrylucs@dont-contact.us>
Date: Wed, 03 Sep 2003 14:09:25 +1000

Hi to the Squid group,

First, thanks Henrik for your answers on cache log redundancies.

Regarding proxies along an HTTP request path storing largely the same
information in their access.log files, say:

forward request path:
alpha -> beta -> theta -> gamma (HIT)

return of HTTP object
alpha <- beta <- theta <- gamma

I do not yet have access.log files for this particular request situation, since
I do not have a network of proxies running, but I have thought of a way that
the path of an HTTP request can be traced.

First, I just want to give a little background on why I am doing this. I am
writing code to determine the page cost for an organisation's network that is
heavily dependent on Squid proxies. Page cost is the dollar figure associated with
an individual HTTP request, so management can get an idea of how much money
an effective network of Squid proxies can save them.

I have used a graph data structure that models the organisation's proxy network.
If anyone is interested in the document, I can provide it. The nodes in the graph
represent the proxies, and the edges represent the leased WAN lines between
them, each weighted by the cost of that line.
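
To make the graph model concrete, here is a minimal Python sketch. It is only an
illustration: the helper build_graph, the link list and the cost figures are
hypothetical (costs are in made-up units per byte), and the links are assumed to
cost the same in both directions.

```python
from collections import defaultdict

def build_graph(links):
    """Build an undirected weighted graph: graph[a][b] is the cost of link a-b."""
    graph = defaultdict(dict)
    for a, b, cost in links:
        graph[a][b] = cost
        graph[b][a] = cost  # assumption: a WAN line costs the same both ways
    return dict(graph)

# Hypothetical proxy network matching the alpha/beta/theta/gamma example.
links = [
    ("alpha", "beta", 2),
    ("beta", "theta", 3),
    ("theta", "gamma", 5),
]
graph = build_graph(links)
```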

Now this is where the access.log files come in to enable me to track the path of
HTTP requests through the proxy network.

After Henrik told me that the access.log files do contain redundancies but do
provide enough information to get a path from the proxy that receives the first
HTTP request to the final proxy that either serves the request, or goes direct to
the origin server, I thought about the following algorithm.

I have been thinking that the path through the network of Squid proxies can be
traced by repeatedly (recursively) calling a function which keeps going to the
next upstream proxy until one of the following base cases is reached, at which
point the algorithm terminates:

(1) the proxy recorded a HIT (it has a copy of the HTTP object stored)

(2) the proxy recorded a MISS (it does not have a copy of the HTTP object stored)
    and went direct to the origin server
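
The recursion above could be sketched in Python roughly as follows. This is not
a parser for real access.log lines: it assumes each proxy's log has already been
reduced to one hypothetical LogEntry per request, and that a MISS with no
upstream hop means the proxy fetched the object directly from the origin server.
The names LogEntry, trace_path and entries are made up for illustration.

```python
from typing import NamedTuple, Optional

class LogEntry(NamedTuple):
    result: str               # "HIT" or "MISS", as recorded by this proxy
    next_hop: Optional[str]   # upstream proxy asked, or None (served locally / went direct)
    size: int                 # bytes returned for the request

def trace_path(proxy, entries, path=None):
    """Follow the request upstream until a HIT or a direct MISS is found."""
    path = (path or []) + [proxy]
    entry = entries[proxy]
    # Base cases: (1) HIT, or (2) MISS fetched directly from the origin server.
    if entry.result == "HIT" or entry.next_hop is None:
        return path, entry
    if entry.next_hop in path:
        raise ValueError("forwarding loop detected at " + entry.next_hop)
    return trace_path(entry.next_hop, entries, path)

# Hypothetical records for the alpha -> beta -> theta -> gamma (HIT) example.
entries = {
    "alpha": LogEntry("MISS", "beta", 4096),
    "beta": LogEntry("MISS", "theta", 4096),
    "theta": LogEntry("MISS", "gamma", 4096),
    "gamma": LogEntry("HIT", None, 4096),
}
path, final = trace_path("alpha", entries)
```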

For cases (1) and (2) we retrieve the number of bytes returned by the final
proxy, whether a HIT or a MISS (a MISS implying a direct connection to an
origin server), and then multiply that number of bytes by the weight of each
edge (WAN connection) of the proxy network that the bytes were transported
over.

A little extra for case (2) is that we also calculate an external WAN cost to the
origin server and then an ISP charge per byte.
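
As a rough sketch of that calculation in Python, assuming the path has already
been traced and the graph is a nested dict of per-byte link costs: the function
page_cost, its parameters and all cost figures are hypothetical, and integer
cost units are used only to keep the arithmetic exact.

```python
def page_cost(path, size, graph, went_direct=False, external_cost=0, isp_cost=0):
    """Cost of returning `size` bytes along `path`, in hypothetical cost units."""
    # Internal cost: bytes times the weight of every WAN link on the path.
    internal = sum(size * graph[a][b] for a, b in zip(path, path[1:]))
    # Case (2) only: external WAN cost to the origin server plus ISP charge per byte.
    external = size * (external_cost + isp_cost) if went_direct else 0
    return internal + external

# Hypothetical per-byte link costs for the example network.
graph = {
    "alpha": {"beta": 2},
    "beta": {"alpha": 2, "theta": 3},
    "theta": {"beta": 3, "gamma": 5},
    "gamma": {"theta": 5},
}

# Case (1): HIT at gamma, 1000 bytes carried over three internal links.
hit_cost = page_cost(["alpha", "beta", "theta", "gamma"], 1000, graph)

# Case (2): MISS at beta, which goes direct to the origin server.
miss_cost = page_cost(["alpha", "beta"], 1000, graph,
                      went_direct=True, external_cost=4, isp_cost=1)
```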

I hope my deductions make sense,
Phillip Lucs
Received on Tue Sep 02 2003 - 22:09:32 MDT
