This message may have been sent more than once, because the ircache mail
system has an anti-spam feature and blocks mail from some SMTP servers. I
had to use several SMTP servers to send it, and I am still not sure whether
it got through.
If this disturbs you, I am sorry, but it is not my fault. I also suggest
that the mailing list administrator add an auto-reply feature to the
mailing list system.
Hello Squid,
I have noticed that in this year's bake-off, Squid is slower than the
other products, even though the comparison may be unfair. Wessels said
that Squid could do better if the filesystem were not a bottleneck.
Wessels also said they are working on a new filesystem that still remains
compatible with the Unix filesystem. I don't think that is a good idea.
The question is: who needs it? I mean a new filesystem compatible with the
UNIX filesystem, like VFS.
As we know, cache object storage does not even need a filename or URL to
access a cached object, does not need users, groups, or ACL control, does
not use the file's last-modified date (only the cached object's
last-modified date), and needs almost nothing that other filesystems
provide except the file size. So why implement a storage system compatible
with the UNIX filesystem or any other filesystem? What is the benefit?
In my opinion, the best way is to create a Cache Object Storage System, or
call it a "Hash File System", directly on the device file. (It is still a
filesystem, but not similar to any existing filesystem.)
A cache storage system behaves very differently from a normal filesystem.
First, it does not need a file name or URL to open a cached object.
Currently you use an MD5 hash key (although I think such a complex method
is not necessary), so you do not need a normal directory structure or
directory tree; just seek to the directory entry position by the hash key.
Second, before you save a cache object to disk, most of the time you
already know its size, so you can allocate the disk blocks as contiguously
as possible and there will be less fragmentation on disk. You don't need
an i-node per file; just use a pointer to the data and a flag (indicating
whether it is stored in one piece). If it is stored in several pieces, add
a node table just before the file (cache object) that points to the
remaining pieces; if that table is not large enough, chain another one. A
separate i-node is not necessary, because you never access the cache file
randomly and usually only read it from the beginning.
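To make this concrete, here is a rough sketch in C of what I imagine such
an on-disk object header could look like (the names and the piece-table
size are only my assumptions, not anything Squid defines):

    /* Hypothetical header written on disk just before the object data.
     * If the object fits in one contiguous run of blocks the piece table
     * is unused; otherwise it points to the remaining pieces, and a
     * further table can be chained if it overflows. */
    #include <stdint.h>

    #define PIECE_SLOTS 8                 /* assumed size of the piece table */

    struct piece {
        uint32_t start_block;             /* first disk block of this piece */
        uint32_t block_count;             /* length of the piece in blocks  */
    };

    struct object_header {
        uint32_t flags;                   /* e.g. a CONTIGUOUS bit when stored in one piece */
        uint32_t total_blocks;            /* total blocks used by the object */
        struct piece pieces[PIECE_SLOTS]; /* remaining pieces, if fragmented */
        uint32_t next_table_block;        /* 0, or block of a chained piece table */
        /* the URL and the object data follow this header on disk */
    };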
In my imagination, a Cache Storage System should have a head area, a
bit-FAT (block allocation bitmap) area, and a data area. The number of
directories must be a power of two, like 1024, 2048, 4096..., so you can
easily determine which one to open from the key, and the directories are
distributed evenly across the entire data area. Directory entries are
indexed by hash key and contain the metadata of the cache object (I wish
you could reduce or compress the metadata size). Each file in a directory
gets its disk blocks allocated as contiguously as possible, unless it is
too big (in which case contiguity is not important), and as near as
possible to the directory that contains it, like in ext2fs. Each disk
block may be 512 bytes to 16 KB; I think 1 KB, 2 KB, or 4 KB is better for
now. Each directory must be larger than 4 KB, perhaps 4 KB to 32 KB or
more (up to the point where reading the whole directory takes too much
additional time). The directory size need not be a multiple of 4 KB, but
it may be allocated larger than needed to keep the directory from filling
up.
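To illustrate why the power-of-two count matters, here is a small sketch
(my own, with hypothetical names): with 2^x directories the directory
index is just the low bits of the key, and each directory's position can
be computed instead of looked up.

    #include <stdint.h>

    #define NUM_DIRS 2048u      /* assumed directory count; must be a power of two */

    /* The directory index is simply the low bits of the hash key. */
    static uint32_t dir_index(uint64_t hash_key)
    {
        return (uint32_t)(hash_key & (NUM_DIRS - 1));
    }

    /* Directories are spread evenly over the data area, so directory i
     * starts at data_start + i * (data_size / NUM_DIRS). */
    static uint64_t dir_offset(uint64_t data_start, uint64_t data_size, uint32_t idx)
    {
        return data_start + (uint64_t)idx * (data_size / NUM_DIRS);
    }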
I have calculated the probability distribution of the number of files per
directory; it follows a Gaussian (normal) distribution (the mathematical
term may be wrong, since I speak Chinese). If the average number of
entries in one directory is N, then the probability that the number
exceeds N + 3*sqrt(N) is 0.00135, exceeds N + 4*sqrt(N) is about 3e-5,
exceeds N + 5*sqrt(N) is about 3e-7, and exceeds N + 6*sqrt(N) is about
1e-9. So if N = 100 and you allocate space for 200 entries, the
probability of exceeding the limit is very, very low. If it is exceeded,
you can move the file stored just after the directory to another place
(dividing it into pieces if it is large) to allocate more blocks for the
directory, or simply remove the oldest file in the directory if the file
just after the directory does not belong to it.
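The number of entries per directory is roughly Poisson with mean N, and
for large N that is close to a normal distribution with standard deviation
sqrt(N). The tail figures above can be checked quickly, for example with
this small C program (my own verification; compile with -lm):

    #include <math.h>
    #include <stdio.h>

    /* One-sided normal tail: P(count > N + k*sqrt(N)) ~= erfc(k/sqrt(2)) / 2 */
    int main(void)
    {
        for (int k = 3; k <= 6; k++)
            printf("P(> N + %d*sqrt(N)) ~= %.3g\n", k, 0.5 * erfc(k / sqrt(2.0)));
        /* prints about 1.35e-3, 3.17e-5, 2.87e-7, 9.87e-10,
         * matching the figures quoted above */
        return 0;
    }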
Before you create the Cache Storage System, you need to estimate how many
objects will be in it. This depends on the disk (or partition) size and
the mean object size. I think using a mean object size of 7.5 KB for the
estimate is conservative. The mean object size depends on
maximum_object_size and on how the cache is used. For example, in my cache
the mean object size is only 7.17 KB, because it is only allowed to be
used for browsing. If maximum_object_size is limited to 400 KB, the mean
object size may be only 5.5 KB. Then, depending on the object size and the
directory size, decide how many directories are needed. If the directories
are too large, sorting and reading will be slow (if not all directories
are cached in memory); if they are too small, cache memory utilization
will be low.
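As a worked example of this estimate (the disk size and the target of
about 100 entries per directory are just numbers I picked to show the
arithmetic):

    /* Sizing example: pick a power-of-two directory count so that the
     * expected entries per directory land near a chosen target. */
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t disk_bytes  = 20ull * 1024 * 1024 * 1024; /* assumed 20 GB cache disk */
        uint64_t mean_object = 7680;                       /* 7.5 KB conservative mean */
        uint64_t objects     = disk_bytes / mean_object;   /* roughly 2.8 million objects */

        uint32_t dirs = 1;
        while (objects / dirs > 100)   /* target about 100 entries per directory */
            dirs *= 2;                 /* keep the count a power of two */

        printf("%llu objects -> %u directories, ~%llu entries each\n",
               (unsigned long long)objects, dirs,
               (unsigned long long)(objects / dirs));
        return 0;
    }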
Of course you may use more than one disk. Just create the Storage System
across several disks as if they were one. But each disk should have its
own head, bit FAT, and data area, for safety, and file blocks should be
allocated on the same disk as the directory that holds them as far as
possible. If you lose one disk, you may lose cached objects on more than
one disk, but not too many. You may then need a utility to redistribute
the directories and recalculate the bit FAT. If you add a disk, the steps
are similar. For this reason, you may need a root directory that points to
all the hash directories and records each one's position (rather than
computing it by seek), entry count, and blocks. (The bit FAT and root
directory should always be resident in memory.) Then adding and removing
disks will be quick (perhaps the directories can even be redistributed
online). You can also create utilities to import or export cache objects
to another filesystem. This work may be difficult when converting if you
have only one disk, unless you borrow a disk or use the network
temporarily, but a new compatible filesystem would have exactly the same
problem.
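The root directory I have in mind might look something like this (a
hypothetical sketch only, with names I made up):

    #include <stdint.h>

    /* Hypothetical root directory: one entry per hash directory, kept
     * resident in memory so a directory can be found without computing
     * its position, and so directories can be moved between disks. */
    struct root_entry {
        uint8_t  disk_id;        /* which member disk holds this directory */
        uint32_t start_block;    /* first block of the directory on that disk */
        uint32_t block_count;    /* blocks currently allocated to it */
        uint32_t entry_count;    /* live entries, to spot nearly full directories */
    };

    struct root_directory {
        uint32_t          num_dirs;   /* power of two */
        struct root_entry dirs[];     /* num_dirs entries, indexed by key bits */
    };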
When you accept a request from the browser, first generate the key, then
calculate which directory it belongs to and visit it. Get the directory
entries (which are also the metadata) and determine whether to visit the
origin web server (depending on the request URL and the three dates). The
directory entries are sorted by key; if two URLs have the same key, just
store them together, so you don't need 128 bits for the key; I think 8
bytes is enough. If the key matches but the request URL does not, what do
you lose? Just time. (I assume the URL is stored in the file head before
the content, and is verified before the object is given to the client.)
There may also be some flags in the directory entries, such as negative
(no cached object), partial (still downloading), and contiguous (disk
allocated in one piece), plus flags for heap replacement, such as query
page (contains /cgi-bin/ or ?) and cgi (.asp, .cgi, .php3, .exe?), etc. (I
think these dynamic pages should be replaced first.)
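Put together, the lookup path I imagine is roughly the following (a sketch
with made-up names; the full 32-byte entry is described further below):

    #include <stdint.h>
    #include <string.h>

    struct dir_entry {                    /* simplified; full 32-byte layout below */
        uint64_t key;
        uint32_t file_ptr;
    };

    /* Binary search the key-sorted directory for an 8-byte key. */
    static const struct dir_entry *
    lookup(const struct dir_entry *entries, int n, uint64_t key)
    {
        int lo = 0, hi = n - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (entries[mid].key == key)
                return &entries[mid];     /* may still be a different URL (collision) */
            if (entries[mid].key < key)
                lo = mid + 1;
            else
                hi = mid - 1;
        }
        return NULL;                      /* miss: go to the origin server */
    }

    /* After reading the file head at entry->file_ptr, verify the stored URL
     * against the request before giving the object to the client. */
    static int url_matches(const char *stored_url, const char *request_url)
    {
        return strcmp(stored_url, request_url) == 0;
    }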
In this discussion I mean there is no swap.state file containing all the
cached objects' metadata, and the metadata is not all loaded into memory
unless you have enough memory. It works like Novell BorderManager. If
someone does not have enough memory, a request needs on average 1.5 disk
reads at a 50% hit rate (one read for the directory on every request, plus
one more for the object on a hit), which may be about three times slower
than caching all directories in memory. Of course, you have to manage the
cache yourself (I guess UNIX does not cache the device file for you), and
cached directories should have higher priority than cached files, because
the probability of reusing those files is very low.
I think 32 bytes per directory entry (metadata) is enough: an 8-byte hash
key, a 4-byte file pointer, a 4-byte file size (I don't think you should
cache an object larger than 2 GB; you may have the capacity, but that is
not the right way to use a cache), a 4-byte object (not file)
last-modified date, a 4-byte expires date, a 4-byte last-validation date,
and 4 bytes for flags and so on. The content length may be stored in the
file head, because you don't need it before giving the object to the
browser (this depends on whether your IMS policy checks content length or
not), or one byte could store the file-head size (times 8 or times 16) so
the content length can be calculated; I think the file head cannot be too
large. If you think 4 bytes is not enough to store a date, then, since
Squid uses minutes to calculate its refresh policy, I think compressing
the dates is easy and will not reduce performance at all. I know you must
have your reasons for using 74 bytes, but I think storing all the metadata
in memory really wastes memory, since some of it may never be used. Maybe
you can use some technique like a cache digest to compress it and load
that into memory.
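Written out as a C struct, the 32-byte entry I propose would be something
like this (only my sketch, not Squid's actual store entry):

    #include <stdint.h>

    /* Proposed 32-byte directory entry.  Dates are 4-byte values; since
     * Squid's refresh policy works in minutes, a compressed minute-based
     * timestamp would be sufficient. */
    struct cache_dir_entry {
        uint64_t hash_key;        /* 8 bytes: truncated hash of the URL        */
        uint32_t file_ptr;        /* 4 bytes: first disk block of the object   */
        uint32_t file_size;       /* 4 bytes: on-disk size, objects < 2 GB     */
        uint32_t last_modified;   /* 4 bytes: object (not file) last-modified  */
        uint32_t expires;         /* 4 bytes: expiry date                      */
        uint32_t last_validated;  /* 4 bytes: last validation date             */
        uint32_t flags;           /* 4 bytes: negative/partial/contiguous etc. */
    };                            /* total: 32 bytes */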
I think if you build the Storage System this way, you only need one disk
read when the cached object is valid. So no one can make a system 20%
(maybe even 5%) faster than yours unless they store objects only in
memory. And anyone can run Squid with a large disk without worrying about
having enough memory to run it. It will also be suitable for a low-load
cache working alongside other services.
Back to the beginning: the question is, who needs a compatible filesystem?
First, Squid gains no advantage from compatibility; it does not need
filenames, a directory tree, random-access ability, or the ability to grow
files (which requires an i-node). Second, who would use a SquidFS for
other programs, since there are already so many filesystems? I doubt even
a web server would use it. Then the only advantage is that if the cache is
not full, people can store some other files in it. But the people who most
want a SquidFS must be under heavy load, and I think their caches must
already be full. For other people, I worry that a new filesystem may be
difficult to install, and I think a non-compatible system could use a
utility to grow or shrink its size on disk to meet this need. That may be
more difficult than with a compatible filesystem, but running Squid has
never been as easy as other GUI programs, even though you have done a lot
of work on it.
Although I am a programmer (mainly in C++), I don't know UNIX very well.
But I think everyone wants his program to be the best. What do you think?
If I am wrong, or if you agree, please tell me.
Someone is talking about a cyclic filesystem. I wonder, if the mean object
lifetime in a heavily loaded cache is less than a week, who can take
advantage of it? It is only useful for people who have very, very large
disks.
Best regards,
Wang_daqing mailto:wang_daqing@163.net