I'd definitely recommend doing the compression HTTP/1.1 style, using
content-codings (excerpt from the draft included below, for those unfamiliar
with it). If the cache looks at Accept-Encoding then we're pretty much laughing
all the way home. We can implement any content-coding we like for Squid (say
LZO; I have no personal experience with that one, but I'm taking it on faith),
and if the client says it accepts the LZO content-coding, then fine: pass the
object through compressed and unaltered. Otherwise send the object uncompressed
(or perhaps later we can arrange conversion to other formats).
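To make the idea concrete, here is a rough sketch of the decision the cache would make per request. It is only an illustration, not Squid code: the names parse_accept_encoding, select_reply and decoders are made up for this example, and "deflate" (via Python's zlib module) stands in for LZO, since LZO is not in the Python standard library. Treating a missing Accept-Encoding header as "send uncompressed" is a conservative assumption on my part.

```python
import zlib

def parse_accept_encoding(header):
    """Return {coding: qvalue} parsed from an Accept-Encoding header value."""
    accepted = {}
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        token, _, params = item.partition(";")
        q = 1.0  # a coding listed without a qvalue is fully acceptable
        for param in params.split(";"):
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    q = 0.0  # unparsable qvalue: treat as not acceptable
        accepted[token.strip().lower()] = q
    return accepted

def select_reply(stored_body, stored_coding, accept_encoding, decoders):
    """Return (body, content_coding) to send to the client.

    Pass the stored object through untouched when the client accepts its
    coding; otherwise decode it and send it uncompressed.
    """
    if stored_coding is None:
        return stored_body, None
    accepted = parse_accept_encoding(accept_encoding or "")
    coding = stored_coding.lower()
    q = accepted.get(coding, accepted.get("*", 0.0))
    if q > 0:
        return stored_body, stored_coding
    # decoders maps a coding name to a function that undoes it (hypothetical)
    return decoders[coding](stored_body), None
```

With decoders = {"deflate": zlib.decompress}, a request carrying "Accept-Encoding: gzip, deflate" gets the stored bytes back unaltered, while one carrying only "gzip" gets the decoded object.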
Whether you compress the object in storage is a separate issue, IMO.
D
-- Excerpt from draft-ietf-http-v11-spec-rev-01 follows --
3.5 Content Codings
Content coding values indicate an encoding transformation that has been
or can be applied to an entity. Content codings are primarily used to
allow a document to be compressed or otherwise usefully transformed
without losing the identity of its underlying media type and without
loss of information. Frequently, the entity is stored in coded form,
transmitted directly, and only decoded by the recipient.
content-coding = token
All content-coding values are case-insensitive. HTTP/1.1 uses content-
coding values in the Accept-Encoding (section 14.3) and Content-Encoding
(section 14.12) header fields. Although the value describes the content-
coding, what is more important is that it indicates what decoding
mechanism will be required to remove the encoding.
The Internet Assigned Numbers Authority (IANA) acts as a registry for
content-coding value tokens. Initially, the registry contains the
following tokens:
gzip
    An encoding format produced by the file compression program "gzip"
    (GNU zip) as described in RFC 1952 [25]. This format is a
    Lempel-Ziv coding (LZ77) with a 32 bit CRC.
compress
    The encoding format produced by the common UNIX file compression
    program "compress". This format is an adaptive Lempel-Ziv-Welch
    coding (LZW).
Fielding, et al [Page 22]
INTERNET-DRAFT HTTP/1.1 Friday, November 21, 1997
Note: Use of program names for the identification of encoding
formats is not desirable and should be discouraged for future
encodings. Their use here is representative of historical practice,
not good design. For compatibility with previous implementations of
HTTP, applications should consider "x-gzip" and "x-compress" to be
equivalent to "gzip" and "compress" respectively.
deflate
    The "zlib" format defined in RFC 1950 [31] in combination with
    the "deflate" compression mechanism described in RFC 1951 [29].
identity
    The default (identity) encoding; the use of no transformation
    whatsoever. This content-coding is used only in the Accept-Encoding
    header, and SHOULD NOT be used in Content-Encoding header.
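As an illustration of the tokens above (not part of the draft), the following sketch normalizes a content-coding value and removes the coding with Python's standard gzip and zlib modules. The function names are hypothetical. It assumes that multiple codings in Content-Encoding are listed in the order they were applied, as the Content-Encoding header definition specifies, so they are undone in reverse. "compress" is recognized but not decoded, because the standard library has no LZW decoder.

```python
import gzip
import zlib

# Historical aliases the draft asks applications to accept.
ALIASES = {"x-gzip": "gzip", "x-compress": "compress"}

def normalize_coding(token):
    """Content-coding values are case-insensitive; map x- aliases too."""
    token = token.strip().lower()
    return ALIASES.get(token, token)

def decode_entity(body, content_encoding):
    """Undo the codings named in a Content-Encoding value, last applied first."""
    codings = [normalize_coding(t) for t in content_encoding.split(",") if t.strip()]
    for coding in reversed(codings):
        if coding == "gzip":
            body = gzip.decompress(body)   # RFC 1952 format
        elif coding == "deflate":
            body = zlib.decompress(body)   # RFC 1950 "zlib" wrapper around deflate
        elif coding == "identity":
            continue                       # no transformation
        else:
            raise ValueError("unsupported content-coding: " + coding)
    return body
```

For example, decode_entity(gzip.compress(b"abc"), "X-GZIP") returns b"abc".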
New content-coding value tokens should be registered; to allow
interoperability between clients and servers, specifications of the
content coding algorithms needed to implement a new value should be
publicly available and adequate for independent implementation, and
conform to the purpose of content coding defined in this section.
3.6 Transfer Codings
Transfer coding values are used to indicate an encoding transformation
that has been, can be, or may need to be applied to an entity-body in
order to ensure "safe transport" through the network. This differs from
a content coding in that the transfer coding is a property of the
message, not of the original entity.
transfer-coding = "chunked" | transfer-extension
transfer-extension = token *( ";" parameter )
Parameters may be in the form of attribute/value pairs.
parameter = attribute "=" value
attribute = token
value = token | quoted-string
All transfer-coding values are case-insensitive. HTTP/1.1 uses transfer
coding values in the TE header field (section 14.Y) and in the Transfer-
Encoding header field (section 14.40).
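A small parser for the transfer-coding grammar above might look like the sketch below. It is an illustration only: parse_transfer_coding is a made-up name, the token character set is my reading of the HTTP/1.1 "token" rule, and the parser is lenient in that it would accept parameters on "chunked" even though the grammar gives it none.

```python
import re

# Characters allowed in an HTTP token (anything but CTLs and separators).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QUOTED = r'"(?:[^"\\]|\\.)*"'

CODING_RE = re.compile(r"\s*(" + TOKEN + r")")
PARAM_RE = re.compile(
    r"\s*;\s*(" + TOKEN + r")\s*=\s*(" + TOKEN + r"|" + QUOTED + r")"
)

def parse_transfer_coding(text):
    """Parse one transfer-coding into (name, {attribute: value})."""
    match = CODING_RE.match(text)
    if not match:
        raise ValueError("missing transfer-coding token")
    name = match.group(1).lower()  # transfer-coding values are case-insensitive
    pos = match.end()
    params = {}
    while True:
        param = PARAM_RE.match(text, pos)
        if not param:
            break
        value = param.group(2)
        if value.startswith('"'):
            # Strip the quotes and undo backslash escapes in a quoted-string.
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        params[param.group(1)] = value
        pos = param.end()
    if text[pos:].strip():
        raise ValueError("unexpected text after transfer-coding")
    return name, params
```

For example, parse_transfer_coding('foo; a=b; c="x;y"') yields the name "foo" with parameters a and c, where c keeps the semicolon inside its quoted value.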
Transfer codings are analogous to the Content-Transfer-Encoding values
of MIME [7], which were designed to enable safe transport of binary data
over a 7-bit transport service. However, safe transport has a different
focus for an 8bit-clean transfer protocol. In HTTP, the only unsafe
characteristic of message-bodies is the difficulty in determining the
exact body length (section 7.2.2), or the desire to encrypt data over a
shared transport.
The Internet Assigned Numbers Authority (IANA) acts as a registry for
transfer-coding value tokens. Initially, the registry contains the
following tokens: "chunked" (section 3.6.1), "identity" (section 3.6.2),
"gzip" (section 3.5), "compress" (section 3.5), and "deflate" (section
3.5).
New transfer-coding value tokens should be registered in the same way as
new content-coding value tokens (section 3.5).
A server which receives an entity-body with a transfer-coding it does
not understand SHOULD return 501 (Unimplemented), and close the
connection. A server MUST NOT send transfer-codings to an HTTP/1.0
client.
3.6.1 Chunked Transfer Coding
The chunked encoding modifies the body of a message in order to transfer
it as a series of chunks, each with its own size indicator, followed by
an optional trailer containing entity-header fields. This allows
dynamically-produced content to be transferred along with the
information necessary for the recipient to verify that it has received
the full message.
Chunked-Body    = *chunk
                  last-chunk
                  trailer
                  CRLF
chunk           = chunk-size [ chunk-extension ] CRLF
                  chunk-data CRLF
chunk-size      = 1*HEX
last-chunk      = 1*("0") [ chunk-extension ] CRLF
chunk-extension = *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
chunk-ext-name  = token
chunk-ext-val   = token | quoted-string
chunk-data      = chunk-size(OCTET)
trailer         = *entity-header
The chunk-size field is a string of hex digits indicating the size of
the chunk. The chunked encoding is ended by any chunk whose size is
zero, followed by the trailer, which is terminated by an empty line.
The trailer allows the sender to include additional HTTP header fields
at the end of the message. The Trailer header field can be used to
indicate which header fields are included in a trailer (see section
14.49).
A server using chunked transfer-coding in a response MUST NOT use the
trailer for other header fields than Content-MD5 and Authentication-Info
unless the "chunked" transfer-coding is present in the request as an
accepted transfer-coding in the TE field (section 14.48). The
Authentication-Info header is defined by RFC 2069 [32] or its successor
[43].
An example process for decoding a Chunked-Body is presented in appendix
19.4.6.
All HTTP/1.1 applications MUST be able to receive and decode the
"chunked" transfer coding, and MUST ignore chunk-extension extensions
they do not understand.
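The appendix with the example decoding process is not reproduced in this excerpt, so here is a minimal sketch of a decoder that follows the Chunked-Body grammar above. It is not taken from the draft: decode_chunked is a hypothetical name, the whole message body is assumed to be in memory already, chunk-extensions are simply ignored, and trailer lines are returned unparsed.

```python
import re

HEX_RE = re.compile(rb"[0-9A-Fa-f]+")

def decode_chunked(data):
    """Decode a complete Chunked-Body; return (body, trailer_lines)."""
    pos = 0
    body = bytearray()
    while True:
        # Each chunk starts with "chunk-size [ chunk-extension ] CRLF".
        eol = data.index(b"\r\n", pos)   # raises ValueError if input is truncated
        size_field = data[pos:eol].split(b";", 1)[0].strip()
        if not HEX_RE.fullmatch(size_field):
            raise ValueError("bad chunk-size")
        size = int(size_field, 16)
        pos = eol + 2
        if size == 0:
            break                        # last-chunk reached
        chunk = data[pos:pos + size]
        if len(chunk) != size or data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        body += chunk
        pos += size + 2
    # Trailer: zero or more header lines, terminated by an empty line.
    trailer = []
    while True:
        eol = data.index(b"\r\n", pos)
        line = data[pos:eol]
        pos = eol + 2
        if not line:
            break
        trailer.append(line.decode("latin-1"))
    return bytes(body), trailer
```

Feeding it b"5\r\nhello\r\n0\r\n\r\n" gives the body b"hello" and an empty trailer list. A real proxy would decode incrementally as data arrives rather than buffer the whole body.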
--
Note to evil sorcerers and mad scientists: don't ever, ever summon powerful demons or rip holes in the fabric of space and time. It's never a good idea.
ICQ UIN: 3225440

Received on Mon Nov 24 1997 - 18:25:21 MST