Re: [PATCH] Comm::TcpReceiver - part 1, 2, and 3 from Alex Rousskov on 2014-02-07 (squid-dev)

From: Alex Rousskov <rousskov_at_measurement-factory.com>
Date: Fri, 07 Feb 2014 17:54:37 -0700

On 02/05/2014 07:55 AM, Amos Jeffries wrote:
> Here we have the combined patch for what I have been calling part 1 -
> Comm::TcpReceiver interface, part 2 - ConnStateData object API upgrade,
> and part 3 - HttpStateData object API upgrade.

I am puzzled about the status of this work. There seems to be unfinished
changes in some critical paths of the code such as
HttpStateData::mayNeedToReadMore() but you are posting this as a PATCH,
implying it is ready for commit from your point of view. Could you
please clarify whether you consider these changes ready for trunk commit
and use?

> * Kept the TcpReceiver name for now since Alex is disagreeing with
> callign the interface an "agent". I am thinking of calling it just
> "Comm::Tcp" in its current form.

Once we agree on a description for the new class, we should be able to
name it. Until then, let's keep the wrong name to minimize renaming
overheads. I do not recall objecting to Agent. Agent is the right name
if that is what you are writing (it is difficult to be sure based on the
current code but I will discuss this below).

> * The clear thing to keep in mind is that this TcpReceiver object is
> *just* the interface design for the I/O actions. It needs to be
> inherited by some other object which performs the processing logics on
> the I/O data.

So, currently, you define TcpReceiver as a base for classes that do
[TCP?] I/O. That is a start, although I hope we can be more specific
than that. Something like this might be a step forward:

'''Common base for classes reading (and writing) variable-size protocol
messages from (and to) a single TCP stream. Contains basic I/O code and
complex algorithms for coordinating I/O activity with message receiving
(and sending) states.'''

Please note that I am not saying the above description is correct! That
description still does not address SSL concerns, for example. However,
it may be better than the current nothing because it is difficult to
review a class without a stated scope or intended purpose (wrong or
otherwise).

> + MemBuf inBuf;

The above undocumented TcpReceiver data member may be critical to
defining TcpReceiver and related classes correctly. What does that
buffer contain? Raw TCP bytes? I do not think so. Do we read raw TCP
bytes into that buffer? Not quite. Those bytes did not necessarily got
there from a TCP connection. They may have been through an SSL decoding
layer using ssl_read_method() and Comm I/O API (at least).

That buffer actually contains "final protocol" bytes, for whatever
protocol the kid agent is implementing (HTTP, Gopher, etc.). The whole
TcpReceiver class is not about TCP at all! If that is correct, the class
description can be adjusted to something like:

'''Common base for classes reading (and writing) variable-size protocol
messages from (and to) a single Comm Connection. Contains basic
connection management code and complex algorithms for coordinating I/O
activity with message receiving (and sending) states.'''

Again, I am not saying the above is correct, but I think it reflects the
code you are writing better (than nothing or than the TCP-based
definition I wrote above). And once we agree on what is the correct
definition, we can discuss the class and member names to match it; I am
not focusing on the naming problems right now.

> +bool
> +Comm::TcpReceiver::doneAll() const
> +{
> + return stoppedSending() && stoppedReceiving() && !inBuf.hasContent() && AsyncJob::doneAll();
> +}

If we are done sending and receiving why would having buffered content
(the third guard above) matter? What can we do with that content (in the
scope of this class) that does not involve sending or receiving?

> +void
> +Comm::TcpReceiver::releaseConnection(const char *reason)
> +{
> + // Used by clients to release the connection before
> + // storing it in a Pconn pool for reuse.
> + comm_remove_close_handler(tcp->fd, closed_);
> + closed_->cancel(reason);

s/clients/callers/ or s/clients/kids/ to avoid the impression that this
code is specific to Client agents. It is not.

"release" is kind of undefined in this context. I think you are talking
about the need to stop monitoring connection for I/O. Consider
rephrasing the comment to be more precise.

> +void
> +Comm::TcpReceiver::releaseConnection(const char *reason)
> +{
...
> + // XXX: remove half-closed handler ??
> +}

If the method is used exclusively to prepare a connection for entering
a PconnPool (or similar actions), then the connection Must() not be
half-closed (and the method should be renamed?). Otherwise, yes, we
should remove the half-closed monitor to be in sync with the method
definition.

> +void
> +Comm::TcpReceiver::stopReadingXXX()
> +{
> + /* NP: This is a hack needed to allow TunnelStateData
> + * to take control of a socket despite any scheduled read.
> + */
> + if (reading()) {
> + comm_read_cancel(tcp->fd, reader_);
> + reader_ = NULL;
> + }
> +}

I recommend removing the above comment. We already marked the method
with XXX and documented its dangers. The above comment does not add much
value, will get stale, and raises questions like "Why is this method
here and not in the one specific caller it was designed for?". If you
remove the comment, the reader can still search for stopReadingXXX()
callers. Callers should document why they are calling an *XXX() method.

> + // useless to read() after aborting read()
> + if (stoppedReceiving())
> + return;

It is not "useless", it is wrong. And stoppedReceiving() does not mean
we aborted reading. We could just stopped reading after EOF. Consider:

// do not start now if we have already stopped

or something like that.

> + /// Inject a fake request as if the client had sent it.
> + // httpsSslBumpAccessCheckDone() should be doing explicit state setup inste
> ad of parsing.
> + bool injectFakeRequestXXX(MemBuf &fake);

As implemented, the method is not really specific to clients, requests,
fake messages, or even messages. It just prepends some protocol bytes.
Technically, there is nothing XXX about the method either! If you keep
it, consider naming it inject(). If there is something XXX about it that
I missed, let's use injectXXX().

I think we should document that the message is injected before existing
buffered content, if any.

Please move caller-specific explanation of the XXX to the caller where
it is less likely to get out of sync.

> +bool
> +Comm::TcpReceiver::injectFakeRequestXXX(MemBuf &fake)
> +{
> + if (inBuf.hasContent())
> + fake.append(inBuf.content(), inBuf.contentSize());
> +
> + inBuf.reset();
> + inBuf.append(fake.content(), fake.contentSize());
> +
> + return processReadBuffer(inBuf);
> +}

Let's not combine injection with processing. Let the caller call
processReadBuffer() if/when they are ready for that. It would make the
method more powerful but simpler. And you would not have to fix its
documentation to describe the return value, etc.

Another layer of problems with this method are two similar asserts: One
when the "fake" buffer is too small and the other one is when the inBuf
is too small to accommodate the combined content. I will focus on the
former.

We should not assume that the fake MemBuf can accommodate the entire
inBuf. MemBufs have limited capacity and it is too much to ask the
caller to give us the right buffer. If you disagree, please adjust the
method description to explain the requirements on the buffer parameter.
If you agree, make the buffer parameter constant and use a temporary
buffer (easy) or some kind of new MemBuf method with memmove() to inject
content.

The code should not assert (because I might be able to trigger such an
assert by stuffing the input buffer with a lot of content). A throwing
Must() would be OK in this hypothetical case IMO because it will just
terminate a single transaction.

Overall, I am not sure this method is needed. The caller can do the
manipulations on its own, right? It may be easier for the caller to add
the required capacity checks than implement this general code correctly.

> +Comm::TcpReceiver::TcpReceiver() :
> + AsyncJob("Comm::TcpReceiver"),
> + tcp(),
> + stoppedReceiving_(NULL),
> + stoppedSending_(NULL),
> + closed_(),
> + reader_()
> +{}

If you want to explicitly list data members with default constructors,
please list all of them, including inBuf.

> + /// Attempt to read some data.
> + /// Will call processReadBuffer() when there is data to process.
> + void readSomeData();

Let's explicitly say that the method "May do nothing." It may be
important for the caller not to rely on more data coming in after
calling this method.

> + if (!maybeMakeSpaceAvailable()) {
> + stopReceiving("full read buffer - but processing does not free any space");
> + // fall through to setup the half-closed monitoring
> + } else {
> + // schedule another read() - unless aborted by maybeMakeSpaceAvailable() failure
> + readSomeData();
> + return; // wait for the results of this attempt.
> + }

The "unless aborted by..." part is wrong -- we have already ensured that
there _is_ space available above. Otherwise, the "wait for the results
of this attempt" would become a bug -- the results would never come if
the buffer is full.

That is all I had time for today, sorry. More to come.

Thank you,

Alex.
Received on Sat Feb 08 2014 - 00:54:44 MST

This archive was generated by hypermail 2.2.0 : Sat Feb 08 2014 - 12:00:11 MST