Re: [PATCH] new request-line parser and unit tests

From: Amos Jeffries <squid3_at_treenet.co.nz>
Date: Tue, 31 Aug 2010 22:33:42 +0000

On Tue, 31 Aug 2010 09:47:18 -0600, Alex Rousskov
<rousskov_at_measurement-factory.com> wrote:
> On 08/31/2010 09:20 AM, Amos Jeffries wrote:
>> Split into two patches.
>>
>> The parser_unittests.patch adds tests for all the request line
>> permutations of content and garbage I could think of over the last few
>> days with their RFC 1945 and RFC 2616 expected outcomes. On the cases
>> which were not mentioned, ie binary crap, I've erred to leniency
>> accepting as 'unknown' content in the field with relevant cases assumed
>> to be HTTP/0.9.
>>
>>
>> The request-line parser existing in squid failed on several types of
>> test, mostly the ones centered around missing or invalid URIs.
>>
>> So, parser_changes.patch contains an upgrade/fix to
>> HttpParserParseReqLine() handling HTTP request-lines as specific by
>> section 5.1 of the above RFCs. Specifically to handle a sequence of
>> unknown bytes up to a terminating LF (\n) octet. Nothing outside that
is
>> relevant to this parser.
>>
>> Changes:
>> * The semantics as previously documented are taken on. No changes
there,
>> but documentation clarified a bit. Some things previously not erroring
>> are now doing so. External code impact is in the nature of reduced
>> special cases to be handled. Specifically raw-CR weirdness in the
>> request line fields. This occuring in URL was a vulnerability at least
>> once.
>>
>> * Prior updates to HttpParser object for other parse stages opens the
>> possibility of this parse action returning HTTP status code directly.
>> Additions are done to make use of this (with the existing status codes
>> only).
>>
>> * Input permutations where the unit-tests showed the old parser was
>> violating its own documentation have been fixed to produce expected
>> outputs.
>>
>> * Old parser operated three distinct potentially long parse loops.
Added
>> several local variables to remember various octets seen while searching
>> for the terminal LF. This removed the need for two of the parse
re-scans
>> (length of method, length of URI).
>
>
> * "// for now this hack retains that behaviour" code is actually
> disabled with "#if 0 && ". Please adjust the guard or the comment.

Okay comment changing.

>
> * CR+CRLF should be allowed because those CRs are a part of the final
> CRLF sequence. IIRC, RFC 2616 recommends allowing CR*LF whenever CRLF is

> required.

They are allowed *if* followed by a LF which marks the line terminal.

There was no mention I could see anywhere about sequences of CR*. LWS
sequences for whitespace wrapping which permits middle-line CR only applies
to headers which are outside this line.

These two and several other protocols using this type of request line do
explicitly state "No CR or LF is allowed except in the final CRLF
sequence".

The noted exception that bare-CR are permitted in the HTTP mime-headers
section implies that CR*CRLF would be possible and valid there. However,
that section and its quirks are not relevant to this parser.

>
> * The old "This is the only spot in this function where (0) is valid
> result" comment is misleading because it is not clear what "valid
> result" is and zero is always a non-error result. Consider rephrasing to

> something like "This is the only spot in this function where we request
> more data" if that is what the comment wanted to express. Or just delete

> the comment.

Um, okay I'll try to be more clear there.

>
> * Since you are rewriting the whole thing, please get rid of gotos.
> Besides making the code less safe, they prevent variable localization
> and are probably preventing some compiler optimizations. One simple
> solution would be to just return from the function and let a simple
> wrapper do PROF_*() and debugging.

Okay. I was borderline on doing that anyways.

>
> * Once the gotos are gone, please declare locally used variables
locally.

Okay. Will check that.

>
> > * TODO: making this part of HttpParser
>
> Since the current diff is pretty much useless anyway, I would recommend
> doing the above change now. It will help with goto-avoidance as well. I
> agree that you do not have to do it now though.

That involves changes the callers of this code and its API. I'd rather do
that as part of the StringNG change so the API only changes once.

>
> This will be rewritten all over again when Kinkie's string changes are
> completed, right?

The *_end/*_start could become SBuf set from the offsets. The rest will
need a close looking-at. That all depends on intrusive updates to its
callers which I have been trying to avoid here.

NP: This appears to be the fix for bug 3031, if confirmed as such it needs
to apply to all the active Squid-3 versions easily.

>
> If there are no objections, I would like the effects of this change on
> compliance to be tested before the patch is committed. If you want me to

> run the tests, please let me know when you do not intend to make any
> more changes to the patch (because you are done addressing review
> comments or do not consider them valid).

Please do. If there are any violations I would like to add any necessary
unit-tests to enforce compliance.

The only code change I'm happy making yet from your suggestions is the
goto removal. That will not change the behaviour at all (switching gotos
with {debugs(); return}.)

Amos
Received on Tue Aug 31 2010 - 22:33:45 MDT

This archive was generated by hypermail 2.2.0 : Wed Sep 01 2010 - 12:00:05 MDT