Short requests/responses could use more granular exceptions. #23
Comments
Sure, it's hard to object to finer-grained errors :-). What exactly do you want to detect? is it specifically: any time the state machine errors out due to an unexpected event, where that event is of type |
(NB I also just tweaked that error message slightly in master, because |
@njsmith Something like that, yeah. Basically, I'd like to be able to reliably detect violations in HTTP framing that have to do with the body: whether the body is too short or too long. Halfway through the headers bothers me less right now, though I may come back to it at some point. This is part of my work on urllib3, which has historically thrown specific errors for bodies that are too short/too long. |
Oh, I see, right, that's a little different. How do you detect bodies that are too long? Right now I think h11 will just buffer any "extra" bytes and treat them as part of the next request/response cycle...
For too short, it sounds like you specifically want to detect
Some possible approaches:
I think this would actually work, though it's a bit awkward. (I guess depending on the structure of your code, the first two conditions might be implicit -- if you only call
Thoughts? |
Well, in principle one does that by checking whether or not there are further requests outstanding to which those bytes could belong. However, you raise a good point: the way I've written the urllib3 plugin means that it's basically not possible to detect overlong bodies except by luck: if I feed h11 more bytes than the response has left then h11 can see it, but there's always a chance I'll get it exactly right and h11 won't notice. So let's reduce the requirement on overlong bodies and say just short ones for now. As you describe all the wrinkles with this, it occurs to me that we might be better served trying to handle this at the urllib3 level. If we keep track of how many bytes we've read and how many we're expecting, we can at least correctly identify bodies that are too short. That won't help us with bad chunked bodies yet, but I can burn that bridge when I get there. |
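A minimal sketch of the caller-side bookkeeping described above, assuming the expected length has already been parsed from Content-Length. `BodyLengthTracker` and its methods are hypothetical names for illustration only; they are not part of urllib3 or h11.

```python
class BodyLengthTracker:
    """Hypothetical helper: counts body bytes against a declared length."""

    def __init__(self, expected):
        self.expected = expected  # assumed to come from the Content-Length header
        self.received = 0

    def feed(self, data):
        # Call once for every piece of body data handed to the caller.
        self.received += len(data)

    def missing(self):
        # Number of bytes still outstanding; 0 means the body is not short.
        return max(self.expected - self.received, 0)


tracker = BodyLengthTracker(expected=10)
tracker.feed(b"hello")
# At EOF, a non-zero result means the body was too short.
shortfall = tracker.missing()
```

As noted in the comment, this only covers length-delimited bodies; it says nothing about bad chunked bodies.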
Well, this would be a good excuse, except that h11 in client mode refuses to let you pipeline :-).
Is parsing |
(Though h11 does at least validate that there is only 1 |
Sure, but that just makes it easier: one would assume that any further bytes from the server after EndOfMessage are in violation of the spec until we've sent a new Request.
We don't really have an option here. We don't have to keep track of whether we fed an EOF, but we do have to keep track of whether that EOF was expected or not, and expected in this context is defined almost entirely by the content length. So if h11 can't tell us, that means we have no option but to do it ourselves. In particular, we have historically raised exceptions that say how many bytes we were expecting and how many we got: to correctly extract that data we need either urllib3 or h11 to keep track of that information. I should note that h2 does keep track of this information itself, and errors in both cases. Of course, h2 has an easier time of detecting bodies that are too long, but too short is usually pretty easy. |
The suggestion was that if you have an |
I guess we could put a flag onto protocol errors saying "was this triggered by an unexpected EOF yes/no". Seems like a pretty weird API, but I guess it wouldn't be technically difficult. We can make h11 be helpful, I'm just not sure what helpful looks like. |
@njsmith Right, but "unexpected EOF" is not the same as "short body". This is what I'm getting at here: urllib3 can detect "unexpected EOF", but it cannot detect why the EOF was unexpected without keeping track of content lengths. And once it starts doing that, we may as well just avoid doing the work in h11 and instead do it in urllib3. |
"unexpected EOF while reading body" is the same as "short body", isn't it? Am I missing something? |
(btw, I'm assuming but maybe should confirm: the issue here is that you want to treat connection drop in the middle of a chunked response as different from a parse error inside the chunked response -- e.g. a chunk with length |
Not necessarily, it may be "malformed chunked body", which we'd like to report differently. Complaining about "expected body of length n, received body of length m" makes little sense when we're working with chunked bodies. Essentially all I'm trying to ascertain is whether it is easier for h11 to raise exceptions that distinguish these two failure modes than it is to just say that callers should extract the relevant information from h11 if they care. Either is a reasonable answer, but there is no getting around the idea that the caller needs to either be told by h11 or keep track of appropriate state itself to understand why. |
And the reason I've been suggesting h11 do this is only that it is presumably already tracking this information, so it's just a matter of surfacing it. |
Ohhhh, I thought you wanted a truncated chunked response to count as "short body" |
@njsmith Well, if you wanted to you could argue that there are really two cases there. The first is malformed chunks (that is, any parsing logic that doesn't end up in the EXPECTING_NEXT_CHUNK state, such as a truncated chunk header or chunk body), and the second is a short chunk encoding (EOF after a complete chunk, no
However, that level of granularity isn't necessarily worth surfacing. But urllib3 has historically treated those two failure modes differently. |
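The two cases in this comment can be illustrated with a rough sketch that classifies all the bytes received before EOF. This is not how h11 parses chunked bodies; `classify_chunked_body` is a hypothetical function, and it deliberately ignores chunk extensions and trailers.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"


def classify_chunked_body(data):
    """Return "complete", "short", or "malformed" for bytes seen before EOF.

    "short" means EOF fell exactly on a chunk boundary before the terminating
    zero-length chunk; a truncated chunk header or chunk body is "malformed".
    """
    pos = 0
    while True:
        if pos == len(data):
            return "short"  # EOF where the next chunk header would start
        eol = data.find(b"\r\n", pos)
        if eol == -1:
            return "malformed"  # truncated or unterminated chunk header
        size_field = data[pos:eol]
        if not size_field or any(c not in HEX_DIGITS for c in size_field):
            return "malformed"  # empty or non-hex chunk size
        size = int(size_field, 16)
        body_start = eol + 2
        if size == 0:
            # Terminating chunk: expect the final CRLF (no trailers here).
            if data[body_start:body_start + 2] == b"\r\n":
                return "complete"
            return "malformed"
        body_end = body_start + size
        if data[body_end:body_end + 2] != b"\r\n":
            return "malformed"  # truncated chunk body or missing CRLF
        pos = body_end + 2
```

For example, `classify_chunked_body(b"5\r\nhello\r\n")` falls into the "short" case, while `classify_chunked_body(b"5\r\nhel")` is treated as "malformed".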
So this is another of those "seemed like a good idea at the time" things then, right? I can't actually think of any good reason why a client would want to distinguish between "I got only a partial response because the connection dropped unexpectedly, and I know how long the body would have been if I'd gotten all of it" versus "I got only a partial response because the connection dropped unexpectedly, and I don't know how long the body would have been if I'd gotten all of it"? I thought you wanted to distinguish between "protocol stream is malformed due to EOF" and "protocol stream is malformed due to actual bytes", e.g. a chunk header that contains non-hex characters. |
@njsmith This is a purely error-reporting concern: giving users enough information to understand what's happening in their logs. Generally speaking it's nicer to be granular even if there is very little automated action that can be taken on the distinction, if only so that a user can contact the server-author and go "why the hell are you reporting content length wrong?" |
If it's precisely and exactly that you want to detect the case of a length-delimited message body that gets an unexpected EOF, then h11 does have that information, yeah. You could add a |
Yeah, so in an ideal world h11 would also tell me how much it read and how much it expected to read, where I presume again that h11 is keeping track of both of these things. Are you amenable to having that change made? |
Technically right now it's only keeping track of how many bytes it thinks are still outstanding, but that's trivial to change. I definitely don't have any objection to adding a
I hesitate a little about creating a "computer-readable" API for this (e.g. a special
One non-issue is the thing I raised about
I originally thought the motivation here was the desire to distinguish between connection drops -- which can happen because of bad network weather, or a server VM crash, or whatever, and so a client might want to be somewhat tolerant and retry -- versus a protocol error due to the server doing something actively illegal, where the only solution is to open twitter and send confused emoji faces at the server author. That still seems like it might be a useful distinction to make? Would adding this weirdly-specific exception class interfere with more sensible exception classes in the future?
Also, now it sounds like you're saying the goal is just to have a nice human-readable message in the logs, for which a vanilla
Okay, now I'll go look at the code to try and understand better what exactly the commitments here are...
Hmm, it looks like the whole reason
Also, it looks like right now
Ok, moving on to urllib3. Oh phew. At first I thought the silently-allowing-truncated-bodies thing was actually still there in urllib3, but after a
I think at this point I have like some legally mandated requirement to inform you that my professional recommendation is that you set httplib on fire and never use it again. Also does this mean you're going to want an
Anyway, regarding
So AFAICT the actual current rule is that urllib3 will raise
Hey, it sounds like maybe h11 can emulate that API pretty well already? :-) |
You and I are in agreement here, which is why we're having this discussion. This discussion is a precursor to doing exactly that. In fact, let me tell you now that I have a branch of urllib3 that shims h11 into an interface that looks a lot like httplib that is currently passing about 300 urllib3 tests. The goal here is to get to a place where we can drop h11 in and remove httplib without too many people noticing, and that means trying to reduce the churn in the exceptions we emit.
No. It's only set that way for backward compatibility purposes, and adding h11 would be the definition of a backward-incompatible change (there's no way we totally hide that change from our users), so we can change that behaviour too.
Yes, that's true, but the metadata that may be wrong almost always comes out of httplib. Now that we're assuming httplib's job, I'd like to do better than it. Getting this answer correct is entirely do-able: it is possible to distinguish malformed chunk bodies from truncated response bodies, and in the case of the latter to entirely explain how short it came out. However, you may be right, a better approach may be to simply refuse to give that information and treat them all as "TruncatedResponse" and call it a day. I suspect we'll find that annoying from a debugging perspective: in particular, for HTTPS it becomes basically impossible to introspect what is wrong with the body without having pretty in-depth knowledge of the internals of h11 and urllib3 (you need to be able to see what code paths are being executed). For that reason if nothing else it would be nicer to provide some kind of useful information that is human-readable. We don't need it to be machine-readable, that's true, though in this case I tend to err on the side of making it available to machines once we're playing with this stuff. |
Intended as at least a first step towards gh-23.
When either the server or the client sends an HTTP message with a body that is short (that is, the body does not contain enough bytes to meet the Content-Length header, or the chunked encoding is aborted without the ending), an exception like this is raised:
This message, while understandable, is a little hard to introspect. It would be nice if we could have a subclass of ProtocolError that indicated exactly the kind of error, so that applications that would like to report these errors in a more granular way could do so.
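One possible shape for such a subclass, as a sketch only. The `ProtocolError` below is a local stand-in so the example runs without h11 installed, and `IncompleteBodyError`, its `expected`/`received` attributes, and `log_line` are hypothetical names, not an existing h11 API.

```python
class ProtocolError(Exception):
    """Local stand-in for h11's ProtocolError (keeps the sketch self-contained)."""


class IncompleteBodyError(ProtocolError):
    """Hypothetical: EOF arrived before a length-delimited body was complete."""

    def __init__(self, expected, received):
        super().__init__(
            "expected body of length %d, received body of length %d"
            % (expected, received)
        )
        self.expected = expected
        self.received = received


def log_line(exc):
    # Applications that care can report the granular case;
    # everything else falls back to the generic message.
    if isinstance(exc, IncompleteBodyError):
        return "short body: %d of %d bytes" % (exc.received, exc.expected)
    return "protocol error: %s" % exc
```

Because the new class would still be a `ProtocolError`, existing `except ProtocolError` handlers would keep working unchanged.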